serpapi / public-roadmap

Public Roadmap for SerpApi, LLC (https://serpapi.com)

[Google Maps API] Scrape coordinates of polygon vertices #186

Open ilyazub opened 2 years ago

ilyazub commented 2 years ago

One user has asked us to support extracting the LAT/LONG pair of each corner point of the red Google Maps polygons.

Intercom conversation ID: 102910000025986

Example Playground URL.

image

ilyazub commented 1 year ago

There are three ways to get the coordinates of polygon vertices: use the OpenStreetMap API, extract data from Google Maps tile images, or extract data from Google Maps protobuf messages.

OpenStreetMap provides GeoJSON data

curl -s 'https://nominatim.openstreetmap.org/search.php?q=Central+Park+USA&polygon_geojson=1&format=json&limit=5' | jq -cr '.[].geojson | select(.type == "Polygon")'
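The same request can be prepared from a script. A minimal Python sketch, assuming only the Nominatim parameters from the curl command above; the helper names `build_search_url` and `polygons_only` are made up for illustration, and the network call itself is left out so that only the URL building and the jq-style filter are shown:

```python
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search.php"


def build_search_url(query, limit=5):
    """Build the same search URL as the curl command above."""
    params = {"q": query, "polygon_geojson": 1, "format": "json", "limit": limit}
    # urlencode uses '+' for spaces, matching the hand-written URL.
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def polygons_only(results):
    """Mirror the jq filter: keep each result's geojson when its type is Polygon."""
    return [
        r["geojson"]
        for r in results
        if (r.get("geojson") or {}).get("type") == "Polygon"
    ]


print(build_search_url("Central Park USA"))
```

Once the response is fetched and parsed as JSON, the resulting list can be passed to `polygons_only`. Results whose geometry is a `Point` or `MultiPolygon` are dropped, exactly as with the jq filter.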

For example, here's what OSM provides for Central Park in the USA.

{"type":"Polygon","coordinates":[[[-73.9814075,40.7684558],[-73.9813532,40.7684005],[-73.9812954,40.7683388],[-73.9812666,40.7682979],[-73.9812424,40.7682578],[-73.981225,40.7682236],[-73.9812054,40.7681647],[-73.9811983,40.7681234],[-73.9811958,40.7680759],[-73.9811947,40.7680529],[-73.9811958,40.7680234],[-73.9812024,40.7679925],[-73.9812166,40.7679435],[-73.9812094,40.7679176],[-73.9811958,40.7678954],[-73.9811787,40.7678877],[-73.9809915,40.7678561],[-73.9736176,40.7647452],[-73.9732963,40.7651947],[-73.9731694,40.7652316],[-73.9730317,40.7652192],[-73.9726515,40.7650644],[-73.9725696,40.7651788],[-73.9724768,40.765304],[-73.9722517,40.7656121],[-73.9715326,40.7665973],[-73.970953,40.7673907],[-73.9708816,40.7674883],[-73.9704918,40.7680224],[-73.9704097,40.768135],[-73.9699764,40.7686749],[-73.9699119,40.7687628],[-73.968529,40.770653],[-73.9662265,40.7738129],[-73.9656177,40.7746546],[-73.9640526,40.776801],[-73.9637321,40.777239],[-73.9635111,40.7775416],[-73.963494,40.7775657],[-73.9634737,40.7775944],[-73.9634089,40.7776896],[-73.9633886,40.7777161],[-73.963339,40.7776932],[-73.9633312,40.7776914],[-73.9633258,40.7776925],[-73.9633211,40.7776951],[-73.9631581,40.7779237],[-73.9627898,40.7784331],[-73.9627195,40.7785304],[-73.9623347,40.7790627],[-73.9622595,40.7791667],[-73.9618727,40.7797018],[-73.9617979,40.7798051],[-73.961464,40.7802617],[-73.9615543,40.7803339],[-73.961481,40.7804146],[-73.9611242,40.7808832],[-73.9610479,40.78096],[-73.9605506,40.781666],[-73.9588854,40.7839459],[-73.9586307,40.7843102],[-73.9575733,40.7857067],[-73.9558772,40.788053],[-73.9554172,40.7886997],[-73.9552971,40.7888635],[-73.9535256,40.7912074],[-73.9531393,40.7917411],[-73.9530523,40.7918681],[-73.9522388,40.7929767],[-73.9518348,40.7935351],[-73.9516863,40.7937493],[-73.9512675,40.7943243],[-73.9508105,40.7949343],[-73.9507494,40.7950221],[-73.9503032,40.7956392],[-73.9499364,40.7961426],[-73.9498487,40.7962602],[-73.9496631,40.7965179],[-73.9496061,40.7965935],[-73.9496116,40.796606],[-73.9496189,40.7966161],[-73.9496299,40.7966288],[-73.9496411,40.7966385],[-73.9496618,40.7966473],[-73.9496802,40.7966549],[-73.9497027,40.7966668],[-73.9497244,40.7966815],[-73.9497422,40.7966959],[-73.9497575,40.7967123],[-73.9497688,40.7967264],[-73.9498014,40.7967913],[-73.9498019,40.7968361],[-73.9497938,40.7968656],[-73.9497839,40.7968924],[-73.9497731,40.7969178],[-73.9497626,40.7969378],[-73.9497591,40.7969467],[-73.9497571,40.7969586],[-73.9497623,40.796969],[-73.9497673,40.7969775],[-73.9497883,40.7969865],[-73.9502608,40.7971833],[-73.9509739,40.7974831],[-73.9510383,40.79751],[-73.952353,40.7980661],[-73.9527043,40.7982164],[-73.955126,40.799237],[-73.9555925,40.7994309],[-73.9559873,40.7995974],[-73.9576536,40.8002923],[-73.9576827,40.8003048],[-73.9577049,40.8003107],[-73.9577265,40.8003135],[-73.9577424,40.8003117],[-73.9577602,40.8003077],[-73.9578121,40.8002816],[-73.9578357,40.8002711],[-73.9578609,40.8002612],[-73.9578853,40.8002523],[-73.9579135,40.8002443],[-73.9579531,40.8002352],[-73.958039,40.8002111],[-73.9580742,40.8002065],[-73.9581067,40.8002035],[-73.958139,40.8001996],[-73.9581702,40.8001953],[-73.9581901,40.8001932],[-73.9582004,40.8001916],[-73.9582116,40.8001889],[-73.9582225,40.8001852],[-73.9582322,40.8001812],[-73.9582454,40.8001744],[-73.9582594,40.8001649],[-73.9584444,40.7999135],[-73.9587263,40.7995306],[-73.9588184,40.7994129],[-73.9589165,40.7992931],[-73.9589604,40.7992378],[-73.9589862,40.7992085],[-73.9590607,40.7991133],[-73.959776,40.7981253],[-73.9599617,40.7978697],[-73.9613456,40.7959607],[-73.9613682,40.795929],[-73.9626287,40.7942007],[-73.9639878,40.7923871],[-73.9645157,40.7916521],[-73.9646045,40.7915356],[-73.965651,40.7900902],[-73.9661212,40.7894415],[-73.9672728,40.7878736],[-73.9673372,40.7877836],[-73.9682398,40.7865399],[-73.9690437,40.7854255],[-73.969113,40.7853303],[-73.9692206,40.7851954],[-73.9692564,40.7851403],[-73.969659,40.784584],[-73.9697076,40.7845135],[-73.9700242,40.7840726],[-73.9702619,40.783752],[-73.9704966,40.7834445],[-73.9714695,40.7821005],[-73.9714884,40.7820679],[-73.9716566,40.781843],[-73.9725365,40.7806502],[-73.9726547,40.7804911],[-73.973142,40.7798279],[-73.9734015,40.7794669],[-73.9734706,40.7793708],[-73.973712,40.7790432],[-73.9741143,40.7784888],[-73.9748348,40.7774861],[-73.9757406,40.7762343],[-73.9758687,40.7760769],[-73.9762831,40.7755038],[-73.9771742,40.7742872],[-73.9771962,40.7742602],[-73.9780733,40.7730567],[-73.9781445,40.7729298],[-73.9786494,40.772256],[-73.9789928,40.7717669],[-73.9790604,40.7716807],[-73.9792112,40.7714634],[-73.9797693,40.7707071],[-73.9797918,40.7706802],[-73.9803189,40.7699787],[-73.9808308,40.7692432],[-73.9808882,40.7691598],[-73.9809437,40.7690783],[-73.9812638,40.7686118],[-73.9812971,40.7685782],[-73.9813785,40.7684879],[-73.9814003,40.7684638],[-73.9814075,40.7684558]]]}
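Since the request was for LAT/LONG pairs, note that GeoJSON stores each position as [longitude, latitude]. A small sketch of the conversion, assuming a `Polygon` object like the one above; `polygon_vertices` is an illustrative name, and `sample` is a shortened ring built from four of the positions above:

```python
def polygon_vertices(geojson):
    """Return the outer ring of a GeoJSON Polygon as (lat, lng) tuples."""
    if geojson.get("type") != "Polygon":
        raise ValueError("expected a GeoJSON Polygon")
    # The first ring is the outer boundary; any further rings are holes.
    outer_ring = geojson["coordinates"][0]
    # GeoJSON positions are [longitude, latitude, ...]; swap to (lat, lng).
    vertices = [(position[1], position[0]) for position in outer_ring]
    # A closed ring repeats its first position at the end; drop the duplicate.
    if len(vertices) > 1 and vertices[0] == vertices[-1]:
        vertices.pop()
    return vertices


sample = {"type": "Polygon", "coordinates": [[
    [-73.9814075, 40.7684558],
    [-73.9736176, 40.7647452],
    [-73.9496061, 40.7965935],
    [-73.9814075, 40.7684558],
]]}
print(polygon_vertices(sample))
```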

But OSM doesn't provide results for all of the places that Google Maps covers.

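Example: Gardens of Westridge, Fort Worth, TX 76116, USA. For this query OSM returns a small polygon at roughly 29.86°S, 30.99°E (GeoJSON positions are [longitude, latitude]), which is not in Texas.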

{"type":"Polygon","coordinates":[[[30.9959051,-29.8646281],[30.9965977,-29.8652568],[30.9971841,-29.8647711],[30.9964915,-29.8641423],[30.9959051,-29.8646281]]]}

Compared to Google Maps.

image

Extract coordinates from Google Maps tile images

Impractical but very fun approach. The OpenStreetMap API should be enough for most cases. I haven't compared how many places the OSM API covers versus scraping Google Maps.

The prerequisite is to reverse engineer how map tile URLs are constructed:

https://www.google.com/maps/vt/pb=!1m4!1m3!1i17!2i30067!3i52920!2m3!1e0!2sm!3i629367778!2m36!1e2!2sspotlight!8m33!1m2!12m1!20e1!2m7!1s0x864e72c2bbdc7945%3A0x5ddf4f6ce59790b4!2sGardens%20of%20Westridge%2C%20Fort%20Worth%2C%20TX%2076116%2C%20USA!4m2!3d32.7123116!4d-97.4173501!5e1!6b1!11e1!13m11!2sa!18m5!6b0!9b1!20b1!21b1!22b0!22m3!6e2!7e3!8e2!19u12!19u14!19u29!19u37!19u30!19u61!19u70!20m1!1e6!3m8!2sen!3sua!5e1105!12m4!1e68!2m2!1sset!2sRoadmap!4e0!5m1!1e0!23i10203575!23i1381033!23i1368782!23i1368785!23i47025228!23i4592408!23i4640515!23i1375050!23i4536287
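As a starting point for the reverse engineering, here is a sketch that reads the first `!1i…!2i…!3i…` triple from the URL and compares it with standard Web Mercator tile math. That the triple means zoom, x, y is an assumption, not something Google documents; it is supported by the fact that the `!3d`/`!4d` coordinates in the same URL land on tile column 30067 at zoom 17 and on the same or a neighbouring tile row. The function names are illustrative, and `url` is a truncated prefix of the URL above:

```python
import math
import re


def tile_from_pb(url):
    """Read (zoom, x, y) from the first !1i…!2i…!3i… triple (assumed meaning)."""
    match = re.search(r"!1i(\d+)!2i(\d+)!3i(\d+)", url)
    if match is None:
        raise ValueError("no tile triple found in URL")
    zoom, x, y = (int(group) for group in match.groups())
    return zoom, x, y


def latlng_to_tile(lat, lng, zoom):
    """Standard Web Mercator tile indices for a point at the given zoom."""
    n = 2 ** zoom
    x = int((lng + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


url = "https://www.google.com/maps/vt/pb=!1m4!1m3!1i17!2i30067!3i52920!2m3!1e0!2sm"
print(tile_from_pb(url))
# Coordinates taken from the !3d / !4d fields of the full URL.
print(latlng_to_tile(32.7123116, -97.4173501, 17))
```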

A rough guess at an algorithm to extract multi-polygon coordinates from Google Maps tiles:

  1. Make a request to Google Maps for the specific place either manually or using SerpApi.
  2. Extract its latitude, longitude, and altitude.
  3. Fetch tile images for the specific region. Tile size is 256x256 pixels. On desktop, Google Maps requests 20 tile images for the rendered map.
  4. For each pixel on each 256x256 tile, detect red-colored place boundary pixels by checking RGB values of a pixel in a tile image.
  5. Calculate coordinates of the place boundary pixels based on latitude, longitude, and altitude of the center of the screen that were extracted previously.
  6. Reduce polygon (e.g., https://github.com/mourner/simplify-js).
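Steps 4 and 5 could look roughly like the sketch below. It is a guess, like the algorithm itself: the red thresholds are invented and would need tuning against real tiles, the tile is assumed to be already decoded into rows of (r, g, b) tuples, and pixel positions are converted using tile indices and zoom (standard Web Mercator) rather than the screen-centre position mentioned in step 5. All names are illustrative.

```python
import math

TILE_SIZE = 256


def is_boundary_red(rgb, min_red=200, max_other=80):
    """Heuristic: strong red channel, weak green and blue (thresholds are guesses)."""
    r, g, b = rgb
    return r >= min_red and g <= max_other and b <= max_other


def pixel_to_latlng(tile_x, tile_y, px, py, zoom):
    """Inverse Web Mercator: pixel (px, py) inside tile (tile_x, tile_y) to (lat, lng)."""
    n = 2 ** zoom
    x = tile_x + px / TILE_SIZE
    y = tile_y + py / TILE_SIZE
    lng = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / n))))
    return lat, lng


def boundary_points(pixels, tile_x, tile_y, zoom):
    """Return (lat, lng) for the centre of every red pixel in one decoded tile."""
    return [
        pixel_to_latlng(tile_x, tile_y, px + 0.5, py + 0.5, zoom)
        for py, row in enumerate(pixels)
        for px, rgb in enumerate(row)
        if is_boundary_red(rgb)
    ]
```

The output is an unordered cloud of points, so they would still need to be ordered into rings before the simplification in step 6.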

Extract coordinates from Google Maps protobuf

I haven't researched this. URLs look like:

https://www.google.com/maps/vt/stream/pb=!1m7!8m6!1m3!1i14!2i3757!3i6617!2i4!3x65535!2m3!1e0!2sm!3i629367898!2m68!1e2!2sspotlight!8m65!1m33!1m2!12m1!20e1!2m7!1s0x864e0c97e4e8944f%3A0xf6a43d9d113cdfe4!2sWhite+Settlement%2C+TX%2C+USA!4m2!3d32.7597739!4d-97.45909859999999!5e1!6b1!11e1!13m11!2sa!18m5!6b0!9b1!20b1!21b1!22b0!22m3!6e2!7e3!8e2!19u12!19u14!19u29!19u37!19u30!19u61!19u70!20m1!1e6!2m7!1s0x864e6da5566356ed%3A0xb6dc013b42a1a2ce!2sWedgwood+South!4m2!3d32.646417035219464!4d-97.38272130489348!5e1!6b1!11e1!13m11!2sa!18m5!6b0!9b1!20b1!21b1!22b0!22m3!6e2!7e3!8e2!19u12!19u14!19u29!19u37!19u30!19u61!19u70!20m2!1e6!2m0!3m8!2sen!3sua!5e1105!12m4!1e68!2m2!1sset!2sRoadmap!4e1!5m4!1e4!8m2!1e0!1e1!6m13!1e12!2i2!11e2!19m1!1e0!30m1!1f1.25!39b1!44e1!50e0!67m1!1e1!71b1!23i10203575!23i1381033!23i1368782!23i1368785!23i47025228!23i4592408!23i4640515!23i1375050!23i4536287!28i629&authuser=0
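The `pb=` value is easier to read once it is split on `!`. A small sketch, assuming each part has the shape number, one lowercase letter, value, which is what the URL above shows; what the letters mean (for example `m` for a nested group, `i` for an integer, `s` for a string, `d` for a double) is a guess from the values and not documented. `tokenize_pb` is an illustrative name:

```python
import re

# One part looks like "<field number><type letter><value>", e.g. "3d32.7597739".
PART = re.compile(r"(\d+)([a-z])(.*)", re.DOTALL)


def tokenize_pb(pb):
    """Split a pb string into (field_number, type_letter, raw_value) tuples."""
    tokens = []
    for part in pb.split("!"):
        if not part:
            continue  # the string starts with '!', so the first piece is empty
        match = PART.fullmatch(part)
        if match is None:
            raise ValueError(f"unexpected pb part: {part!r}")
        tokens.append((int(match.group(1)), match.group(2), match.group(3)))
    return tokens


for token in tokenize_pb("!1m7!8m6!1m3!1i14!2i3757!3i6617!2i4!3x65535"):
    print(token)
```

Values are left percent-encoded, and any trailing query string such as `&authuser=0` should be stripped before tokenizing.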

Content type is application/vnd.google.octet-stream-compressible; charset=x-user-defined. It seems to be a Google Protobuf format: https://github.com/google/protorpc/commit/eb03145a6a7c72ae6cc43867d9635a5b8d8c4545#diff-9409428171f083cd5b68e3037d0f92756b895617effda1a591609e609b4c6f2eR49. The contents can probably be viewed in Charles Proxy.
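If the body really is protobuf, its fields can be listed without a .proto schema by walking the wire format. The sketch below is a generic wire-format reader, not something verified against these responses: the payload may be wrapped, compressed, or framed in a way that needs unpacking first. Function names are illustrative, and the two byte strings in the example are the standard examples from the protobuf encoding documentation.

```python
def read_varint(data, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_raw(data):
    """List (field_number, wire_type, value) for one level of a protobuf message."""
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 1:  # fixed 64-bit, left as raw bytes
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: bytes, string, or nested message
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, left as raw bytes
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


print(decode_raw(bytes.fromhex("089601")))
print(decode_raw(bytes.fromhex("120774657374696e67")))
```

Length-delimited values can be fed back into `decode_raw` to try them as nested messages; polygon vertices, if present, would most likely sit in such nested or packed fields.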

Some people extract polygon coordinates from the Google Maps JS API or use a built-in method. It's possible to deserialize data from strings: https://stackoverflow.com/a/1426104/1291371.