mapillary / mapillary-python-sdk

A Python 3 library built on the Mapillary API v4 to facilitate retrieving and working with Mapillary data.
MIT License
39 stars 15 forks

[Requirements] 9. Get All Images In A Shape #20

Closed Rubix982 closed 3 years ago

Rubix982 commented 3 years ago

Is your feature request related to a problem? Please describe. This issue deals with the 2nd requirement from the PRD: extracting all images within a shape.

Describe the solution you'd like The base requirements are,

  1. Takes a GeoJSON object as argument
  2. Same as R07 for other

Describe alternatives you've considered NA

Additional context NA

Rubix982 commented 3 years ago

@cbeddow What should this return?

cbeddow commented 3 years ago

@Rubix982

this should return a feature collection, geojson.

1) The input geojson or bbox becomes the query extent.
2) We find all tile coordinates (x, y) at zoom 14 that intersect the geojson polygon, or the bbox around it.
3) We query the images tile endpoint (the "image" layer of the response when querying a tile).
4) We decode each tile into geojson with the vt2geojson library.
5) We check the decoded tile data, which is in geojson format, and eliminate from the list of features anything that falls outside the input geojson polygon, based on the geographic coordinates in the geometry.
6) We merge the geojson of each tile into a single geojson (by merging all the features into one list in a feature collection).
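Steps 1 and 2 can be sketched without any dependencies. This is only an illustration, assuming the standard Web Mercator (slippy map) tile formula; in the SDK itself a library such as mercantile would likely do this work. The helper names `polygon_bbox`, `lonlat_to_tile` and `tiles_for_bbox` are made up for this example, and the ring is the polygon posted later in this thread.

```python
import math

ZOOM = 14  # zoom level named in step 2


def polygon_bbox(ring):
    """Return (west, south, east, north) for a list of [lon, lat] pairs."""
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return min(lons), min(lats), max(lons), max(lats)


def lonlat_to_tile(lon, lat, zoom):
    """Standard Web Mercator (slippy map) tile index for a coordinate."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor(
        (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    )
    # Clamp so that lon=180 or extreme latitudes stay inside the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom=ZOOM):
    """List every (x, y) tile whose extent overlaps the bbox."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # tile y grows southward
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    return [
        (x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]


# Exterior ring of the sample polygon from this thread ([lon, lat] order).
ring = [
    [7.2564697265625, 43.69716905314008],
    [7.27020263671875, 43.69419030566581],
    [7.287025451660155, 43.69419030566581],
    [7.296295166015625, 43.702133303447326],
    [7.279129028320312, 43.71652730498859],
    [7.255439758300781, 43.710819752457624],
    [7.2564697265625, 43.69716905314008],
]

bbox = polygon_bbox(ring)
tiles = tiles_for_bbox(*bbox)
print(len(tiles), "tiles to request at zoom", ZOOM)
```

Each `(x, y)` pair would then be requested from the tile endpoint (step 3). The list covers the bbox rather than the polygon, so it can include tiles that only touch the corners of the bbox; step 5 takes care of the extra points.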

The user would most likely then use the save to file function with this
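The merge in step 6 is small enough to sketch directly. The function name `merge_feature_collections` is hypothetical, and `json.dumps` only stands in for the SDK's own save-to-file function, which is not shown in this thread. The two per-tile collections are made-up sample data; only the first image id and its coordinates come from the example posted later in this issue.

```python
import json


def merge_feature_collections(collections):
    """Combine several GeoJSON FeatureCollections into a single one."""
    features = []
    for collection in collections:
        # Tolerate a tile that decoded to a collection with no features key.
        features.extend(collection.get("features", []))
    return {"type": "FeatureCollection", "features": features}


def point_feature(image_id, lon, lat):
    """Build a minimal GeoJSON point feature (illustrative helper)."""
    return {
        "type": "Feature",
        "properties": {"id": image_id},
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
    }


# Hypothetical decoded output of two tiles.
tile_a = {
    "type": "FeatureCollection",
    "features": [
        point_feature("1933525276802129", -97.743279722222, 30.270651388889),
        point_feature("sample-2", -97.7431, 30.2705),
    ],
}
tile_b = {
    "type": "FeatureCollection",
    "features": [point_feature("sample-3", -97.7420, 30.2711)],
}

merged = merge_feature_collections([tile_a, tile_b])
geojson_text = json.dumps(merged)  # ready to be written to a .geojson file
print(len(merged["features"]), "features after merging")
```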

Rubix982 commented 3 years ago

Noted.

Rubix982 commented 3 years ago

@cbeddow help wanted with the 5th step. I remember we eliminated features based on the geometry attribute by using the haversine package, which basically gave us only the features that lay within a specific radius. Is this the same thing as that?

cbeddow commented 3 years ago

@Rubix982 aha, this is where we may want to try using Shapely (as long as adding it as a requirement causes no installation issues on Windows; it should be okay, though).

See answer here: https://stackoverflow.com/questions/36399381/whats-the-fastest-way-of-checking-if-a-point-is-inside-a-polygon-in-python#36400130

We want to loop through all points in the returned bbox, and see if they fall within the polygon of the geometry. So different from haversine which checks a radius (basically if point is within a circle shaped polygon). This time we need to take the input from the user, and take its coordinates.

So in a geojson:

from shapely.geometry import shape

input_image = {
  "id": "1933525276802129",
  "geometry": {
    "type": "Point",
    "coordinates": [
      -97.743279722222,
      30.270651388889
    ]
  }
}

input_geojson = {
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              7.2564697265625,
              43.69716905314008
            ],
            [
              7.27020263671875,
              43.69419030566581
            ],
            [
              7.287025451660155,
              43.69419030566581
            ],
            [
              7.296295166015625,
              43.702133303447326
            ],
            [
              7.279129028320312,
              43.71652730498859
            ],
            [
              7.255439758300781,
              43.710819752457624
            ],
            [
              7.2564697265625,
              43.69716905314008
            ]
          ]
        ]
      }
    }
  ]
}

# the polygon is the geometry of the first feature in the FeatureCollection
polygon = shape(input_geojson["features"][0]["geometry"])
point = shape(input_image["geometry"])
if polygon.contains(point):
    print("Match!")
    # keep the image if this is true, exclude if not

More here: shapely.readthedocs.io/en/stable/manual.html

There is an even more efficient way in the docs above, search for "STR-packed R-tree" which does kind of a list comprehension. Let me know if you want to try that and need help after maybe testing the speed of the first method in the example I gave!

Rubix982 commented 3 years ago

@cbeddow you mentioned as the first step that "the input geojson or bbox becomes the query extent.", but if we are going to use a bbox here to filter the images in that shape, how would this be different from #18?

cbeddow commented 3 years ago

@Rubix982 there are two key differences:

1) The first step is to receive the shape and estimate a bbox that contains it.
2) The last step is to check, using shapely, whether the shape contains the point data (image).

In a normal bbox function, the last step will be to check if the bbox contains the image points. But in this case the bbox is an overestimation of the size, so we need to check the specific shape.

For example, in this image, there are many grey points. We want only the points inside the blue shape. But we must first estimate the red bbox, then ask mercantile to find the list of tiles it intersects, download the data from those tiles, and keep only the points from those downloaded tiles that fall inside (are contained by) the blue shape, instead of just anything in the red bbox as we would in a get_all_images_in_bbox type function.

image
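To make the difference concrete, here is a small dependency-free sketch. It swaps in a plain ray-casting test as a stand-in for Shapely's `polygon.contains(point)`, which is what the thread proposes for the real implementation. The function names and the two test points are made up for illustration, and the ring is the sample polygon posted earlier in this issue.

```python
def in_bbox(lon, lat, ring):
    """True if the point lies inside the bounding box of the ring."""
    lons = [x for x, _ in ring]
    lats = [y for _, y in ring]
    return min(lons) <= lon <= max(lons) and min(lats) <= lat <= max(lats)


def in_polygon(lon, lat, ring):
    """Ray-casting point-in-polygon test for a closed ring of [lon, lat]."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing to the east flips the state
    return inside


# Exterior ring of the sample polygon (first point repeated at the end).
ring = [
    [7.2564697265625, 43.69716905314008],
    [7.27020263671875, 43.69419030566581],
    [7.287025451660155, 43.69419030566581],
    [7.296295166015625, 43.702133303447326],
    [7.279129028320312, 43.71652730498859],
    [7.255439758300781, 43.710819752457624],
    [7.2564697265625, 43.69716905314008],
]

corner_point = (7.295, 43.715)  # near the top-right corner of the bbox
middle_point = (7.275, 43.705)  # roughly in the middle of the shape

for lon, lat in (corner_point, middle_point):
    print((lon, lat), "bbox:", in_bbox(lon, lat, ring),
          "polygon:", in_polygon(lon, lat, ring))
```

The corner point passes the bbox check but fails the polygon check, which is exactly the kind of grey point a bbox-only function would keep and this function should drop. This simple version ignores holes and points lying exactly on an edge, both of which Shapely handles.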

Rubix982 commented 3 years ago

Ah! This was very helpful, thank you, @cbeddow.

Rubix982 commented 3 years ago

Success!

image image
image image

The 1st row shows a shape (Polygon) passed in, with all the image features successfully retrieved.

The 2nd row takes a GeoJSON as input and returns another GeoJSON as output, with points for the image features.

cbeddow commented 3 years ago

Great work, looks like it gets all the data very densely. A good way to test as well is to install QGIS from https://qgis.org and you can drop the geojson file in there to view larger amounts of data, like a 10 GB file :D