mapillary / mapillary-python-sdk

A Python 3 library built on the Mapillary API v4 to facilitate retrieving and working with Mapillary data.
MIT License
37 stars 14 forks source link

Bug Report #138

Closed ionut-mntn closed 2 years ago

ionut-mntn commented 2 years ago

Describe the bug KeyError: 'value' exception during call to traffic_signs_in_bbox method if one of the tiles does not have traffic signs in it, meaning that: results from "good, relevant"(containing traffic signs) tiles in the given bbox are not returned if one of the tiles is "bad" (having no traffic sign / containing only water / maybe other scenarios, too; )

To Reproduce

### Test Monaco with SDK
import json
import requests
import datetime
import mercantile
import pandas as pd
import mapillary as mly

from dateutil.relativedelta import relativedelta

### Set access token
ACCESS_TOKEN = 'XXX'
mly.interface.set_access_token(ACCESS_TOKEN)
PATH = f'MAs_Munich_task/us_west_coast_tried_stuff' #ignore this

### Read relevant traffic signs' names
traffic_signs_names_path = '/app/MAs_Munich_task/values.csv' # all traffic signs present on https://www.mapillary.com/developer/api-documentation/traffic-signs! (around 1500 or 1600 signs) Could a too large list be problematic? Don't think so...
traffic_sign_names = pd.read_csv(traffic_signs_names_path)['name'].tolist()

time_now = datetime.datetime.today()

### Define relevant bonuding tile
monaco_z11_btile = mercantile.Tile(x=1066,y=746,z=11)
monaco_z12_btile = mercantile.Tile(x=2132,y=1493,z=12)
monaco_z_btile = monaco_z12_btile
w,s,e,n = mercantile.bounds(monaco_z_btile) # get bbox coordinates of the vector tile

# make sure coordinates are inside the vector tile, not on the edge! Otherwise, this could result in requests to "one-zoom-level-lower (meaning higher altitude, ie: covering more area) tiles", which are actually not part of our chosen vector tile(bbox)!
w += 0.01
s += 0.01
e -= 0.01
n -= 0.01

res = mly.interface.traffic_signs_in_bbox(
bbox = {
        'west': w,
        'south': s,
        'east': e,
        'north': n
    },
    filter_values=traffic_sign_names,
    existed_at=str(time_now - relativedelta(years=1)).split('.')[0],
    existed_before=str(time_now).split('.')[0]
)

### Dump file for better visualisation
json_res = json.loads(res)
json.dump(json_res,open(f'{PATH}/monaco_{monaco_z_btile.z}_{monaco_z_btile.x}_{monaco_z_btile.y}.json','w+'), indent=4)

Expected behavior I am expecting the method I am using from the SDK to make calls looking like: https://tiles.mapillary.com/maps/vtp/mly_map_feature_traffic_sign/2/{z}/{x}/{y}?access_token=XXX and then give me a JSON response back with the traffic signs (all traffic signs) in the bbox that I pass the traffic_signs_in_bbox method. It should make 4 ^ (14 - zoom_level_provided_tile) calls (but as I mentioned above, were one of these calls to be fail, no reponse is returned).

Screenshots This is the output I get for monaco_z_btile = monaco_z12_btile. Note that whenever one of the tiles is not relevant (not sure, but I would say that in my case the problematic tile -14,8530,5975- does not contain traffic signs, because it is all water) I do not get anything back, not even a partial response. It just throws an error

Screenshot 2022-03-22 at 11 12 01

Additional context 1) Note that I used bboxfinder.com for the manually-defined Monaco bounding tiles. I could be more professional and use OSM relation ID together with https://polygons.openstreetmap.fr/ to get precise bbox, and afterwards the vector tiles (that you see I defined manually); but for the sake of this example, I just hardcoded them.

2) Note that -in theory- you could assign the monaco_z_btile variable a mercantile.Tile object with an arbitrary zoom level (that includes Monaco, obviously; so you can reproduce the issue). I only provided the coordinates for the zoom_11, zoom_12 tiles, but others could be easily identified using bboxfinder.com.

3) UPDATE: I just tested for another region for which most of the corresponding tiles do not contain only water and this issue seems to reproduce for this scenario, too. Most of them produce the same KeyError: 'value' exception. I am iterating over the zoom_12 tiles for US West Coast, which means this bbox: (-125.024414,31.128199,-108.896484,49.152970) and calling the traffic_signs_in_bbox with the tiles' corresponding bbox coordinates. NOTE: As mentioned in a comment in the code above, I took into account to also make these bbox coordinates smaller, so that they are not exactly on the edge of the bbox. Again, the reason for this is that I noticed the traffic_signs_in_bbox method will make calls to "brother" tiles (tiles having the same parent) if the bbox I pass the method is exactly on the edge of the vector tile - I am not sure most of the users would want that, maybe this is another issue, too. If unclear, I could open another issue to demonstrate what I am trying to say.

Rubix982 commented 2 years ago

Thank you for the issue, @ionut-mntn. Will be looking at this today and will get back to you.

ionut-mntn commented 2 years ago

I could provide an extensive list of tiles (of zoom 12) that produce this issue, if that would help.

Rubix982 commented 2 years ago

Please do share. I am thinking of adding a that for this specific use case. More tiles could help in validating changes.

On Tue, Mar 22, 2022, 6:23 PM Ionut Muntean @.***> wrote:

I could provide you with an extensive list of tiles (of zoom 12) that produce this issue, if that would help.

— Reply to this email directly, view it on GitHub https://github.com/mapillary/mapillary-python-sdk/issues/138#issuecomment-1075174750, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJ5U7NXJUD2IFQVSG6QFTNTVBHCUHANCNFSM5RKIDKFA . You are receiving this because you commented.Message ID: @.***>

ionut-mntn commented 2 years ago

@Rubix982 At least the first 28232 tiles in this file are problematic when it comes to calling the traffic_signs_in_bbox method of the SDK. As for the other ~half, I'd intuitively say most of them are problematic too; for those, I might have exceeded the rate limit by mistake. I hope this helps! Please let me know your progress, if possible. Thank you!

bad_tiles_west_coast_run.txt

Rubix982 commented 2 years ago

Awesome! I'll be writing some tests then to throw appropriate errors or return the right results as per requested. Do give me sometime, I'm over-swapped by a couple of things, but this issue is at the top of my mind, and I'll hopefully get it done soon.

Rubix982 commented 2 years ago

Hi! Just got done with an event at University. Finally back.

@ionut-mntn can you share `'/app/MAs_Munich_task/values.csv' if it's okay? I'm writing a test for this and this could be great to test against.

ionut-mntn commented 2 years ago

@Rubix982 hi again! Sorry for the delay. Here's the csv: values.csv It should contain all the traffic signs listed in the Mapillary documentation, which you cand find here: https://www.mapillary.com/developer/api-documentation/traffic-signs.

Also, -sry for bothering, but- when possible please let me know what's your progress; did you manage to return the relevant tiles, without the code breaking if one of the tiles intersecting the region is "bad" and now you are testing it? It would be very useful if you did, because I had to find an alternative to that which is very tedious. Or are you just throwing an appropiate exception, because the other solution would have been too big of an effort? Thank you!

Rubix982 commented 2 years ago

Got it! I will check it out. Again, I am swamped by work. Even right now, I have commitments for the next 3-4 hours. :/

I do have this one my priority checklist, though.

ionut-mntn commented 2 years ago

@Rubix982 ok, thank you! No rush. Whenever possible, chief. I appreciate your efforts! Just wanted to see your status. Good luck, God bless!

Rubix982 commented 2 years ago

About my approach, I am testing it against the data that have provided me so I can better deal with it. Throwing an exception and not returning anything renders the point of this SDK moot.

Rubix982 commented 2 years ago

I think I've found the issue and currently resolved it, @ionut-mntn. I will test with more tiles and see if it gives the expected answer.

ionut-mntn commented 2 years ago

@Rubix982 thanks very much for letting me know! Good luck!

Rubix982 commented 2 years ago

@ionut-mntn can you install the latest version of mapillary, with pip install mapillary=1.0.3? We just merged to main a change. I tested your script against the latest changes and the output is an expected GeoJSON.

The code I am using (yours slightly modified),

### Test Monaco with SDK
import json
import datetime
import mercantile
import pandas as pd
import mapillary as mly

from dateutil.relativedelta import relativedelta

### Set access token
ACCESS_TOKEN = 'XXX'
mly.interface.set_access_token(ACCESS_TOKEN)

### Read relevant traffic signs' names
traffic_signs_names_path = 'traffic_signs_filter_values.csv' # all traffic signs present on https://www.mapillary.com/developer/api-documentation/traffic-signs! (around 1500 or 1600 signs) Could a too large list be problematic? Don't think so...
traffic_sign_names = pd.read_csv(traffic_signs_names_path)['name'].tolist()

time_now = datetime.datetime.today()

### Define relevant bonuding tile
monaco_z11_btile = mercantile.Tile(x=1066,y=746,z=11)
monaco_z12_btile = mercantile.Tile(x=2132,y=1493,z=12)
monaco_z_btile = monaco_z12_btile
w,s,e,n = mercantile.bounds(monaco_z_btile) # get bbox coordinates of the vector tile

# make sure coordinates are inside the vector tile, not on the edge! Otherwise, this could result in requests to "one-zoom-level-lower (meaning higher altitude, ie: covering more area) tiles", which are actually not part of our chosen vector tile(bbox)!
w += 0.01
s += 0.01
e -= 0.01
n -= 0.01

res = mly.interface.traffic_signs_in_bbox(
bbox = {
        'west': w,
        'south': s,
        'east': e,
        'north': n
    },
    filter_values=traffic_sign_names,
    existed_at=str(time_now - relativedelta(years=1)).split('.')[0],
    existed_before=str(time_now).split('.')[0]
)

### Dump file for better visualization
json_res = json.loads(res)
json.dump(json_res,open(f'{monaco_z_btile.z}_{monaco_z_btile.x}_{monaco_z_btile.y}.json','w+'), indent=4)

Here is the CSV file,

traffic_signs_filter_values.csv

The result is the attached GeoJSON - rename from .TXT to .JSON because GitHub will not let me upload .JSON files.

12_2132_1493.txt

Here is the resulting geojson on https://geojson.io,

Screenshot_63

If you have everything working as expected and you do not get this error anymore, please let me know so I can close this issue.

Thank you for your patience! Really appreciate it. :smile:

Rubix982 commented 2 years ago

Also, for testing's sake, we made a .txt file under tests/data called bad_tiles_west_coast_run.txt that has tile data that might possibly break something if edge cases are not handled.

You can find these tiles here.

If you know of other such tiles that we should have a look out for, or that might break something pre-existing codebase, or if you come across tiles that introduce exceptions being thrown, do let us know so we can see how to cater to those edge cases.

Thank you.

ionut-mntn commented 2 years ago

@Rubix982 thank you very much! Will try to take a look at the new behaviour when possible! thank you for your patience! - no, thank you!! : )) I really appreciate your efforts!

Rubix982 commented 2 years ago

@ionut-mntn were you able to test this?

ionut-mntn commented 2 years ago

@Rubix982 not yet, sorry! Was really caught up with something else. Will do & will let you know!

ionut-mntn commented 2 years ago

Hello! Sorry for the delay! I tested again, but I still get the error. (Shouldn't this method also work for large regions?) Code to reproduce:

import os
import json
import requests
import datetime
import mercantile

import pandas as pd
import mapillary.interface as mly

from dateutil.relativedelta import relativedelta

time_now = datetime.datetime.today()

ACCESS_TOKEN = 'XXX'
mly.set_access_token(ACCESS_TOKEN)

traffic_signs_names_path = '/path/to/csv_containing_all_possible_traffic_sign_names.csv' # same `values.csv` file as mentioned previously
traffic_sign_names = pd.read_csv(traffic_signs_names_path)['name'].tolist()

turn_traffic_sign_names = [tfn for tfn in traffic_sign_names if 'turn' in tfn]

florida_bbox = mercantile.LngLatBbox(-87.634903,24.396307,-79.974304,31.000969) 
# print(*florida_bbox)
# found here: 
# https://gist.github.com/Duder-onomy/2bdc789c3711d2e8364cbdb219db8bf8

w, s, e, n = florida_bbox

# make sure coordinates are inside the vector tile, not on the edge! Otherwise, this could result in requests to "one-zoom-level-lower (meaning higher altitude, ie: covering more area) tiles", which are actually not part of our chosen vector tile(bbox)!
w += 0.01
s += 0.01
e -= 0.01
n -= 0.01

res = mly.traffic_signs_in_bbox(
bbox = {
        'west': w,
        'south': s,
        'east': e,
        'north': n
    },
    filter_values=turn_traffic_sign_names,
    existed_at=str(time_now - relativedelta(years=1)).split('.')[0],
    existed_before=str(time_now).split('.')[0]
)

Error I get:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_1805/81934867.py in <module>
     14     filter_values=turn_traffic_sign_names,
     15     existed_at=str(time_now - relativedelta(years=1)).split('.')[0],
---> 16     existed_before=str(time_now).split('.')[0]
     17 )

/usr/local/lib/python3.7/dist-packages/mapillary/utils/auth.py in wrapper(*args, **kwargs)
     57 
     58             # Return function called with arguments
---> 59             return f(*args, **kwargs)
     60 
     61         # Return wrapper

/usr/local/lib/python3.7/dist-packages/mapillary/interface.py in traffic_signs_in_bbox(bbox, filter_values, **filters)
    536 
    537     return feature.get_map_features_in_bbox_controller(
--> 538         bbox=bbox, filters=filters, filter_values=filter_values, layer="traffic_signs"
    539     )
    540 

/usr/local/lib/python3.7/dist-packages/mapillary/controller/feature.py in get_map_features_in_bbox_controller(bbox, filter_values, filters, layer)
    150                     }
    151                     if filters["existed_before"] is not None
--> 152                     else {},
    153                 ],
    154             )

/usr/local/lib/python3.7/dist-packages/mapillary/utils/filter.py in pipeline(data, components)
    137             "exception thrown",
    138             # Except the filter name, select the rest as args
--> 139             args=tuple(list(component.values())[1:]),
    140         )
    141 

/usr/local/lib/python3.7/dist-packages/mapillary/utils/filter.py in pipeline_component(func, data, exception_message, args)
     53 
     54     try:
---> 55         return func(data, *args)
     56     except TypeError as exception:
     57         logger.warning(f"{exception_message}, {exception}. Arguments passed, {args}")

/usr/local/lib/python3.7/dist-packages/mapillary/utils/filter.py in filter_values(data, values, property)
    265     """
    266 
--> 267     return [feature for feature in data if feature["properties"][property] in values]
    268 
    269 

/usr/local/lib/python3.7/dist-packages/mapillary/utils/filter.py in <listcomp>(.0)
    265     """
    266 
--> 267     return [feature for feature in data if feature["properties"][property] in values]
    268 
    269 

KeyError: 'value'
Rubix982 commented 2 years ago

So this line,

return [feature for feature in data if feature["properties"][property] in values]

References this line in src/mapilllary/utils/filter.py.

The function in question is,

def filter_values(data: list, values: list, property: str = "value") -> list:
    """
    Filter the features based on the existence of a specified value
    in one of the property.
    *TODO*: Need documentation that lists the 'values', specifically, it refers to 'value'
    *TODO*: under 'Detection', and 'Map feature', related to issue #65
    :param data: The data to be filtered
    :type data: dict
    :param values: A list of values to filter by
    :type values: list
    :param property: The specific parameter to look into
    :type property: str
    :return: A feature list
    :rtype: dict
    """

    return [
        feature for feature in data if feature["properties"].get(property) in values
    ]

So there is no such line as,

return [feature for feature in data if feature["properties"][property] in values]

Here, instead there is the .get(property) here.

Can you try removing and reinstalling the library? Maybe you have an older version?

The file filter.py is this. Jump to filter_values.

ionut-mntn commented 2 years ago

@Rubix982 that seems to have been the problem! Thank you, and sorry for my inattention! I didn't run tests as exhaustive as the first time, but the method seems to be working fine now.

Also, going back a few replies: what did you mean by "If you know of other such tiles that we should have a look out for, or that might break some pre-existing codebase, or if you come across tiles that introduce exceptions being thrown"? Isn't the fix logic something that's abstracted from the contents of the tile? Meaning: regardless of whether a tile contains information about traffic signs or not, the method should not break (following the fix)?

Rubix982 commented 2 years ago

It should hopefully not break now, but I also added a bad_tiles.txt file in the tests to check against those to make sure I catch edge cases, like this one we did not account for or detect. I think it should be good for now!

ionut-mntn commented 2 years ago

@Rubix982 ok, thank you very much again! This is my first issue ever so I do not know what the flow is and what should happen now, following the issue fix. Should I close the ticket or is it the issue resolver (you) who needs to closes it?

Rubix982 commented 2 years ago

@Rubix982 ok, thank you very much again! This is my first issue ever so I do not know what the flow is and what should happen now, following the issue fix. Should I close the ticket or is it the issue resolver (you) who needs to closes it?

No worries! Welcome to Open Source! If you feel your issue has been resolved, please feel free to press "Close Issue".

It depends on repository to repository, but we have here where people opening the issue close it for closure's sake.

ionut-mntn commented 2 years ago

@Rubix982 ok, thanks! Closing now.