osmbe / traffic-sign-project

Project to map the effects of traffic signs
7 stars 2 forks source link

traffic_sign_description is incomplete #17

Closed joostschouppe closed 12 months ago

joostschouppe commented 1 year ago

While traffic_sign_code contains all the relevant info, the field with the sign description only contains the first sign.

joostschouppe commented 1 year ago

Sample data: export_signs.csv (this is actually a .geojson file, so change the extention). Example signs with UUID 15878c0a-1d4c-439f-992c-0bf067c4decd and e20cc4ea-7222-4979-9b41-c853d2437ad9 turn into this task https://maproulette.org/challenge/23550/task/167694358 where the description for D5 is missing. I looked at a few examples just now, and I think it turns out that these are missing (or written as D07 instead of D7) in https://github.com/osmbe/traffic-sign-project/blob/master/find-interesting-signs/road_signs.csv .

ivandiaz-tomtom commented 1 year ago

Looking into this now, I suspect there is an easy with the csv join (missing descriptions). I will try to provide a fix later this week.

ivandiaz-tomtom commented 1 year ago

I am downloading the full dataset now to perform an analysis of the data, especially the bordcode variable values that are used to join the downloaded dataset with the information in https://github.com/osmbe/traffic-sign-project/blob/master/find-interesting-signs/road_signs_cleaned.csv.

  # Join both datasets by the bordcode
  joined_df = filtered_df.join(sign_metadata.set_index("bordcode"), on='bordcode')
  # Remove NaN parameters and name
  joined_df[['parameters', 'name']] = joined_df[['parameters','name']].fillna('')

In the geojson files the bordcode is different than the ones in the csv, e.g. D5 compared with D05 hence is not able to find the match.

ivandiaz-tomtom commented 1 year ago

bordcode values present in the road_signs csv file.

['E5' 'A47' 'A13' 'A41' 'F4b' 'F50' 'E7' 'A1b' 'A14' 'C1' 'F19' 'E1' 'M4'
 'F4a' 'M2' 'A51' 'A23' 'C3' 'F45' 'F87' 'A29' 'A33' 'E3' 'B5' 'E9a' 'B1'
 'M7' 'A1a' 'F43' 'C43' 'C5' 'E9b' 'A25' 'F101c' 'M3' 'B9' 'A5' 'C21' 'F1'
 'F99c' 'C29' 'A7b' 'C35' 'B11' 'C46' 'C15' 'A7a' 'F59' 'F49' 'E9c' 'A1c'
 'F111' 'A15' 'A21' 'C11' 'E9d' 'F12b' 'A7c' 'A1d' 'C23' 'M5' 'F45b' 'C9'
 'M6' 'C45' 'E9e' 'C37' 'D10' 'F50bis' 'M11' 'A39' 'F21' 'D13' 'F12a'
 'A49' 'C27' 'A3' 'C25' 'M1' 'C7' 'M8' 'A45' 'C19' 'E9h' 'E9f' 'F5' 'A27'
 'A11' 'F9' 'E9i' 'C33' 'D11' 'A43' 'C5-C7' 'F51a' 'F103' 'A9' 'F18' 'F47'
 'C39' 'F7' 'F17' 'C5-C7-C9' 'E9g' 'A17' 'E11' 'A31']

bordcode values not present

'IV4' 'GXa' 'Type V' 'FtsRt' 'GIa' 'B19' 'F1a' 'GXb' 'G' 'IR2b' 'GXd'
 'IV3' 'VL' 'F29' 'F27-Calamiteiten' 'F34a' 'GII ' 'F35' 'ZC21'
 'XX_parkeerautomaat' 'C31b' 'B21' 'E9a-GVIId' 'XX' 'D7' 'D1d' 'F3a' 'GIV'
 'B15c' 'ZC21/' 'ZC21T' 'ZC45' 'D1f' 'GIII' 'F59b' 'TR' 'B17' 'IR1b'
 'IR11' 'F57' 'IR2c' 'IV1' 'D9a' 'C31a' 'GVIIb' 'GXc' 'F33a' 'GVIIc'
 'ZC21T/' 'ZC43' 'E9a-GVIIb' 'C9-C11' 'GVIIa' 'B15a' 'D1b' 'Hm01P' 'D1a'
 'F99a' 'F101a' 'C9-C15' 'M12' 'IV2' 'F34b2' 'D5' 'ZC43/' 'WdRt' 'F23a'
 'GVIII' 'F27' 'Tr- Kempen' 'F14' 'B15f' 'ZE9aT' 'B15g' 'F41' 'F33b' 'M10'
 'E9a?' 'GVIId' 'GVI' 'IR2d' 'IF9a' 'Cxx' 'D1c' 'Type IIIb' 'ZE9aT/' 'F13'
 'ZE9bT' 'IR1c' 'Fxx-CAM' 'XX-Straatnaam' 'ZE9bT/' 'F37' 'ZE9a/' 'M16'
 'E9x' 'M18' 'E9' 'F31' 'GIb' 'F59a' 'F101' 'B15d' 'F25R' 'IF3' 'D3a'
 'IF6' 'IR12' 'F34b1' 'ZE9a' 'B15b' 'D3b' 'Gxx' 'IR1d' 'D1e' 'IF1'
 'VL Rd-Or-Gr' 'IVMS1' 'ITRS' 'GV' 'VL - Vtg' 'Axx' 'Infobord'
 'ZE9a-GVIIbT' 'lege_paal' 'ZC43-C21' 'IG2' 'F27-ITRS' 'G7d' 'F59c' 'IV5'
 'ZE9a-GVIIb' 'ZE9a-GVIIb/' 'IM2' 'ZE1' 'M9' 'F25' 'ZC3T/' 'F3b' 'Gib'
 'F1b' 'ZE9a-GVIIa/' 'F53' 'ZE3' 'Iv1' 'F105' 'ZC23T/'
 'Reflecterende koker' 'Fxx-Spelende kinderen' 'ZE9a-GVIIbT/' 'F33c' 'D9b'
 'IR3x' 'Type Ia.1' 'ZE9b/' 'INFOBORD' 'ZE1/' 'F23c' 'GII' 'GVIIa-GVIIb'
 'E9a-GVIIc' 'wegwijzer dungelhoeffsite' 'Stadskantoor - Sociaal Huis'
 'Dungelhoeffsite r' 'Stadskantoor - Sociaal Huis r' 'ZC35/'
 'Informatiebord' 'Station' 'ZC21-GIV/' 'IR10b' 'Type Ic' 'WW' 'Afgedekt'
 'IV6' 'O9999' 'F101b' 'F97' 'ZC43T' '2' 'F77' 'ZF111' 'B15' 'IVMS7'
 'VL waarschuw vtg' 'Type Id' 'Type Iic' 'ZE9T/' 'Type Ib.a' 'F23d'
 'Type IIIa' 'ZC23' 'C7-C9' 'F4a-IV' 'Type xx' 'IF5' 'ZC21-C43'
 'ZC21-C43/' 'ZF111/' 'Type Va' 'GXc-20m' 'F34c1' 'ZE1T' 'IG' 'IR10c'
 'Tr- Kapelle' 'ZC23/' 'vf34a' 'IR3b' 'ZE1T/' 'IW6' 'ZC23T'
 'den hoek klein' 'IR3a' 'Tr- Blinkerroute' 'GVIIc-GV' 'F25R-TRS F25M'
 'Tr- Haspengouw' 'infPAgroen' 'B15e' 'ZE9f/' 'F25M' 'F27-xx' 'IF6b' 'F98'
 'F59 ' 'XX-sluikstort' 'M3bis' 'Type IIc' 'Straatnaambord' 'Fxx-Snelheid'
 'Mxx' 'IG3' 'Type Ia' 'ZC3' 'Type Xb' 'Type Ia.2' 'Type Xa'
 'F25R-TRS F25R' 'Type Xd' 'Fxx-BIN' 'F15' 'Tr- Landschapsroute'
 'Tr- Reynaertroute' 'F27-IU2' 'ZC3/' 'F27-Grenssignalisatie' 'IT1'
 'F34c2' 'IF4?' 'Fxx' 'ZE9b' 'GXI' 'pijl links' 'F45b-1'
 'XX-WelkomBijDeBoer' 'F45c' 'ZE9a-C21' 'F41x' 'XX_parkeersensor' '25km/u'
 'Type IV' 'F113' 'ZC21-E9a/' 'Type Ia.1-Ib.1' 'IR1x' 'F34b2 - GB'
 'XX-vaartminderen' 'XX-Speelplein' 'GXX' 'M17' 'F27-F43' 'BIN_'
 'ZC21-E9a' 'PM' 'ZC43-C21T' 'ZC45T' 'Type Xc' 'Type Ib.2'
 'Tr- Plantentuin' 'GIX' 'Tr- Nobelroute' 'F99b' 'ZC3-E1' 'IV7'
 'Type VIII' 'IU3a' 'IF9c' 'GVII' 'F63' 'gehuchten' 'ZC43-C21T/'
 'opgepast' 'F45a-1' 'infPAoranje' 'mobilhomes_rechts' 'IVMS2' 'F23b'
 'ZE9a-GVIIa' 'Werken' 'bord parking' 'antisluikstorten' 'ZC21TC43'
 'ZC21TC43/' 'Tr- Kastelenroute' 'Tr- Spierevalleiroute' 'Ftsrt'
 'Verkeerszuil' 'ZC45/' 'C7-C15' 'onbekend' 'Km01' 'IVMS3' 'IVMS5'
 'IVMSXX' 'VMS' 'ZE9-GVIIbT' 'ZC3T' 'Tr-' 'ZE9b-GVIIb' 'ZC21-GIV' 'E9ax'
 'ZE9a-VIIa/' 'F45 b' 'Type Ib1' 'Tr- Binkenroute' 'M14' 'Tr- Rodeland'
 'C21-5t5' 'Dxx' 'XX-BISY' 'IG1' 'Type Vb' 'Zone E2 Schardouwstraat'
 'E9a-GVIId-elek' 'IF13' 'ZE9T' 'ZE9b-GVIIb/']
ivandiaz-tomtom commented 1 year ago

@joostschouppe which ones do you think we should map? I suggest adding values to https://github.com/osmbe/traffic-sign-project/blob/master/find-interesting-signs/road_signs_cleaned.csv for the types that we want to map. Will that be ok? I cannot find references to all the different bordcodes in https://github.com/osmbe/traffic-sign-project/blob/master/find-interesting-signs/road_signs.xml

ivandiaz-tomtom commented 1 year ago

PR created to add some of the categories, https://github.com/osmbe/traffic-sign-project/pull/19. @joostschouppe I think we need to sync about the path forward.

joostschouppe commented 1 year ago

I'm not sure this is the best approach. The official data contains "nonsense", like wegwijzer dungelhoeffsite. We'll never be able to account for all of those (and they'll keep inventing new ones), nor should we. In the end, we really ony need descriptions for the signs we think are interesting (so 1 or 2 here). The root cause for having empty descriptions in the tasks is that we merged signs together. If we have an interesting sign, all the signs that are shown at that place get listed, including their description. Those other signs shown at the same location might lack a description in "road signs cleaned". This is not really a problem, though I guess we could add some descriptions for the most common ones. Where it is a real issue, is when you have things like D5, D7 etc in the official sign, when the official code is D05 etc. If these exist a lot, we could either add them on our end, or create a few rules to "fix" the official data. But I don't see these cases in your list above. Then I also noticed a few cases like ZC3. During runtime these should be converted to C3 + "this is a zone thing". It seems this rule (originally here) was not implemented?

ivandiaz-tomtom commented 1 year ago

Yeah, I think this requires a better way to understand the link between a code in the dataset and a sign in OSM. There are many rules, also rules that are not are not static e.g. signs that have different meaning based on their last value. The change was to unblock the use case and discuss a path forward.

ivandiaz-tomtom commented 1 year ago

We are missing the rule that converts bordcode that starts with a Z to a zone sign e.g. ZC3 should get the description from C3 but also add Zone to the parameters.

This was the logged present before.

DATASET ACTIVATE data.
* Z signs are not in our external data, so let's just give them an extra flag "this is a zonal thing".
if char.index(bordcode,"Z")=1 zone=1.
if char.index(bordcode,"Z")=1 bordcode=char.substr(bordcode,2,59).

The solution is to add that as a pre-filtering step, and in the creation of the parameters. We need to remove / and the end.

Codes that we don't know should be not filtered out, but up to the user to figure out what they are.

Next step, download all the signs and bucket them based on the bordcode.

ivandiaz-tomtom commented 1 year ago

After checking the code the logic to add zone is already part of the code, converting the bordcode to its value adding zone if starts with a Z, also removing /.

 feature_df['bordcode'] = feature_df.apply(lambda row: (f"{row['bordcode'][1:]} (zone)" if row['bordcode'].startswith('Z') else row['bordcode']).replace("/", ""), axis=1)

This logic will prevent from joining with the dataset, the zone needs to be added to the parameters field instead.

ivandiaz-tomtom commented 1 year ago

Analysis of the sign codes and their join with the signs csv. grouped_signs_by_code.xlsx

Top categories without match (signs without date filter). Traffic Sign Code   Count
FtsRt   54632
F34a   44120
TR   38320
GXa   35960
GIa   32277
GIV   30268
F29   28446
IR2c   25605
Type V   24540
XX   19048
GXb   18477
IR2b   17412
F34b2   15644
IV6   15176
B15f   14656
GIII   14083
IR10b   13892
GXd   13537
F33a   13439
GVIId   13106
B15a   12419
IV1   11902
F1a   11757
F3a   11527
F35   11526
F34b1   11518
B15c   11081
G   9843
ivandiaz-tomtom commented 1 year ago

PR created https://github.com/osmbe/traffic-sign-project/pull/21

ivandiaz-tomtom commented 1 year ago

New PR https://github.com/osmbe/traffic-sign-project/pull/22/files that also covers the artifact upload #20