Closed joostschouppe closed 12 months ago
Sample data: export_signs.csv (this is actually a .geojson file, so change the extention). Example signs with UUID 15878c0a-1d4c-439f-992c-0bf067c4decd and e20cc4ea-7222-4979-9b41-c853d2437ad9 turn into this task https://maproulette.org/challenge/23550/task/167694358 where the description for D5 is missing. I looked at a few examples just now, and I think it turns out that these are missing (or written as D07 instead of D7) in https://github.com/osmbe/traffic-sign-project/blob/master/find-interesting-signs/road_signs.csv .
Looking into this now, I suspect there is an easy with the csv join (missing descriptions). I will try to provide a fix later this week.
I am downloading the full dataset now to perform an analysis of the data, especially the bordcode
variable values that are used to join the downloaded dataset with the information in https://github.com/osmbe/traffic-sign-project/blob/master/find-interesting-signs/road_signs_cleaned.csv.
# Join both datasets by the bordcode
joined_df = filtered_df.join(sign_metadata.set_index("bordcode"), on='bordcode')
# Remove NaN parameters and name
joined_df[['parameters', 'name']] = joined_df[['parameters','name']].fillna('')
In the geojson files the bordcode is different than the ones in the csv, e.g. D5 compared with D05 hence is not able to find the match.
bordcode
values present in the road_signs csv file.
['E5' 'A47' 'A13' 'A41' 'F4b' 'F50' 'E7' 'A1b' 'A14' 'C1' 'F19' 'E1' 'M4'
'F4a' 'M2' 'A51' 'A23' 'C3' 'F45' 'F87' 'A29' 'A33' 'E3' 'B5' 'E9a' 'B1'
'M7' 'A1a' 'F43' 'C43' 'C5' 'E9b' 'A25' 'F101c' 'M3' 'B9' 'A5' 'C21' 'F1'
'F99c' 'C29' 'A7b' 'C35' 'B11' 'C46' 'C15' 'A7a' 'F59' 'F49' 'E9c' 'A1c'
'F111' 'A15' 'A21' 'C11' 'E9d' 'F12b' 'A7c' 'A1d' 'C23' 'M5' 'F45b' 'C9'
'M6' 'C45' 'E9e' 'C37' 'D10' 'F50bis' 'M11' 'A39' 'F21' 'D13' 'F12a'
'A49' 'C27' 'A3' 'C25' 'M1' 'C7' 'M8' 'A45' 'C19' 'E9h' 'E9f' 'F5' 'A27'
'A11' 'F9' 'E9i' 'C33' 'D11' 'A43' 'C5-C7' 'F51a' 'F103' 'A9' 'F18' 'F47'
'C39' 'F7' 'F17' 'C5-C7-C9' 'E9g' 'A17' 'E11' 'A31']
bordcode
values not present
'IV4' 'GXa' 'Type V' 'FtsRt' 'GIa' 'B19' 'F1a' 'GXb' 'G' 'IR2b' 'GXd'
'IV3' 'VL' 'F29' 'F27-Calamiteiten' 'F34a' 'GII ' 'F35' 'ZC21'
'XX_parkeerautomaat' 'C31b' 'B21' 'E9a-GVIId' 'XX' 'D7' 'D1d' 'F3a' 'GIV'
'B15c' 'ZC21/' 'ZC21T' 'ZC45' 'D1f' 'GIII' 'F59b' 'TR' 'B17' 'IR1b'
'IR11' 'F57' 'IR2c' 'IV1' 'D9a' 'C31a' 'GVIIb' 'GXc' 'F33a' 'GVIIc'
'ZC21T/' 'ZC43' 'E9a-GVIIb' 'C9-C11' 'GVIIa' 'B15a' 'D1b' 'Hm01P' 'D1a'
'F99a' 'F101a' 'C9-C15' 'M12' 'IV2' 'F34b2' 'D5' 'ZC43/' 'WdRt' 'F23a'
'GVIII' 'F27' 'Tr- Kempen' 'F14' 'B15f' 'ZE9aT' 'B15g' 'F41' 'F33b' 'M10'
'E9a?' 'GVIId' 'GVI' 'IR2d' 'IF9a' 'Cxx' 'D1c' 'Type IIIb' 'ZE9aT/' 'F13'
'ZE9bT' 'IR1c' 'Fxx-CAM' 'XX-Straatnaam' 'ZE9bT/' 'F37' 'ZE9a/' 'M16'
'E9x' 'M18' 'E9' 'F31' 'GIb' 'F59a' 'F101' 'B15d' 'F25R' 'IF3' 'D3a'
'IF6' 'IR12' 'F34b1' 'ZE9a' 'B15b' 'D3b' 'Gxx' 'IR1d' 'D1e' 'IF1'
'VL Rd-Or-Gr' 'IVMS1' 'ITRS' 'GV' 'VL - Vtg' 'Axx' 'Infobord'
'ZE9a-GVIIbT' 'lege_paal' 'ZC43-C21' 'IG2' 'F27-ITRS' 'G7d' 'F59c' 'IV5'
'ZE9a-GVIIb' 'ZE9a-GVIIb/' 'IM2' 'ZE1' 'M9' 'F25' 'ZC3T/' 'F3b' 'Gib'
'F1b' 'ZE9a-GVIIa/' 'F53' 'ZE3' 'Iv1' 'F105' 'ZC23T/'
'Reflecterende koker' 'Fxx-Spelende kinderen' 'ZE9a-GVIIbT/' 'F33c' 'D9b'
'IR3x' 'Type Ia.1' 'ZE9b/' 'INFOBORD' 'ZE1/' 'F23c' 'GII' 'GVIIa-GVIIb'
'E9a-GVIIc' 'wegwijzer dungelhoeffsite' 'Stadskantoor - Sociaal Huis'
'Dungelhoeffsite r' 'Stadskantoor - Sociaal Huis r' 'ZC35/'
'Informatiebord' 'Station' 'ZC21-GIV/' 'IR10b' 'Type Ic' 'WW' 'Afgedekt'
'IV6' 'O9999' 'F101b' 'F97' 'ZC43T' '2' 'F77' 'ZF111' 'B15' 'IVMS7'
'VL waarschuw vtg' 'Type Id' 'Type Iic' 'ZE9T/' 'Type Ib.a' 'F23d'
'Type IIIa' 'ZC23' 'C7-C9' 'F4a-IV' 'Type xx' 'IF5' 'ZC21-C43'
'ZC21-C43/' 'ZF111/' 'Type Va' 'GXc-20m' 'F34c1' 'ZE1T' 'IG' 'IR10c'
'Tr- Kapelle' 'ZC23/' 'vf34a' 'IR3b' 'ZE1T/' 'IW6' 'ZC23T'
'den hoek klein' 'IR3a' 'Tr- Blinkerroute' 'GVIIc-GV' 'F25R-TRS F25M'
'Tr- Haspengouw' 'infPAgroen' 'B15e' 'ZE9f/' 'F25M' 'F27-xx' 'IF6b' 'F98'
'F59 ' 'XX-sluikstort' 'M3bis' 'Type IIc' 'Straatnaambord' 'Fxx-Snelheid'
'Mxx' 'IG3' 'Type Ia' 'ZC3' 'Type Xb' 'Type Ia.2' 'Type Xa'
'F25R-TRS F25R' 'Type Xd' 'Fxx-BIN' 'F15' 'Tr- Landschapsroute'
'Tr- Reynaertroute' 'F27-IU2' 'ZC3/' 'F27-Grenssignalisatie' 'IT1'
'F34c2' 'IF4?' 'Fxx' 'ZE9b' 'GXI' 'pijl links' 'F45b-1'
'XX-WelkomBijDeBoer' 'F45c' 'ZE9a-C21' 'F41x' 'XX_parkeersensor' '25km/u'
'Type IV' 'F113' 'ZC21-E9a/' 'Type Ia.1-Ib.1' 'IR1x' 'F34b2 - GB'
'XX-vaartminderen' 'XX-Speelplein' 'GXX' 'M17' 'F27-F43' 'BIN_'
'ZC21-E9a' 'PM' 'ZC43-C21T' 'ZC45T' 'Type Xc' 'Type Ib.2'
'Tr- Plantentuin' 'GIX' 'Tr- Nobelroute' 'F99b' 'ZC3-E1' 'IV7'
'Type VIII' 'IU3a' 'IF9c' 'GVII' 'F63' 'gehuchten' 'ZC43-C21T/'
'opgepast' 'F45a-1' 'infPAoranje' 'mobilhomes_rechts' 'IVMS2' 'F23b'
'ZE9a-GVIIa' 'Werken' 'bord parking' 'antisluikstorten' 'ZC21TC43'
'ZC21TC43/' 'Tr- Kastelenroute' 'Tr- Spierevalleiroute' 'Ftsrt'
'Verkeerszuil' 'ZC45/' 'C7-C15' 'onbekend' 'Km01' 'IVMS3' 'IVMS5'
'IVMSXX' 'VMS' 'ZE9-GVIIbT' 'ZC3T' 'Tr-' 'ZE9b-GVIIb' 'ZC21-GIV' 'E9ax'
'ZE9a-VIIa/' 'F45 b' 'Type Ib1' 'Tr- Binkenroute' 'M14' 'Tr- Rodeland'
'C21-5t5' 'Dxx' 'XX-BISY' 'IG1' 'Type Vb' 'Zone E2 Schardouwstraat'
'E9a-GVIId-elek' 'IF13' 'ZE9T' 'ZE9b-GVIIb/']
@joostschouppe which ones do you think we should map? I suggest adding values to https://github.com/osmbe/traffic-sign-project/blob/master/find-interesting-signs/road_signs_cleaned.csv for the types that we want to map. Will that be ok?
I cannot find references to all the different bordcodes
in https://github.com/osmbe/traffic-sign-project/blob/master/find-interesting-signs/road_signs.xml
PR created to add some of the categories, https://github.com/osmbe/traffic-sign-project/pull/19. @joostschouppe I think we need to sync about the path forward.
I'm not sure this is the best approach.
The official data contains "nonsense", like wegwijzer dungelhoeffsite
. We'll never be able to account for all of those (and they'll keep inventing new ones), nor should we.
In the end, we really ony need descriptions for the signs we think are interesting (so 1 or 2 here).
The root cause for having empty descriptions in the tasks is that we merged signs together. If we have an interesting sign, all the signs that are shown at that place get listed, including their description. Those other signs shown at the same location might lack a description in "road signs cleaned". This is not really a problem, though I guess we could add some descriptions for the most common ones.
Where it is a real issue, is when you have things like D5, D7 etc in the official sign, when the official code is D05 etc. If these exist a lot, we could either add them on our end, or create a few rules to "fix" the official data. But I don't see these cases in your list above.
Then I also noticed a few cases like ZC3. During runtime these should be converted to C3 + "this is a zone thing". It seems this rule (originally here) was not implemented?
Yeah, I think this requires a better way to understand the link between a code in the dataset and a sign in OSM. There are many rules, also rules that are not are not static e.g. signs that have different meaning based on their last value. The change was to unblock the use case and discuss a path forward.
We are missing the rule that converts bordcode
that starts with a Z to a zone sign e.g. ZC3
should get the description from C3
but also add Zone to the parameters.
This was the logged present before.
DATASET ACTIVATE data.
* Z signs are not in our external data, so let's just give them an extra flag "this is a zonal thing".
if char.index(bordcode,"Z")=1 zone=1.
if char.index(bordcode,"Z")=1 bordcode=char.substr(bordcode,2,59).
The solution is to add that as a pre-filtering step, and in the creation of the parameters. We need to remove /
and the end.
Codes that we don't know should be not filtered out, but up to the user to figure out what they are.
Next step, download all the signs and bucket them based on the bordcode
.
After checking the code the logic to add zone is already part of the code, converting the bordcode
to its value adding zone if starts with a Z, also removing /.
feature_df['bordcode'] = feature_df.apply(lambda row: (f"{row['bordcode'][1:]} (zone)" if row['bordcode'].startswith('Z') else row['bordcode']).replace("/", ""), axis=1)
This logic will prevent from joining with the dataset, the zone needs to be added to the parameters field instead.
Analysis of the sign codes and their join with the signs csv. grouped_signs_by_code.xlsx
Top categories without match (signs without date filter). Traffic Sign Code | Count | |
---|---|---|
FtsRt | 54632 | |
F34a | 44120 | |
TR | 38320 | |
GXa | 35960 | |
GIa | 32277 | |
GIV | 30268 | |
F29 | 28446 | |
IR2c | 25605 | |
Type V | 24540 | |
XX | 19048 | |
GXb | 18477 | |
IR2b | 17412 | |
F34b2 | 15644 | |
IV6 | 15176 | |
B15f | 14656 | |
GIII | 14083 | |
IR10b | 13892 | |
GXd | 13537 | |
F33a | 13439 | |
GVIId | 13106 | |
B15a | 12419 | |
IV1 | 11902 | |
F1a | 11757 | |
F3a | 11527 | |
F35 | 11526 | |
F34b1 | 11518 | |
B15c | 11081 | |
G | 9843 |
New PR https://github.com/osmbe/traffic-sign-project/pull/22/files that also covers the artifact upload #20
While traffic_sign_code contains all the relevant info, the field with the sign description only contains the first sign.