macieg closed 4 years ago
@iandees It's not about columns in CSV files; it's about nodes in XML files, as described here: https://github.com/openaddresses/machine/issues/766
@iandees To be more specific: some time ago I updated the cache with Polish addresses because it was outdated. Now I'd like to get rid of that cache and use a frequently updated source of data.
It requires adding this additional attribute method. Apart from this PR, I'll also need to make changes in the main repository and the documentation.
Maybe I'm wrong, but isn't this the place where I should start?
I'm happy to give a more detailed explanation if needed :)
I understand what the source data looks like, but I'm pretty sure that as that data works its way through our pipeline it will lose multiple values and end up with a single string, not an array of strings. This is why I had asked if you tried this change outside of the unit test.
@iandees - I've tried.
I was running some experiments with another task.
If we take a look at the resulting file, we'll see:
LON,LAT,NUMBER,STREET,UNIT,CITY,DISTRICT,REGION,POSTCODE,ID,HASH
15.9878829,54.0127764,30,Kochanowskiego,,Białogard,"['Polska', 'zachodniopomorskie', 'białogardzki', 'Białogard']","['Polska', 'zachodniopomorskie', 'białogardzki', 'Białogard']",78-200,PL.ZIPIN.1422.EMUiA_05e9f97a-c860-43ff-b7f1-e6fcd229a7c3,48b297bfd8452a34
15.9832874,54.008211,9,Ludowa,,Białogard,"['Polska', 'zachodniopomorskie', 'białogardzki', 'Białogard']","['Polska', 'zachodniopomorskie', 'białogardzki', 'Białogard']",78-200,PL.ZIPIN.1422.EMUiA_05f08ec1-6eaf-484a-aca7-6d0f69c234de,0fe4030acab1328a
15.9705208,54.0123932,14,Królowej Jadwigi,,Białogard,"['Polska', 'zachodniopomorskie', 'białogardzki', 'Białogard']","['Polska', 'zachodniopomorskie', 'białogardzki', 'Białogard']",78-200,PL.ZIPIN.1422.EMUiA_05f68238-99d8-4990-a21d-41cd2cabbf85,bf6b1980190d2eaa
16.0040907,54.012632,2,Gryfitów,,Białogard,"['Polska', 'zachodniopomorskie', 'białogardzki', 'Białogard']","['Polska', 'zachodniopomorskie', 'białogardzki', 'Białogard']",78-200,PL.ZIPIN.1422.EMUiA_060b6d7e-c2a1-401e-8eb2-eada4e0483c7,9aeb025523996d9
"['Polska', 'zachodniopomorskie', 'białogardzki', 'Białogard']"
There are quotes around it, but it looks to me like it was treated as an array before being written to the file.
Am I wrong?
I haven't tried to do it on my local computer. I can do that if needed. :)
I think that might be the text coming out of OGR. But let's try it and see what happens!
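For what it's worth, the CSV output alone can't settle the array-vs-string question. Here is a minimal sketch, assuming the file is written with Python's standard `csv` module (the `write_cell` helper is hypothetical and is not code from the pipeline): a real list and a string that merely looks like a list both end up as the same quoted cell.

```python
import csv
import io


def write_cell(value):
    """Write a one-cell CSV row and return the raw text (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf).writerow([value])
    return buf.getvalue().strip()


# The same four names, once as a real list and once as plain text.
as_list = ['Polska', 'zachodniopomorskie', 'białogardzki', 'Białogard']
as_text = str(as_list)

# csv.writer stringifies non-string values with str(), then quotes the
# cell because it contains commas.
list_cell = write_cell(as_list)
text_cell = write_cell(as_text)

print(list_cell)
print(list_cell == text_cell)
```

Since the two cells are identical, the only way to tell is to inspect the value's type before it reaches the writer.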
@iandees - you were right, it was just a string :/ Fix below, not the prettiest code :)
Have you tested this with real data? I'm surprised that we'd get multiple values for a single column back like that.