thinkingmachines / unicef-ai4d-research-bank

UNICEF AI4D Research Bank: a website that gives users access to open-source datasets, models, and scripts for countries across Southeast Asia (SEA)
https://thinkingmachines.github.io/unicef-ai4d-research-bank/
MIT License
3 stars, 8 forks

Enforce consistent data formats (HXL tagging for CSV) #39

Open: ghost opened this issue 1 year ago

AnthonyMockler commented 1 year ago

https://data.humdata.org/tools/hxl-example/
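For context, HXL tagging a CSV means adding a row of hashtags (e.g. `#adm1+name`, `#date`, `#population`) directly under the human-readable header row. The sketch below shows how a validator might locate that row. It is a simplified assumption, not the HXL spec's exact rule: a row counts as the hashtag row if every non-empty cell looks like `#tag` followed by optional `+attribute` parts, and only the first 25 rows are searched. The function names and the sample rows are made up for illustration.

```python
import csv
import io
import re

# Simplified hashtag pattern (assumption): "#" + tag name, then zero or
# more "+attribute" parts, optionally separated by whitespace.
HASHTAG_RE = re.compile(r"^#[a-z][a-z0-9_]*(\s*\+[a-z][a-z0-9_]*)*$", re.IGNORECASE)


def is_hashtag_row(row):
    """True if the row has at least one non-empty cell and every
    non-empty cell looks like an HXL hashtag."""
    cells = [c.strip() for c in row if c.strip()]
    return bool(cells) and all(HASHTAG_RE.match(c) for c in cells)


def find_hashtag_row(text, max_rows=25):
    """Return the 0-based index of the first hashtag row within the
    first max_rows rows of the CSV text, or None if there is none."""
    for i, row in enumerate(csv.reader(io.StringIO(text))):
        if i >= max_rows:
            break
        if is_hashtag_row(row):
            return i
    return None


# Made-up sample: header row, HXL hashtag row, then data.
SAMPLE = """Province,Date,Population
#adm1+name,#date,#population
Province A,2020,1000
"""
```

Here `find_hashtag_row(SAMPLE)` returns `1`, while an untagged file such as `"a,b\n1,2\n"` returns `None`.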

AnthonyMockler commented 1 year ago

We can also do this for JSON-formatted data: https://hxlstandard.org/standard/1-1final/tagging/#tagging.json
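As we read the linked JSON guidance, one accepted style is a list of objects whose keys are the HXL hashtags themselves. The sketch below converts an already-tagged CSV into that shape. Assumptions: the hashtag row's position is known (`tag_row_index`), columns without a hashtag are dropped, and columns sharing the same hashtag would overwrite each other. The function name is hypothetical.

```python
import csv
import io
import json


def tagged_csv_to_hxl_records(text, tag_row_index=1):
    """Convert CSV text with an HXL hashtag row into a list of dicts
    keyed by hashtag. Rows above the hashtag row are ignored, and
    untagged columns are dropped."""
    rows = list(csv.reader(io.StringIO(text)))
    tags = [t.strip() for t in rows[tag_row_index]]
    records = []
    for row in rows[tag_row_index + 1:]:
        # Pair each value with its column's hashtag, skipping untagged columns.
        record = {tag: value for tag, value in zip(tags, row) if tag}
        if record:  # skip blank lines
            records.append(record)
    return records


# Made-up sample input.
SAMPLE = "Province,Population\n#adm1+name,#population\nProvince A,1000\n"
print(json.dumps(tagged_csv_to_hxl_records(SAMPLE)))
```

With the sample above, the result is a single object: `{"#adm1+name": "Province A", "#population": "1000"}`.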

butchtm commented 1 year ago

Hi @AnthonyMockler, we might need to revisit making this a validation condition. It would prevent the current poverty mapping and air quality datasets from being added to the research bank, since they don't currently include HXL tags.

As an alternative, we could show a warning instead, with a message encouraging the data provider to apply HXL tags to the datasets they contribute.
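A minimal sketch of the "warn, don't reject" idea. We don't know how the research bank's validation is actually structured, so `validate_csv`, the empty-file check, and the warning text are all hypothetical placeholders. The HXL check is deliberately loose: it only looks for an early row whose non-empty cells all start with `#`.

```python
import csv
import io

# Hypothetical warning shown to data providers; wording is a placeholder.
HXL_WARNING = (
    "No HXL hashtag row found. Please consider adding HXL tags "
    "(https://data.humdata.org/tools/hxl-example/) to this dataset."
)


def has_hashtag_row(text, max_rows=25):
    """Loose check: some early row whose non-empty cells all start with '#'."""
    for i, row in enumerate(csv.reader(io.StringIO(text))):
        if i >= max_rows:
            return False
        cells = [c.strip() for c in row if c.strip()]
        if cells and all(c.startswith("#") for c in cells):
            return True
    return False


def validate_csv(text):
    """Return (accepted, messages). Hard failures reject the file;
    missing HXL tags only add a warning, so the file is still accepted."""
    if not text.strip():
        return False, ["Empty file."]  # example of a hard failure
    messages = []
    if not has_hashtag_row(text):
        messages.append(HXL_WARNING)
    return True, messages
```

An untagged file like the current air quality CSVs would come back as `(True, [HXL_WARNING])`: accepted, with a nudge to add tags.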

AnthonyMockler commented 1 year ago

@butchtm Yep, happy to talk. Is there a link to the output CSVs of the current air quality / poverty metrics?

(I still think it's a good criterion, and we should explore and estimate what it would take to automate "fixing" incoming data that doesn't already have HXL tags.)
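One rough way to estimate the automation effort: map known column headers to hashtags and insert the hashtag row automatically, leaving unknown columns for a human to tag. The mapping below is a hypothetical example, not a vetted one; a real mapping would need to be curated per dataset and checked against the HXL hashtag dictionary. The sketch also does not check whether the file is already tagged.

```python
import csv
import io

# Hypothetical header-to-hashtag mapping (keys are lowercased headers).
HEADER_TO_TAG = {
    "province": "#adm1+name",
    "date": "#date",
    "population": "#population",
    "latitude": "#geo+lat",
    "longitude": "#geo+lon",
}


def add_hxl_row(text, mapping=HEADER_TO_TAG):
    """Insert an HXL hashtag row under the header row of CSV text.
    Returns (new_text, unmapped_headers); unmapped columns get an
    empty tag cell so someone can fill them in later."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text, []
    header = rows[0]
    tags, unmapped = [], []
    for name in header:
        tag = mapping.get(name.strip().lower(), "")
        tags.append(tag)
        if not tag:
            unmapped.append(name)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerow(tags)
    writer.writerows(rows[1:])
    return out.getvalue(), unmapped
```

The list of unmapped headers doubles as a rough effort estimate: the more columns it returns across the existing datasets, the more manual tagging remains.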

butchtm commented 1 year ago

Linking the air quality repo and the output CSVs (on Google Drive):

butchtm commented 1 year ago

Hi @AnthonyMockler, after further review of the HXL JSON tagging recommendations, it seems that the GeoJSON format is not currently compatible with the guidelines. There was some discussion in the HXL Standard Google Group back in 2018, but no further developments have been announced.

So the final recommendation is not to require HXL tags for GeoJSON files, since there is currently no way to add them without invalidating the GeoJSON format.
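A sketch of how a validator could apply this exemption. Because GeoJSON is sometimes saved with a plain `.json` extension, this version also sniffs the content: a top-level object whose `"type"` is one of the GeoJSON object types (per RFC 7946) is treated as GeoJSON. The function names and the returned policy strings are hypothetical.

```python
import json
from pathlib import PurePath

# GeoJSON object types as listed in RFC 7946.
GEOJSON_TYPES = {
    "FeatureCollection", "Feature", "Point", "MultiPoint", "LineString",
    "MultiLineString", "Polygon", "MultiPolygon", "GeometryCollection",
}


def is_geojson(filename, text):
    """True for .geojson files, or .json files whose top-level object
    has a GeoJSON "type" member."""
    suffix = PurePath(filename).suffix.lower()
    if suffix == ".geojson":
        return True
    if suffix != ".json":
        return False
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("type") in GEOJSON_TYPES


def hxl_policy(filename, text):
    """Decide how HXL tagging applies to an uploaded file."""
    if is_geojson(filename, text):
        return "exempt"  # no known way to tag without breaking GeoJSON
    if PurePath(filename).suffix.lower() in (".csv", ".json"):
        return "check"  # CSV or plain JSON: check for HXL tags (warn if missing)
    return "not applicable"
```

Sniffing the content rather than trusting the extension alone avoids accidentally warning about a GeoJSON file just because it was named `*.json`.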