somaliaims / Somali-AIMSUI

GNU Affero General Public License v3.0

Parsing error on IATI sectors #293

Closed matmaxgeds closed 4 years ago

matmaxgeds commented 5 years ago

See the screenshot: somehow a project description has been parsed as a sector, and so has been imported as one.

Screenshot from 2019-07-21 18-18-45

raashidahmad commented 5 years ago

@matmaxgeds This has been parsed correctly. The IATI Identifier for this activity is: GB-CHC-1001813-RNGSomalia. Title for this activity is: 3. Mine Action and Cluster Munitions Programme 2016 -2020

Please see some snapshots below for help:

![Uploading Long-Sector-Extraction-by-Parser.png…]() ![Uploading Project-Desc-For-Long-Sector.png…]()

raashidahmad commented 5 years ago

Closing this, as the parsing is correct; the problem is that the data entered for the sector is very long.

matmaxgeds commented 5 years ago

Thanks. I have found the source file and can see that the problem is in the IATI data itself. It also highlights a wider problem: IATI publishers are entering different narratives for the same sector code, so we are ending up with many more sector options than we should have. We have two options:

  1. Use the IATI codelists API (or another source) to get the sector codes and names, and only fall back to the narrative element if the code is missing from the codelist.
  2. First scrape a list of the codes, then take the most common narrative that corresponds to each code.
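The two options above can be sketched roughly as follows. This is only an illustration in Python, not the AIMS importer code: the function names, the `CODELIST` dictionary, and the sample narratives are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sample: a tiny slice of an official codelist (code -> name).
CODELIST = {
    "11110": "Education policy and administrative management",
}


def resolve_from_codelist(code, narrative, codelist):
    """Option 1: prefer the codelist name; use the publisher's
    narrative only when the code is missing from the codelist."""
    name = codelist.get(code)
    if name:
        return name
    return (narrative or "").strip()


def most_common_narratives(pairs):
    """Option 2: for each code, keep the narrative that publishers
    use most often. `pairs` is an iterable of (code, narrative)."""
    counts = defaultdict(Counter)
    for code, narrative in pairs:
        text = (narrative or "").strip()
        if text:  # ignore empty narratives
            counts[code][text] += 1
    return {code: c.most_common(1)[0][0] for code, c in counts.items()}
```

With option 1, a long project description entered as the narrative is ignored whenever the code is known. With option 2, it is outvoted only if enough other publishers use a shorter narrative for the same code.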

In both options, we should probably list them as "11100 Education", i.e. the code followed by the name in English: not the description, but the name of the 5-digit code.
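As a rough sketch of that display format (the helper names are hypothetical, and the sample mapping just reuses the example from this comment):

```python
def format_sector_label(code, name):
    """Build the display label: the code, a space, then the name."""
    return f"{code.strip()} {name.strip()}"


def build_sector_options(sectors):
    """Turn a code -> name mapping into one label per code, sorted by code.

    Because the mapping is keyed by code, each code yields exactly one
    option, however many narratives publishers supplied for it.
    """
    return [format_sector_label(code, name) for code, name in sorted(sectors.items())]


# Illustrative input only.
options = build_sector_options({"11100": "Education"})
```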

I think option 1 is probably the quickest at this stage. Possible sources, easiest first:

  1. https://datahub.io/core/dac-and-crs-code-lists/r/14.csv
  2. https://webfs.oecd.org/crs-iati-xml/Lookup/DAC-CRS-CODES.xml (this one needs a bit more parsing)
  3. https://github.com/IATI/IATI-Codelists-NonEmbedded/blob/master/xml/Sector.xml
  4. https://test-datastore.iatistandard.org/api/codelists/Sector/?format=json (this one is about to change, so the domain would need to be editable)
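For the IATI XML codelist, loading the codes and English names could look something like the sketch below. It assumes the file is laid out as `codelist-item` elements with a `code` child and `name/narrative` children, where a narrative without `xml:lang` is English. That layout should be checked against the real file; `SAMPLE_XML` is a made-up stand-in, not a copy of it.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical stand-in for the real codelist file (assumed layout).
SAMPLE_XML = """<codelist name="Sector" xml:lang="en">
  <codelist-items>
    <codelist-item>
      <code>11110</code>
      <name>
        <narrative xml:lang="fr">Libelle francais (illustrative)</narrative>
        <narrative>Education policy and administrative management</narrative>
      </name>
    </codelist-item>
  </codelist-items>
</codelist>"""


def parse_codelist(xml_text):
    """Return a dict mapping each code to its English name."""
    root = ET.fromstring(xml_text)
    names = {}
    for item in root.iter("codelist-item"):
        code = (item.findtext("code") or "").strip()
        if not code:
            continue  # skip malformed items with no code
        for narrative in item.findall("name/narrative"):
            # Assumption: a narrative with no xml:lang is English.
            lang = narrative.get(XML_LANG, "en")
            if lang == "en" and narrative.text:
                names[code] = narrative.text.strip()
                break
    return names
```

The resulting dictionary can serve as the lookup table for option 1, with the narrative element used only for codes it does not contain.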

Here is a good one for the SDG Goals: https://github.com/IATI/IATI-Codelists-NonEmbedded/blob/master/xml/UNSDG-Goals.xml and the targets: https://github.com/IATI/IATI-Codelists-NonEmbedded/blob/master/xml/UNSDG-Targets.xml

There will need to be a management page for this.

Whatever we do for the OECD DAC codes, we should do for all the other codes.

matmaxgeds commented 5 years ago

@raashidahmad as we have some extra time, we really need to solve this before the launch:

matmaxgeds commented 4 years ago

Closing, as this is now duplicated by #163.