oeg-upm / mapeathor

Translator of spreadsheet mappings into R2RML, RML or YARRRML
https://morph.oeg.fi.upm.es/tool/mapeathor
Apache License 2.0

Handling missing data #49

Closed: nleguillarme closed this issue 9 months ago

nleguillarme commented 9 months ago

I am currently working on transforming species interaction data from CSV to RDF. My data looks like this:

consumerID   resourceID    resourceTaxonID
NCBI:211278  SFWO:0000464  nan
NCBI:211278  nan           GBIF:68

From this data, I'd like to generate triples of the form:

consumer member_of [rdf:type consumerID]
consumer eats resource

with either

resource member_of [rdf:type resourceTaxonID]

or

resource rdf:type resourceID

depending on which of the fields resourceID or resourceTaxonID is not nan for the resource.
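To make the intended either/or rule concrete, here is a minimal Python sketch of the row-by-row logic, independent of mapeathor. It assumes the data is comma-separated and that missing values appear as the literal string nan; the names `is_missing`, `triples_for_row`, `consumer_{n}` and `resource_{n}` are hypothetical and only serve the illustration.

```python
import csv
import io

# Sample rows copied from the data above; a comma-separated layout is assumed.
RAW = """consumerID,resourceID,resourceTaxonID
NCBI:211278,SFWO:0000464,nan
NCBI:211278,nan,GBIF:68
"""


def is_missing(value):
    """Treat empty cells and the literal string 'nan' as missing (assumption)."""
    return value is None or value.strip().lower() in ("", "nan")


def triples_for_row(index, row):
    """Build the triples for one row as plain tuples (illustrative names only)."""
    consumer = f"consumer_{index}"
    resource = f"resource_{index}"
    triples = [
        (consumer, "member_of", ("rdf:type", row["consumerID"])),
        (consumer, "eats", resource),
    ]
    # Pick the resource pattern from whichever field is set.
    if not is_missing(row["resourceTaxonID"]):
        triples.append((resource, "member_of", ("rdf:type", row["resourceTaxonID"])))
    elif not is_missing(row["resourceID"]):
        triples.append((resource, "rdf:type", row["resourceID"]))
    return triples


rows = list(csv.DictReader(io.StringIO(RAW)))
all_triples = [
    triple
    for index, row in enumerate(rows, start=1)
    for triple in triples_for_row(index, row)
]

for triple in all_triples:
    print(triple)
```

This only shows the selection rule; it does not produce RDF and says nothing about how the rule should be expressed in a mapping.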

To do that, I thought I could generate either an individual resourceAsTaxon or an individual resourceAsMaterial in the Subject tab, depending on whether the field resourceID or resourceTaxonID is set. That would look something like this:

The data:

ID1  consumerID   ID2  resourceID    ID3  resourceTaxonID
1    NCBI:211278  1    SFWO:0000464  nan  nan
2    NCBI:211278  nan  nan           2    GBIF:68

The mapping:

ID                  Class             URI
CONSUMER            obo:CARO_0001010  consumer_{ID1}
RESOURCEASMATERIAL  obo:BFO_0000040   resourceAsMaterial_{ID2}
RESOURCEASTAXON     obo:BFO_0000040   resourceAsTaxon_{ID3}

but it seems that missing data are generating errors:

INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/__main__.py", line 3, in <module>
INFO -     mapeathor.main()
INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/__init__.py", line 43, in main
INFO -     outputFile = mapping_generator.generateMapping(inputFile, args.output_file)
INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 294, in generateMapping
INFO -     json = organizeJson(json)
INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 54, in organizeJson
INFO -     json['TriplesMap'][subject['ID']]['Source'] = reFormatSource(json['TriplesMap'][subject['ID']]['Source'])
INFO -   File "/home/***/.local/lib/python3.10/site-packages/mapeathor/mapping_generator.py", line 269, in reFormatSource
INFO -     result['ID'] = data[0]['ID']
INFO - IndexError: list index out of range
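The last frames of the traceback show `reFormatSource` reading `data[0]['ID']` from the Source entries of a subject, which raises IndexError when that list is empty. One possible reading (an assumption, not confirmed anywhere in this thread) is that some subject ID has no matching Source entry in the spreadsheet. A minimal Python sketch of a pre-check for that situation follows; the function `subjects_without_source` and the sample rows are hypothetical and are not part of mapeathor.

```python
def subjects_without_source(subject_rows, source_rows):
    """Return subject IDs that have no row with the same ID among the sources."""
    ids_with_source = {row["ID"] for row in source_rows}
    return [row["ID"] for row in subject_rows if row["ID"] not in ids_with_source]


# Hypothetical sheet contents, written as lists of dicts for illustration.
subject_rows = [
    {"ID": "CONSUMER"},
    {"ID": "RESOURCEASMATERIAL"},
    {"ID": "RESOURCEASTAXON"},
]
source_rows = [
    {"ID": "CONSUMER"},
    {"ID": "RESOURCEASMATERIAL"},
]

missing = subjects_without_source(subject_rows, source_rows)
print(missing)
```

Any ID reported by such a check would be a candidate for the empty list seen in the traceback, under the assumption stated above.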

Do you have any advice?

nleguillarme commented 9 months ago

I've just realized I had an error in my spreadsheet. I will run more tests and reopen the issue if necessary.