Closed (shawntanzk closed 2 years ago)
Guessing new UBERON mirror has some changes and something that is being extracted in terms.txt is missing?
just looked at terms.txt - don't understand this part:
it is taking all terms for sources_merged, which is all the allen files, and extracting terms from it - these are all dhba, mba etc. terms. Then we try to use terms.txt to extract from uberon, which does not have these terms? Did uberon used to merge these files in? Is that why it is in uberon?
@dosumis - might need help with this
Looking at tmp.owl -> is the aim of uberon_slice to extract out uberon terms in sources_merged.owl? In which case, the sparql query terms.sparql should have a filter to grab only uberon terms?
tested the above and came up with the same issue
@shawntanzk I can help you with this as well.
Generally, the seeds we use for extraction contain all classes in our ontologies, regardless of where they should be imported from. But of course, most of the classes will be ignored by robot extract - it will only import those classes that are actually present in the mirror.
What issue are you trying to address here?
ok we found the problem: the terms file has < > around each term (eg <http://purl.obolibrary.org/obo/UBERON_0001966>) and robot extract doesn't seem too happy about this, because when I remove the < > from terms.txt manually it works. I currently have a filter to only take out uberon terms, but I guess that isn't needed given extract ignores classes that aren't in the mirror. Will just add a command to remove the < >, ugur is helping me with it now :) thanks
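For anyone hitting the same thing, here is a minimal sketch of the bracket-stripping step described above. It assumes terms.txt holds one term per line; the function names `strip_angle_brackets` and `clean_terms` are made up for illustration and are not part of this repo or of ROBOT.

```python
def strip_angle_brackets(line):
    """Return the term without a surrounding <...> pair; other text passes through."""
    term = line.strip()
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    return term


def clean_terms(lines):
    """Strip brackets, skip blank lines, and return sorted unique terms."""
    return sorted({strip_angle_brackets(line) for line in lines if line.strip()})
```

Something like `clean_terms(open("terms.txt"))` would then give the list to write back out, doing the same job as a sed one-liner plus sort and uniq.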
Consider how ODK does it:
$(PRESEED): $(SRCMERGED)
	$(ROBOT) query -f csv -i $< --query ../sparql/terms.sparql $@.tmp &&\
	cat $@.tmp | sort | uniq > $@
Using the -f csv parameter!
perfect, that saves the sed command :) thanks @matentzn. btw is this a bug that it can't deal with tsv? or is it meant to be that way?
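On the tsv question: as far as the W3C SPARQL 1.1 CSV/TSV results formats go, TSV writes IRIs in angle brackets (Turtle-style) while CSV writes them as bare strings, so the brackets look like expected TSV output rather than a bug in the query step. Whether robot extract is meant to accept bracketed terms is not settled in this thread. A small sketch of the difference, with `serialise_iri` as a hypothetical helper for illustration only:

```python
import csv
import io


def serialise_iri(iri, fmt):
    """Show how one IRI is written in SPARQL 'tsv' vs 'csv' results."""
    if fmt == "tsv":
        # TSV keeps the Turtle-style term syntax, so IRIs are wrapped in <...>.
        return "<" + iri + ">"
    if fmt == "csv":
        # CSV writes the plain string, with ordinary CSV quoting if needed.
        buffer = io.StringIO()
        csv.writer(buffer).writerow([iri])
        return buffer.getvalue().rstrip("\r\n")
    raise ValueError("fmt must be 'csv' or 'tsv'")
```

So with `-f csv` the seed file already contains bare IRIs and no post-processing is needed.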
next error:
File "/usr/local/lib/python3.8/dist-packages/pandas/io/excel/_openpyxl.py", line 48, in __init__
from openpyxl.workbook import Workbook
ModuleNotFoundError: No module named 'openpyxl'
make: *** [Makefile:98: report.xlsx] Error 1
Guess that is just not on ODK container
Yes, if you want to run this inside of ODK (@dosumis did not when he wrote the Makefile), then you need to install that dependency. This is how you can do that:
https://github.com/OBOFoundry/COB/blob/master/src/ontology/cob.Makefile#L86
But it is annoying, I grant you that. I would probably exclude report.xlsx from make all for now, and then ask David if he still needs it. If so, add a goal that installs the dependencies like in the example above and run it just before, i.e.
sh run.sh make dependencies all
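As a rough sketch of what a dependencies check could look like before the report step runs: the snippet below only reports which Python modules are not importable in the current environment. The helper name `missing_modules` is an assumption for illustration, and the list of required modules is taken from the traceback above (pandas, openpyxl), not from the actual report script.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def install_hint(names):
    """Build a pip command for the missing modules, or an empty string if none are missing."""
    missing = missing_modules(names)
    if not missing:
        return ""
    return "python3 -m pip install " + " ".join(missing)
```

Calling `install_hint(["pandas", "openpyxl"])` inside the container would print the command a dependencies goal needs to run; outside the container it should come back empty if both are already installed.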
report is important. I figured I'd just run it on my local machine, but I guess it's better to add the dependencies into the makefile so others can run it
yay full run! with a lot of help from a lot of people lol. Looks like a lot of diff though, will look through a bit to see if it is just rearrangement or something more that we should look into.
Current error: