twhetzel opened this issue 2 days ago
"Killed" often does not mean that the process was out of memory - it usually means that the process was trying to allocate more memory than was available to docker. This could have different reasons, for example, if docker has already allocated some memory. For example:
I would restart Docker, make sure Docker has 80GB assigned and your process 60GB, and that no other processes are running.
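For reference, the sanity check and re-run would look roughly like this (a sketch: the `Total Memory` line is how Docker reports the VM allocation, and the goal and `MEMORY_GB` variable are the ones already used in this thread):

```sh
# Confirm how much memory the Docker VM actually has (should report ~80GiB)
docker info | grep "Total Memory"

# Cap the ROBOT/Java process below the Docker limit and re-run the goal
cd src/ontology
export MEMORY_GB=60
sh run.sh make refresh-merged
```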
If this does not work, we will have to work with Kevin to create an NCBIgene slim during the ncbigene ingest pipeline which is suitable for ontologies (easy enough, but let's first try the above).
I re-ran the `refresh-merged` goal after restarting Docker and setting it to 80GB, and ran with `export MEMORY_GB=60`, with no other processes running. After about 40 min it failed again with the same error as before:
```
remove --term rdfs:label --term IAO:0000115 --term IAO:0000116 --term IAO:0100001 --term owl:deprecated -T imports/merged_terms_combined.txt --select complement --select "annotation-properties" \
query --update ../sparql/inject-subset-declaration.ru --update ../sparql/inject-synonymtype-declaration.ru --update ../sparql/postprocess-module.ru \
annotate --ontology-iri http://purl.obolibrary.org/obo/mondo/imports/merged_import.owl annotate -V http://purl.obolibrary.org/obo/mondo/releases/2024-11-06/imports/merged_import.owl --annotation owl:versionInfo 2024-11-06 convert -f ofn --output imports/merged_import.owl.tmp.owl && mv imports/merged_import.owl.tmp.owl imports/merged_import.owl; fi
Killed
make[1]: *** [Makefile:448: imports/merged_import.owl] Error 137
rm imports/foodon_terms.txt imports/omo_terms.txt imports/ncbigene_terms.txt imports/envo_terms.txt imports/ncit_terms.txt
make[1]: Leaving directory '/work/src/ontology'
make: *** [Makefile:481: refresh-merged] Error 2
```
My Docker settings:
@matentzn any other suggestions? If not, how do we get the alternative underway, i.e. "work with Kevin to create an NCBIgene slim during the ncbigene ingest pipeline which is suitable for ontologies"? It would be great if this could be done by Week 3 (Nov. 18) of the Mondo Release Cycle SOP so I can refresh the imports as part of the SOP on Friday, Nov. 22.
@kevinschaper can you help with this? Would it be possible, given a set of taxon ids (and gene ids), to efficiently subset the ncbigene ingest before it makes its way into the Mondo pipeline?
Ooh, we already filter by taxon, but we could absolutely make an additional rdf file that starts from the original and filters down to a subset of genes. I think we’d subset the primary tsv output to just the genes, then use kgx to produce rdf from that.
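Roughly, as a sketch of that idea (the file names and the exact `kgx transform` flags here are assumptions, not the real ingest outputs; check `kgx transform --help` before relying on them):

```sh
# Keep the header plus node rows whose id (assumed to be column 1) is in gene_ids.txt
awk -F'\t' 'NR==FNR {keep[$1]; next} FNR==1 || ($1 in keep)' \
    gene_ids.txt ncbigene_nodes.tsv > ncbigene_slim_nodes.tsv

# Keep the header plus edge rows that mention one of the gene ids in any column
awk -F'\t' 'NR==FNR {keep[$1]; next} FNR==1 {print; next} {for (i=1;i<=NF;i++) if ($i in keep) {print; next}}' \
    gene_ids.txt ncbigene_edges.tsv > ncbigene_slim_edges.tsv

# Convert the reduced KGX TSVs to RDF (N-Triples)
kgx transform --input-format tsv --output ncbigene_slim.nt --output-format nt \
    ncbigene_slim_nodes.tsv ncbigene_slim_edges.tsv
```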
Awesome, thanks @kevinschaper! Let me know what I need to do here once that is ready.
@twhetzel how should I get the gene list out of mondo?
Hmm, I can point you to the properties that are used in Mondo for the genes, but I'm confused about the overall process here, since we run the `refresh-merged` goal in order to get the genes into Mondo. Will what you're thinking still work to get new genes etc. into Mondo? FWIW, this is the process we use: https://mondo.readthedocs.io/en/latest/editors-guide/import-terms-for-logical-axioms/
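If it helps, one way to pull that list out without running the whole merged import (a sketch; I'm assuming the `imports/ncbigene_terms.txt` intermediate that the log above deletes at the end can be built on its own, and that it is exactly the list of NCBIGene IDs Mondo references):

```sh
cd src/ontology
# Build just the NCBIGene seed file the import pipeline normally generates
# (and then removes as an intermediate, per the `rm` line in the log above)
sh run.sh make imports/ncbigene_terms.txt
# This list of NCBIGene IDs would be the input for the slim
cp imports/ncbigene_terms.txt ncbigene_ids_for_slim.txt
```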
Hmmm, is there no other way than to do the module extraction so early? No @twhetzel, this won't work... I guess the problem is that we have requested all genes for all taxa for which even a single disease is mentioned... maybe that is not needed, and we can provide a much smaller list of genes?
Alternatively, we would have to do some preprocessing of the file with something other than robot that is more memory efficient...
What about pulling the tsv from the ncbi ingest, filtering it to a subset of rows, and then using kgx to convert that little kgx tsv to rdf?
That could work, yes; I assume there is no KGX filter command I can use? Would we need to provide a small custom Python script?
To help me track this, who will try this next option for "pulling the tsv from the ncbi ingest, filtering it to a subset of rows, and then using kgx to convert that little kgx tsv to rdf"? @matentzn or @kevinschaper or me (given a few more pointers here on what to do)?
I ran into this in October and was able to solve it by setting the memory to 60GB; however, this time, running `sh run.sh make refresh-merged` (following these docs: https://mondo.readthedocs.io/en/latest/editors-guide/import-terms-for-logical-axioms/), I am still getting this error. We need this process to work as part of the general SOP for the Mondo release cycle as well as for regular curation that adds entities such as genes into Mondo.