stuckyb / ontopilot


How to extract single terms from a large ontology #93

Open ramonawalls opened 5 years ago

ramonawalls commented 5 years ago

When you just need a few terms from a very large ontology, it does not make sense to pull the whole ontology into the repo. This is especially true of NCBI taxon. Is there a strategy for making an import module with just a few terms, without having to download the whole ontology?

stuckyb commented 5 years ago

So, a couple of comments. First, when you build an ontology, OntoPilot will pull any external ontologies into the build folder. I recommend explicitly not adding that folder to the repo, since it only contains build artifacts and nothing of interest beyond what is in the rest of the repo. I usually add build to the repo's .gitignore file to make this explicit.

Second, I agree it would be nifty to be able to get terms from a remote ontology without downloading the whole thing. Do you know if there is a good service for doing this? For the general problem of import module extraction, you need to be able to inspect logical axioms and their semantic context, which precludes simple download strategies. For single-term imports, though, just grabbing a relevant OWL snippet would be good enough.

ramonawalls commented 5 years ago

Good idea about the .gitignore, but I think the build was still timing out when pulling in NCBI taxon. Maybe there are subsets of it available somewhere. I'll look.

I was hoping the OWL API could pull single terms. It looks like it should be possible with the OLS API (https://www.ebi.ac.uk/ols/docs/api - scroll down to TERMS), with the iri parameter.
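To make the OLS idea concrete, here is a minimal Python sketch of a single-term lookup by IRI. The base URL is derived from the docs link above, and the `ontologies/{id}/terms?iri=...` path and the `_embedded.terms` response layout are my reading of those docs, so treat them as assumptions to verify. The function names and the `ncbitaxon` ontology ID are illustrative only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL, derived from the docs link above; adjust if the service has moved.
OLS_API_BASE = "https://www.ebi.ac.uk/ols/api"


def ols_term_url(ontology_id, term_iri):
    """Build a lookup URL for one term, identified by its full IRI."""
    query = urlencode({"iri": term_iri})  # percent-encodes the IRI
    return f"{OLS_API_BASE}/ontologies/{ontology_id}/terms?{query}"


def extract_terms(response_json):
    """Return (iri, label) pairs from a decoded OLS JSON response.

    Assumes matching terms are listed under _embedded.terms.
    """
    terms = response_json.get("_embedded", {}).get("terms", [])
    return [(t.get("iri"), t.get("label")) for t in terms]


def fetch_term(ontology_id, term_iri, timeout=30):
    """Fetch one term over HTTP (needs network access; not called here)."""
    with urlopen(ols_term_url(ontology_id, term_iri), timeout=timeout) as resp:
        return extract_terms(json.load(resp))
```

Calling `fetch_term("ncbitaxon", "http://purl.obolibrary.org/obo/NCBITaxon_9606")` would then request only that one term instead of the whole ontology document.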

stuckyb commented 5 years ago

One could use the OWL API to pull single terms, but since it is a low-level software development API and not a web API, it still requires access to the full ontology document.

The OLS API looks like it could be promising, but it is unfortunate that it can't return content as OWL/RDF snippets (or any format parsable by the OWL API).
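One possible workaround, sketched below in Python, is to build a minimal RDF/XML snippet from a term's IRI and label once they have been retrieved from a JSON service. This is only an illustration, not something OntoPilot does: the function name `term_to_rdfxml` is hypothetical, and the output contains just a class declaration and an `rdfs:label`, with none of the term's logical axioms.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Register the usual prefixes so the serialized XML uses rdf:, rdfs:, owl:.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def term_to_rdfxml(iri, label):
    """Serialize one term as an owl:Class with an rdfs:label (hypothetical helper)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": iri})
    label_elem = ET.SubElement(cls, f"{{{RDFS}}}label")
    label_elem.text = label
    return ET.tostring(root, encoding="unicode")


snippet = term_to_rdfxml(
    "http://purl.obolibrary.org/obo/NCBITaxon_9606", "Homo sapiens"
)
print(snippet)
```

A snippet like this should be in a format the OWL API can parse, but it is only adequate for the single-term case; anything that depends on axioms or their semantic context still needs the source ontology.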

So out of curiosity, are you seeing failure when attempting to download the NCBI taxon ontology, or after it is downloaded, during term extraction? If the latter, I suspect the problem might be memory limitations in the Java runtime environment, which can be adjusted when running OntoPilot. Let me know if you'd like any help in trying to solve that.

On a different note, I really wish they would modularize the NCBI taxon ontology. It is so huge as to be practically useless in many applications (e.g., it requires multiple GB of RAM just to parse).

ramonawalls commented 5 years ago

I realized I never answered your question above. I get the error while trying to download the NCBI taxon ontology, not during term extraction.