ncbo / ncbo_cron

Jobs that run on a regular basis in the NCBO infrastructure

Add a script to pull a new version of an ontology on demand #59

Closed syphax-bouazzouni closed 1 year ago

syphax-bouazzouni commented 2 years ago

Issue

The only place that checks whether a new file exists is the pull_location cron job, which runs once a day.

The problem is that if we reprocess an ontology with the ncbo_ontology_process script, it does not check whether a new version exists.

The solutions

There are two ways to solve this

  1. The first and simpler one is to add a script called ncbo_ontology_pull that does the pull on demand.
  2. The second, more complex one is to add a step called do_pull_location to the submission process workflow, before the generate_rdf step, that downloads the file and creates a new submission if a new version is found.

This PR is the implementation of the first proposition.
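
To make the idea concrete, here is a minimal sketch of what "pull on demand" means, not the code in this PR. The names new_version?, pull_on_demand, fetch_remote and create_submission are hypothetical, and comparing MD5 digests is an assumption about how a new version could be detected.

```ruby
require "digest"

# Hypothetical helper: true when the remote content differs from the content
# of the latest submission. Comparing MD5 digests is an assumption.
def new_version?(remote_content, current_content)
  Digest::MD5.hexdigest(remote_content) != Digest::MD5.hexdigest(current_content)
end

# Hypothetical on-demand pull: fetch the remote file for one ontology and
# create a new submission only if its content changed.
# fetch_remote and create_submission are injected callables (assumptions),
# standing in for the download and submission-creation logic.
def pull_on_demand(acronym, fetch_remote:, current_content:, create_submission:)
  remote = fetch_remote.call(acronym)
  return :up_to_date unless new_version?(remote, current_content)

  create_submission.call(acronym, remote)
  :new_submission
end
```

The second proposition would run the same check, but as a workflow step before generate_rdf instead of from a standalone script.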

How to use

Usage: ncbo_ontology_pull [options]
    -o, --ontology ACRONYM           Ontology acronym to pull if new version exist
    -h, --help                       Display this screen
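
For illustration, a command-line interface with this usage text can be declared with Ruby's standard OptionParser. This is a sketch under assumptions, not necessarily how the script in this PR is written; parse_pull_options is a hypothetical helper name.

```ruby
require "optparse"

# Hypothetical helper: parses the two options shown in the usage text and
# returns the collected options together with the parser.
def parse_pull_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: ncbo_ontology_pull [options]"
    opts.on("-o", "--ontology ACRONYM",
            "Ontology acronym to pull if new version exist") do |acronym|
      options[:ontology] = acronym
    end
    # Record the request instead of exiting, so the caller decides what to do.
    opts.on("-h", "--help", "Display this screen") do
      options[:help] = true
    end
  end
  parser.parse!(argv.dup) # work on a copy so the caller's array is untouched
  [options, parser]
end
```

Calling parse_pull_options(["-o", "STY"]) would yield an options hash with :ontology set to "STY" (STY is only an example acronym); a caller could print the parser when :help is set.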

alexskr commented 2 years ago

this would be a very useful script.

alexskr commented 1 year ago

I have noticed that the script runs owlapi when pulling an ontology. Is that really required? It's not a big deal, but the owlapi wrapper will run a second time when the ontology gets processed.

syphax-bouazzouni commented 1 year ago

> I have noticed that the script runs owlapi when pulling an ontology. Is that really required? It's not a big deal, but the owlapi wrapper will run a second time when the ontology gets processed.

Yeah, this behavior was already there before my PR.

The existing code checks that the remote file is parsable (with owlapi) before creating the corresponding submission. This prevents us from being spammed with submissions for staging changes that don't parse.

A possible optimization is to add an auto_delete option so that a submission deletes itself when it turns out not to be parsable, e.g. sub.process_submission(auto_delete: true). This would call owlapi only once and delete the submission if it does not parse.
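
A rough sketch of that idea, using a stand-in class rather than the real submission model. SketchSubmission, its parse and delete methods, and the ArgumentError used to signal a parse failure are all assumptions made for illustration; only the process_submission(auto_delete: true) call shape comes from the proposal above.

```ruby
# Hypothetical stand-in for a submission; not the real model class.
class SketchSubmission
  attr_reader :deleted, :parse_calls

  def initialize(parsable:)
    @parsable = parsable
    @deleted = false
    @parse_calls = 0
  end

  # Stand-in for the single owlapi run; raises when the file does not parse.
  def parse
    @parse_calls += 1
    raise ArgumentError, "ontology file is not parsable" unless @parsable
  end

  # Stand-in for removing the submission.
  def delete
    @deleted = true
  end

  # Parse exactly once. On failure, delete the submission when auto_delete
  # is set. Returns true on success and false on a parse failure.
  def process_submission(auto_delete: false)
    parse
    true
  rescue ArgumentError
    delete if auto_delete
    false
  end
end
```

With this shape, the pull script would create the submission first and let processing discard it on failure, instead of parsing once to validate and again to process.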

I can do that if wanted, but it is beyond this PR, I think.