Open souzadevinicius opened 9 months ago
One week, oof! I have previously been able to complete a very similar PHENIO HP vs MP on a GCloud instance with < 64 GB memory, though it did consume a lot of that resource. What kind of resource usage do you see with higher thresholds, like a min of 10 for AIC and 0.4 for Jaccard? The labeling can also consume a surprisingly large amount of resources and is very redundant for this sort of comparison, so I'd suggest dropping that parameter and mapping CURIEs to labels after the comparison is complete.
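A minimal sketch of the "map CURIEs to labels afterwards" idea, assuming the comparison output is a TSV with `subject_id` and `object_id` columns and that a CURIE-to-label dictionary has been built separately. The column names, the `add_labels` helper, and the sample rows are all illustrative assumptions, not taken from the actual run:

```python
import csv
import io


def add_labels(rows, labels, id_columns=("subject_id", "object_id")):
    """Yield a copy of each row with one label column added per ID column.

    `labels` is a plain dict of CURIE -> label, built once, so the
    similarity run itself never has to look labels up.
    """
    for row in rows:
        out = dict(row)
        for col in id_columns:
            # "subject_id" -> "subject_label"; otherwise just append "_label".
            base = col[:-3] if col.endswith("_id") else col
            out[base + "_label"] = labels.get(row[col], "")
        yield out


# Hypothetical stand-ins for the real label table and results file.
labels = {"HP:0000001": "All", "MP:0000001": "mammalian phenotype"}
results = io.StringIO(
    "subject_id\tobject_id\tjaccard_similarity\n"
    "HP:0000001\tMP:0000001\t0.5\n"
)

labelled = list(add_labels(csv.DictReader(results, delimiter="\t"), labels))
print(labelled[0])
```

Because the lookup is a single dictionary access per row, this step streams through the results and should add very little memory on top of the label table itself.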
@souzadevinicius let's try a 0.4 Jaccard threshold and remove the labelling options, then see whether that makes it at least possible to run HP-ZP.
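To illustrate why a higher Jaccard threshold shrinks the output, here is a rough sketch of filtering term pairs by Jaccard similarity over their ancestor sets. It is not semsimian's implementation; the term IDs, ancestor sets, and the `pairs_above_threshold` helper are made up for the example:

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairs_above_threshold(ancestors1, ancestors2, min_jaccard=0.4):
    """Yield (term1, term2, score) only for pairs scoring at least min_jaccard."""
    for t1, anc1 in ancestors1.items():
        for t2, anc2 in ancestors2.items():
            score = jaccard(anc1, anc2)
            if score >= min_jaccard:
                yield t1, t2, score


# Toy ancestor sets (hypothetical IDs, not real ontology terms).
set1 = {"X:1": {"A", "B", "C"}}
set2 = {"Y:1": {"A", "B", "D"}, "Y:2": {"A", "E", "F", "G"}}

kept = list(pairs_above_threshold(set1, set2, min_jaccard=0.4))
print(kept)
```

Here X:1 vs Y:1 shares 2 of 4 distinct ancestors (0.5) and is kept, while X:1 vs Y:2 shares 1 of 6 and is dropped, so only one of the two pairs would be written out.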
Ok
what's the status of this?
Discussing in the MWF hackathon now
We were thinking we would deploy semsimian/oak on our build server and run on a regular cadence. This way we have an objective measure of how much memory/time we are talking about here, and we can also emit a new artifact with a PURL so people can use this downstream.
@caufieldjh perhaps we already have a repo to do this?
Ah okay, Harry has already made a repo for this here
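For a quick local version of the "objective measure of memory/time" mentioned above, something like the following could wrap any command. It is only a sketch: the `measure` helper is hypothetical, it relies on the Unix-only `resource` module, and `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS. The value is also the high-water mark across all child processes waited for so far, so it is best used with one command per Python process:

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd and report its exit code, wall-clock seconds, and peak RSS."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    elapsed = time.perf_counter() - start
    # Peak resident set size of terminated children (units are OS-dependent).
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {
        "returncode": completed.returncode,
        "seconds": elapsed,
        "peak_rss": peak,
    }


# Stand-in command; a real runoak invocation would be passed as a list of
# arguments in the same way.
stats = measure([sys.executable, "-c", "x = list(range(100000))"])
print(stats)
```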
Sorry I'm a little late to the party, but @souzadevinicius, did you run this without --autolabel or specify --no-autolabel? Just to get an idea how fast it'll be.
The last build in Aug '23 took 1h and 18m.
Yep good question @hrshdhgd
Harry says a previous build with autolabel turned on took 15h, so this might be at least one thing that is slowing down Vinicius's run.
Note that the Jenkins build performed by that repo takes a bit over 1 hr without autolabel and ~15 hrs w/ autolabel.
For reasons not entirely clear to me, this build took 3 hours. Here's the command:
runoak -i semsimian:sqlite:obo:phenio similarity --no-autolabel -p i --set1-file HPO_terms.txt --set2-file MP_terms.txt -O csv -o HP_vs_MP_semsimian.tsv --min-ancestor-information-content 4.0
That's with:
semsimian-0.2.11
oaklib-0.5.25
The product: http://kg-hub-public-data.s3.amazonaws.com/monarch/HP_vs_MP_semsimian.tsv.tar.gz
I'm trying to calculate semantic similarity profiles using the Phenio ontology, comparing different term sets.
Ontology used: Phenio
Library versions
command line execution example:
I tried to run these experiments locally (on machines with 32 and 64 GB of RAM) and on an HPC system (the process of writing the output ran for more than a week and was then killed).