phenoscape / pipeline

Build pipeline for the Phenoscape Knowledgebase
MIT License

Increase parameterization in Makefile #213

Closed. balhoff closed this 2 years ago

balhoff commented 2 years ago

A place to start might be the DOS-DP patterns.

johnbradley commented 2 years ago

I have a branch that simplifies the dosdp-tools rules for the anatomical-entity* files into one pattern rule: https://github.com/phenoscape/pipeline/compare/simplify-dosdp-tools-rules

Part of this change renames some intermediate files to match the input files. For example, anatomical-entity-presences.ofn is renamed to anatomical-entity-implies_presence_of.ofn so that it matches the input file patterns/implies_presence_of.yaml.

I still need to run the pipeline on these changes so I haven't created a PR yet.
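For illustration, a single pattern rule of this kind might look roughly like the sketch below. It is not copied from the branch: the `DOSDP` variable, the `anatomical-entities.tsv` input name, and the exact set of flags are assumptions. The stem `%` has to match the pattern file name, which is why the outputs are renamed.

```makefile
# Sketch only: DOSDP, BUILD_DIR defaults and the .tsv input name are assumptions.
BUILD_DIR ?= build
DOSDP ?= dosdp-tools

# One rule for every patterns/<name>.yaml; $< is the YAML pattern,
# $(word 2,$^) is the second prerequisite, $@ is the output file.
$(BUILD_DIR)/anatomical-entity-%.ofn: patterns/%.yaml $(BUILD_DIR)/anatomical-entities.tsv
	$(DOSDP) generate --obo-prefixes=true --template=$< --infile=$(word 2,$^) --outfile=$@
```

With a rule like this, `make build/anatomical-entity-implies_presence_of.ofn` would pick up patterns/implies_presence_of.yaml without a dedicated rule.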

johnbradley commented 2 years ago

Another area of duplication in the Makefile involves the taxa and gene files.

We run the following commands twice, changing only "taxa" to "genes":

Details

kb-owl-tools pairwise-sim

https://github.com/phenoscape/pipeline/blob/b7a129f1587785bf7f993a2980e3a1f62b2afe62/Makefile#L584-L586

https://github.com/phenoscape/pipeline/blob/b7a129f1587785bf7f993a2980e3a1f62b2afe62/Makefile#L594-L596

kb-owl-tools expects-to-triples

https://github.com/phenoscape/pipeline/blob/b7a129f1587785bf7f993a2980e3a1f62b2afe62/Makefile#L604-L606

https://github.com/phenoscape/pipeline/blob/b7a129f1587785bf7f993a2980e3a1f62b2afe62/Makefile#L619-L621

python $(REGRESSION)

https://github.com/phenoscape/pipeline/blob/b7a129f1587785bf7f993a2980e3a1f62b2afe62/Makefile#L609-L611

https://github.com/phenoscape/pipeline/blob/b7a129f1587785bf7f993a2980e3a1f62b2afe62/Makefile#L623-L625

kb-owl-tools output-ics

https://github.com/phenoscape/pipeline/blob/b7a129f1587785bf7f993a2980e3a1f62b2afe62/Makefile#L640-L642

https://github.com/phenoscape/pipeline/blob/b7a129f1587785bf7f993a2980e3a1f62b2afe62/Makefile#L644-L646

Suggested Change

Rename the "gene-" files to "genes-" (e.g. rename gene-pairwise-sim.ttl to genes-pairwise-sim.ttl). Then create one pattern rule per step that serves both the "taxa" and "genes" files.

Thoughts on this potential change, @balhoff?
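As a rough sketch of what such a shared rule could look like for the pairwise-sim step: once both outputs share the `<stem>-pairwise-sim.ttl` naming, the stem (`taxa` or `genes`) is available as `$*`. The `SIM_INPUTS` variable, its file name, and the argument order passed to kb-owl-tools are placeholders, not the real command line.

```makefile
# Sketch only: SIM_INPUTS and the kb-owl-tools argument order are placeholders.
BUILD_DIR ?= build
SIM_INPUTS ?= $(BUILD_DIR)/profiles.ttl

SIM_TARGETS := $(BUILD_DIR)/taxa-pairwise-sim.ttl $(BUILD_DIR)/genes-pairwise-sim.ttl

# Static pattern rule: applies only to the two listed targets.
# $* expands to "taxa" or "genes".
$(SIM_TARGETS): $(BUILD_DIR)/%-pairwise-sim.ttl: $(SIM_INPUTS)
	kb-owl-tools pairwise-sim $^ $* $@
```

A static pattern rule keeps the `%` from matching unrelated files; the same shape would apply to the expects-to-triples, regression, and output-ics steps.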

balhoff commented 2 years ago

I agree with all this. One gotcha is the grep vs. grep -v in the rank-statistics targets. We should split that out to generate taxa-profile-sizes.txt and genes-profile-sizes.txt in new targets from profile-sizes.txt.
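A minimal sketch of that split, assuming the taxa rows can be selected by a single grep pattern. The `TAXON_PATTERN` variable and its value are hypothetical; the real pattern is whatever the rank-statistics targets grep for today.

```makefile
# Sketch only: TAXON_PATTERN is a hypothetical stand-in for the existing grep pattern.
BUILD_DIR ?= build
TAXON_PATTERN ?= VTO_

# Lines matching the pattern are taxa profiles.
$(BUILD_DIR)/taxa-profile-sizes.txt: $(BUILD_DIR)/profile-sizes.txt
	grep '$(TAXON_PATTERN)' $< > $@

# Everything else is treated as a gene profile.
$(BUILD_DIR)/genes-profile-sizes.txt: $(BUILD_DIR)/profile-sizes.txt
	grep -v '$(TAXON_PATTERN)' $< > $@
```

Note that grep exits with status 1 when nothing matches, so either rule fails if its half of profile-sizes.txt is empty. Once the two files exist, the downstream rank-statistics rules no longer need to differ in their grep flags.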

johnbradley commented 2 years ago

I have three final changes I would like to make for this issue.

NEXMLS subdirectories file

Right now we have an embedded list of multiple find commands: https://github.com/phenoscape/pipeline/blob/cc4263e96a092418669bf3f99669e01f60873b86/Makefile#L131-L141

To parameterize this I suggest adding a new nexml-subdirs.txt file that would have the following contents:

curation-files/completed-phenex-files
curation-files/fin_limb-incomplete-files
curation-files/Jackson_Dissertation_Files
curation-files/teleost-incomplete-files/Miniature_Monographs
curation-files/teleost-incomplete-files/Miniatures_Matrix_Files
curation-files/teleost-incomplete-files/Dillman_Supermatrix_Files
curation-files/matrix-vs-monograph

The Makefile would then read this input file and create a find command covering all of the directories.
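One way this could be written, as a sketch: the `NEXML_DATA` root variable and the `*.xml` filter are assumptions about the existing find commands, and the approach relies on none of the listed paths containing spaces.

```makefile
# Sketch only: NEXML_DATA and the "*.xml" filter are assumptions.
NEXML_DATA ?= phenoscape-data

# $(shell cat ...) turns the newline-separated list into a space-separated one.
NEXML_SUBDIRS := $(addprefix $(NEXML_DATA)/,$(shell cat nexml-subdirs.txt))

# find accepts several starting directories in a single invocation.
NEXMLS := $(shell find $(NEXML_SUBDIRS) -type f -name "*.xml")
```

Adding or removing a curation directory then only touches nexml-subdirs.txt, not the Makefile.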

Monarch pattern rule

There are 4 curl commands that download monarch data essentially the same way: https://github.com/phenoscape/pipeline/blob/cc4263e96a092418669bf3f99669e01f60873b86/Makefile#L377-L405

To simplify this I want to make a pattern rule for $(BUILD_DIR)/monarch/%.ttl. This would require storing these output files in a subdirectory.
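A sketch of such a rule, assuming the four downloads differ only in the file name at the end of the URL. `MONARCH_URL` is a placeholder value, not the real download location.

```makefile
# Sketch only: MONARCH_URL is a placeholder, not the real Monarch location.
BUILD_DIR ?= build
MONARCH_URL ?= https://example.org/monarch

# $* is the file stem, so build/monarch/foo.ttl fetches $(MONARCH_URL)/foo.ttl.
$(BUILD_DIR)/monarch/%.ttl:
	mkdir -p $(dir $@)
	curl -L --fail $(MONARCH_URL)/$*.ttl -o $@
```

If the real URLs do not share a common shape, a per-target variable holding each URL would be needed instead of the `$*` substitution.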

Reduce duplication of phenoscape-kb-tbox.ofn rule

The following rule lists the input files twice, once as prerequisites and again as input arguments: https://github.com/phenoscape/pipeline/blob/cc4263e96a092418669bf3f99669e01f60873b86/Makefile#L208-L230 I think this rule can be simplified by using the $^ automatic variable, which expands to all prerequisites.
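As a sketch of the idea, assuming the recipe merges its inputs with ROBOT (the rule body is not quoted here, and the `TBOX_INPUTS` file names are hypothetical):

```makefile
# Sketch only: the input file names are hypothetical examples.
BUILD_DIR ?= build
ROBOT ?= robot

TBOX_INPUTS := $(BUILD_DIR)/ontologies-merged.ofn $(BUILD_DIR)/defined-by-links.ttl

# $(addprefix -i ,$^) expands to "-i file1 -i file2 ..." from the prerequisites,
# so each input is named exactly once.
$(BUILD_DIR)/phenoscape-kb-tbox.ofn: $(TBOX_INPUTS)
	$(ROBOT) merge $(addprefix -i ,$^) convert --format ofn -o $@
```

This only works if every prerequisite is meant to be a merge input; order-only or tool prerequisites would need to be filtered out of `$^` first.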

johnbradley commented 2 years ago

It looks like we have a rule to build the monarch file hpoa.ttl, but it's commented out everywhere? https://github.com/phenoscape/pipeline/blob/cc4263e96a092418669bf3f99669e01f60873b86/Makefile#L362-L363 https://github.com/phenoscape/pipeline/blob/cc4263e96a092418669bf3f99669e01f60873b86/Makefile#L374-L375

balhoff commented 2 years ago

Here is a Make example for reducing duplication with robot merge: https://github.com/balhoff/ultimate-ontology-makefile/blob/4b7a6c913d7a1a4feb71866c9e86f41260ecc365/Makefile#L39