usegalaxy-eu / workflow-testing

Automated testing of workflows against Galaxy
Strategy to update workflows needed #3

Open bgruening opened 6 years ago

bgruening commented 6 years ago

External workflows, e.g. the ones from the training material, should be updated regularly. Or, even better, they should be linked against the training material. Maybe we can pull them down before running the tests, as we currently do with data sets.

cat-bro commented 4 years ago

Hi @bgruening, continuing discussion from https://github.com/galaxyproject/training-material/pull/1896

A way forward could be to rewrite the update_workflows function in update-training.sh so that it updates each workflow within training/* (possibly with wget https://raw.githubusercontent.com/galaxyproject/training-material/master/topics/<topic_name>/tutorials/<tutorial_name>/workflows/<workflow_name> for each individual file rather than cloning training-material; the raw URL is needed because the github.com/.../blob/... URL returns an HTML page rather than the file). It would copy the workflows only, not the tests. This could be run by Jenkins prior to running the workflow tests, with any changes committed into workflow-testing.

Prior to being able to implement this there would need to be a clean-up of the paths (in workflow-testing) so that any training/<topic_name>/<tutorial_name>/<workflow_name> is consistent with a corresponding topics/<topic_name>/tutorials/<tutorial_name>/workflows/<workflow_name> path in training-material. (for example training/statistics/machine_learning/classification/linear_SVC_classification.ga would become training/statistics/classification_regression/classification_LSVC.ga)
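
The path mapping described above could be sketched roughly as follows. This is only an illustration, not the actual update_workflows function: the function name gtn_raw_url and the GTN_RAW_BASE variable are hypothetical, and the raw.githubusercontent.com host with the master branch is an assumption (a github.com "blob" URL would return an HTML page instead of the workflow file).

```bash
# Assumed base URL: raw file host plus the "master" branch named in the thread.
GTN_RAW_BASE="https://raw.githubusercontent.com/galaxyproject/training-material/master"

# Map training/<topic>/<tutorial>/<workflow> to the matching raw GTN URL.
# Returns 1 for paths that do not have exactly that shape.
gtn_raw_url() {
    case $1 in
        training/*/*/*/*) return 1 ;;  # too deep: needs the path clean-up first
        training/*/*/*) ;;             # exactly topic/tutorial/workflow
        *) return 1 ;;
    esac
    rest=${1#training/}
    topic=${rest%%/*}
    rest=${rest#*/}
    tutorial=${rest%%/*}
    workflow=${rest#*/}
    printf '%s/topics/%s/tutorials/%s/workflows/%s\n' \
        "$GTN_RAW_BASE" "$topic" "$tutorial" "$workflow"
}

# Example with the cleaned-up path mentioned above.
gtn_raw_url training/statistics/classification_regression/classification_LSVC.ga
```

A path that passes this check could then be refreshed with something like wget -O "$path" "$(gtn_raw_url "$path")". Paths that fail the check are the ones that would need the clean-up first.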

I could make a start on this.

bgruening commented 4 years ago

I like that, this would also save the long cloning step.

Thanks @cat-bro for working on that.

hexylena commented 4 years ago

for what it's worth: My future plan was to just merge this back into the training material. I only created the separate repo to let things move a bit faster while we were getting testing worked out.

Things seem stable enough now (and with a second site!) that we should make this a standard part of the GTN. I'm happy to help with this too, for what it's worth. It would not be too difficult to merge back + update the tests.

bgruening commented 4 years ago

We would still need this repo for other, non-training workflows, I think.

hexylena commented 4 years ago

Ahh I hadn't considered that use case. Would it make sense to keep both repos then? I would really love to see the training tests as close to the training materials as possible (and kept to high standards), and then this repository could give more freedom, whatever test you want to write, or so?

cat-bro commented 4 years ago

What if everything under training were kept in galaxyproject/training-material (single source of truth) and in this repo there was just a YAML file listing the .ga files and test files for any workflows that had tests? Jenkins could get these with wget rather than cloning the repo. The thing I suggested in June about syncing them over would work, but even then the double-ups are problematic. Non-GTN workflows would still live here.

hexylena commented 4 years ago

That could work @cat-bro, sounds good.

bgruening commented 4 years ago

I assume developing the workflow tests would then not be that intuitive, as you don't have the workflows next to the tests. What about adding the test files to the workflows in GTN?

hexylena commented 4 years ago

> What about adding the test files to the workflows in GTN?

Oh, I assumed that's what @cat-bro meant, both WFs + tests live in the GTN.

cat-bro commented 4 years ago

yes, what @hexylena said

bgruening commented 4 years ago

Cool, having everything in GTN sounds great!

bgruening commented 3 years ago

So we need some logic to traverse GTN, find workflow-tests and run them.

What do you think about adding a Makefile that clones the GTN (only the latest HEAD), traverses the tree and somehow returns the paths to the workflow-test files?

We could also keep the list of workflows (outside of GTN) in this Makefile and finally trigger a run. This would also simplify the Jenkins setup.

The current list from Jenkins is:

training/transcriptomics/rna-seq-viz-with-volcanoplot/rna-seq-viz-with-volcanoplot.ga
training/transcriptomics/rna-seq-viz-with-heatmap2/rna-seq-viz-with-heatmap2.ga
raceid3/raceid3_workflow.ga
example1/wf3-shed-tools.ga
example2/wf4-shed-tools.ga
GraphClust2/GC-lite.ga
training/transcriptomics/small_ncrna_clustering/blockclust_workflow.ga
sklearn/adaboost/adaboost.ga
sklearn/ard/ard.ga
training/variant-analysis/microbial-variants/microbial_variant_calling.ga
training/variant-analysis/dip/diploid.ga
training/variant-analysis/mapping-by-sequencing/mapping_by_sequencing.ga
training/proteomics/protein-id-sg-ps/protein-id-sg-ps.ga
training/proteomics/protein_quant_sil/protein_quant_sil.ga
training/proteomics/metaproteomics/metaproteomics.ga
training/sequence-analysis/ref-based-rad-seq/rad_seq_ref_based.ga
training/sequence-analysis/quality-control/quality_control.ga
training/sequence-analysis/mapping/mapping.ga
training/chip-seq/formation_of_super-structures_on_xi/formation_of_super_structures_on_xi.ga
training/epigenetics/methylation-seq/methylation-seq.ga
training/assembly/general-introduction/assembly-general-introduction.ga
training/assembly/unicycler-assembly/unicycler.ga
training/metagenomics/general-tutorial/amplicon.ga
training/transcriptomics/ref-based/ref_based.ga
training/transcriptomics/small_ncrna_clustering/blockclust_workflow.ga
training/statistics/classification_regression/regression_GradientBoosting.ga
training/statistics/classification_regression/classification_LSVC.ga
training/proteomics/F1000_Metaproteomics_QueryTabular/F1000_Metaproteomics_QueryTabular.ga
training/proteomics/F1000_Proteogenomics_QueryTabular/F1000_Proteogenomics_QueryTabular.ga
training/computational-chemistry/bio3danalysis/MD_Analysis_using_Bio3D.ga
training/computational-chemistry/bio3danalysis/gromacs.ga
training/metabolomics/F1000_Metabolomics_Query_Tabular_Mass_Adjustment.ga
training/statistics/machinelearning/machine_learning.ga

cat-bro commented 3 years ago

Yes, that sounds good.

Something like

git clone https://github.com/galaxyproject/training-material.git
for f in $(find 'training-material' -path '*-test.yml' ); do echo "${f/-test.yml/.ga}" >> list_of_workflows.txt; done
<script that runs the tests>
rm -rf training-material

Some of the tutorials listed above will not have equivalents in GTN (sklearn I think?) so there would need to be a separate list of these

hexylena commented 3 years ago

unnecessary, small optimisation:

find 'training-material' -path '*-test.yml' | sed 's/-test\.yml$/.ga/' > list_of_workflows.txt

otherwise sounds good!
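
To make the find/sed idea easy to try without cloning anything, here is a small self-contained sketch. The function name list_workflows_with_tests and the miniature directory tree are hypothetical and exist only for the demonstration. It assumes, as in the thread, that a workflow foo.ga has its test next to it as foo-test.yml.

```bash
# Print the .ga path for every *-test.yml found below directory "$1".
list_workflows_with_tests() {
    find "$1" -name '*-test.yml' | sed 's/-test\.yml$/.ga/' | sort
}

# Hypothetical miniature checkout used only for this demonstration.
demo=$(mktemp -d)
wf_dir="$demo/topics/demo-topic/tutorials/demo-tutorial/workflows"
mkdir -p "$wf_dir"
touch "$wf_dir/main.ga" "$wf_dir/main-test.yml" "$wf_dir/untested.ga"

# Only main.ga is listed, because untested.ga has no matching test file.
list_workflows_with_tests "$demo"
```

Workflows without a test file never show up in the list, so nothing needs to be maintained by hand for the GTN part. Only the non-GTN workflows would still need a separate list.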

bgruening commented 3 years ago

> Some of the tutorials listed above will not have equivalents in GTN (sklearn I think?) so there would need to be a separate list of these

Yes, I think there is still a need to have workflows here, e.g. user workflows that will not make it into GTN anytime soon, or a really strange workflow that just utilizes a bunch of functionality, etc.

bgruening commented 3 years ago

@cat-bro do we have a plan to move forward? Are you planning to work on this? Can we help somehow? Next release is coming and we could test those workflows against it.

cat-bro commented 3 years ago

Hi @bgruening, this is on my Trello but I keep getting distracted by other things.

I would really love to be able to test all of the workflows that run on Galaxy Australia. It would be a great comfort to people who run training sessions. A short term goal would be to have tests for as many of the workflows in GTN as possible.

Is it OK for me to update some of the GTN tutorials to use more up-to-date versions of the tools than they currently do? I think there are some that still use tool versions that break on python3, but there are also some that use tool versions from 2017/2018 that I don't believe anybody actually teaching or learning from the tutorial will be using: in general they will be using the latest version available.

A long term goal in my mind would be to have a strategy of workflow testing that accounts for the fact that most of the time, tool versions used in tutorials will be the latest available versions and not necessarily the versions listed on the workflows. I think that the tests as they currently are are useful, but that sometimes they are testing flows that would be unlikely in a teaching setting. For example if Galaxy has grappa version 1.2a and grappa version 3.4b, the workflow might contain grappa 1.2a but in a real world teaching scenario students will be using 3.4b, if this makes any sense.
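
The grappa scenario can be illustrated with a tiny check that flags a workflow whose pinned tool version is not the newest one installed. This is a sketch only: latest_version and is_outdated are hypothetical helper names, and it assumes that sort -V (version sort, as in GNU coreutils) orders the version strings in a sensible way, which may not match how Galaxy itself compares tool versions.

```bash
# Print the highest of the given version strings.
# Assumes "sort -V" ordering is an acceptable stand-in for tool version order.
latest_version() {
    printf '%s\n' "$@" | sort -V | tail -n 1
}

# Succeed if the pinned version ($1) is not the newest of the installed ones ($2...).
is_outdated() {
    pinned=$1
    shift
    [ "$(latest_version "$@")" != "$pinned" ]
}

# The workflow pins grappa 1.2a while the server offers 1.2a and 3.4b.
if is_outdated 1.2a 1.2a 3.4b; then
    echo "workflow pins 1.2a, but learners would get $(latest_version 1.2a 3.4b)"
fi
```

A check along these lines could decide which workflows are worth testing a second time with the latest tool versions.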

bgruening commented 3 years ago

> Is it OK for me to update some of the GTN tutorials to use more up-to-date versions of the tools than they currently do?

Of course!

> A long term goal in my mind would be to have a strategy of workflow testing that accounts for the fact that most of the time, tool versions used in tutorials will be the latest available versions and not necessarily the versions listed on the workflows. I think that the tests as they currently are are useful, but that sometimes they are testing flows that would be unlikely in a teaching setting. For example if Galaxy has grappa version 1.2a and grappa version 3.4b, the workflow might contain grappa 1.2a but in a real world teaching scenario students will be using 3.4b, if this makes any sense.

In theory, our training materials should all be updated to latest versions. We are just not able to do this currently. I guess one step in this direction is to inform the training author ... so have something automatic that bumps the workflow to the latest versions, runs the tests and informs the author of the training to check the (update) PR.

cat-bro commented 3 years ago
cat-bro commented 3 years ago

^ boxes are ticked if the tutorial has an equivalent working test in GTN. I think that the sequence-analysis tutorials do too but I'd need to run the tests again to be sure.

As a first step it would be good to have equivalent tests in GTN for all of these, so that no value is lost moving to running the GTN tests instead of the tests in this repo.

bgruening commented 3 years ago

@cat-bro; @malloryfreeberg has started to add several new tests to GTN in recent weeks. This is all super exciting! Thanks all!

malloryfreeberg commented 3 years ago

> @cat-bro; @malloryfreeberg has started to add several new tests to GTN in the last weeks. This is all super exciting! Thanks all!

@cat-bro the list of workflows and workflow tests I've been adding to the GTN material are all referenced in this ticket: https://github.com/galaxyproject/training-material/issues/1459

cat-bro commented 3 years ago

That's great! There are now more tests for training material in GTN than in this repo. There are still a few in this repo that do not have equivalent tests in GTN but they may need a bit more work and can be added to GTN over time. It's probably time to abandon the training tests in this repo and run the tests that are in GTN.

A script to run the tests could be something like

# file that collects the paths of all workflows that have tests
workflow_list=workflow_list.txt

# get list of local workflows with tests (ignoring training folder)
find . -name '*-test.yml' ! -path './training*' | sed -e 's|^\./||' -e 's|-test\.yml$|.ga|' > "$workflow_list"

# clone training-material repo
git clone https://github.com/galaxyproject/training-material.git

# get list of training-material workflows with tests
find training-material -name '*-test.yml' | sed 's|-test\.yml$|.ga|' >> "$workflow_list"

mkdir -p results

while IFS= read -r workflow_path; do
   ./run_galaxy_workflow_tests.sh "$workflow_path"
   # keep one report per workflow, with '/' in the path replaced by '_'
   cp tool_test_output.json "results/${workflow_path//\//_}.tool_test_output.json"
done < "$workflow_list"

# planemo merge_test_reports ...... ## merge all of the files in results and produce one HTML doc

The above is untested/unfinished.

Not sure if it's best to be cloning training-material each time or to have a clone somewhere on the Jenkins server that can have git pull run on it each time.

I really like the idea of having a merged test report that could be available on Jenkins.
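
For the loop in the script above, one detail that might matter on Jenkins is continuing after a failing workflow while still reporting the failure at the end. A possible sketch, with hypothetical names (report_name, run_all, fake_runner) and a stub standing in for ./run_galaxy_workflow_tests.sh so that it can be tried anywhere:

```bash
# Turn a workflow path into a flat report file name ('/' becomes '_').
report_name() {
    printf '%s.tool_test_output.json\n' "$(printf '%s' "$1" | tr '/' '_')"
}

# Run "$runner <workflow>" for each line of the list file, keep going after
# failures, and return non-zero if any workflow failed.
run_all() {
    list=$1 runner=$2 outdir=$3 failed=0
    mkdir -p "$outdir"
    while IFS= read -r wf; do
        [ -n "$wf" ] || continue
        # stdin is redirected so the runner cannot swallow the rest of the list
        if ! "$runner" "$wf" < /dev/null; then
            failed=$((failed + 1))
        fi
        if [ -f tool_test_output.json ]; then
            mv tool_test_output.json "$outdir/$(report_name "$wf")"
        fi
    done < "$list"
    [ "$failed" -eq 0 ]
}

# Stub runner for the demonstration: writes a report and fails for one path.
fake_runner() {
    echo '{}' > tool_test_output.json
    [ "$1" != "example2/wf4-shed-tools.ga" ]
}

work=$(mktemp -d)
cd "$work"
printf '%s\n' example1/wf3-shed-tools.ga example2/wf4-shed-tools.ga > workflow_list.txt
if run_all workflow_list.txt fake_runner results; then
    echo "all workflows passed"
else
    echo "some workflows failed"
fi
ls results
```

In a real run the second argument would be the actual test script instead of the stub. Whether that script leaves its report in tool_test_output.json in the current directory is taken from the script above and not verified here.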

malloryfreeberg commented 3 years ago

@cat-bro I like your ideas, and am fully supportive of harmonising the tests and finding ways to keep them up-to-date. Let me know what would be helpful for you to support this. I'm happy to continue going through all the GTN materials and making sure all the tutorials have both a workflow and a workflow test.

bgruening commented 3 years ago

> Not sure if it's best to be cloning training-material each time or to have a clone somewhere on the Jenkins server that can have git pull run on it each time.

Cloning is OK. We can use git clone --depth 1 to speed it up. @cat-bro can we have a Makefile target that runs your script and the cloning? This would make local testing easy and also simplify the Jenkins integration.

This is what we do in tools-land to merge the planemo JSON reports: https://github.com/galaxyproject/tools-iuc/blob/master/.github/workflows/pr.yaml#L315
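
Applied to the per-workflow reports discussed in this thread, the merge step might look roughly like the sketch below. The function name merge_reports, the results directory layout, the output file names and the PLANEMO override are all assumptions made for illustration. The two planemo subcommands (merge_test_reports and test_reports with --test_output) are used here as I understand them, so please check them against the installed planemo version.

```bash
# Merge every per-workflow JSON report in "$1" and render one HTML page.
# PLANEMO is a hypothetical hook so the commands can be previewed with "echo".
merge_reports() {
    results_dir=$1
    planemo_cmd=${PLANEMO:-planemo}
    "$planemo_cmd" merge_test_reports "$results_dir"/*.tool_test_output.json merged_test_output.json &&
        "$planemo_cmd" test_reports merged_test_output.json --test_output merged_test_output.html
}

# Dry run: print the two planemo invocations instead of executing them.
PLANEMO=echo merge_reports results
```

Without the PLANEMO override the function calls planemo directly, so it could be the last step of a Makefile target once the tests have run.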