Closed muffato closed 1 year ago
nf-core lint overall result: Passed :white_check_mark: :warning:
Posted for pipeline commit 370361f
+| ✅ 126 tests passed |+
#| ❔ 17 tests were ignored |#
!| ❗ 8 tests had warnings |!
Looks great to me, thanks again for the work you're putting in.
I'll wait for input from @gq1 in case there are any Tower-breaking changes, unless you know for sure, @muffato?
I tried on the farm; both profiles, `test` and `test_full`, are fine.
With profile `test_github`, after I download the data and replace the path, I still get the error:
container creation failed: mount /home/runner/work/treeval/treeval/TreeValTinyData/gene_alignment_data/fungi/LaetiporusSulphureus/LaetiporusSulphureus.gfLaeSulp1/cdna->/home/runner/work/treeval/treeval/TreeValTinyData/gene_alignment_data/fungi/LaetiporusSulphureus/LaetiporusSulphureus.gfLaeSulp1/cdna error: while mounting /home/runner/work/treeval/treeval/TreeValTinyData/gene_alignment_data/fungi/LaetiporusSulphureus/LaetiporusSulphureus.gfLaeSulp1/cdna: mount source /home/runner/work/treeval/treeval/TreeValTinyData/gene_alignment_data/fungi/LaetiporusSulphureus/LaetiporusSulphureus.gfLaeSulp1/cdna doesn't exist
I can't see anywhere that still has the GitHub path.
Here is the full yaml file for `assets/github_testing/TreeValTinyTest.yaml`:
assembly:
  level: scaffold
  sample_id: grTriPseu1
  latin_name: to_provide_taxonomic_rank
  classT: fungi
  asmVersion: 1
  dbVersion: "1"
  gevalType: DTOL
reference_file: /lustre/scratch123/tol/teams/tolit/users/gq2/git_test/treeval/testdata/TreeValTinyData/assembly/draft/grTriPseu1.fa
assem_reads:
  pacbio: /lustre/scratch123/tol/teams/tolit/users/gq2/git_test/treeval/testdata/TreeValTinyData/genomic_data/pacbio/
  hic: /lustre/scratch123/tol/teams/tolit/users/gq2/git_test/treeval/testdata/TreeValTinyData/genomic_data/hic-arima/
  supplementary: path
alignment:
  data_dir: /lustre/scratch123/tol/teams/tolit/users/gq2/git_test/treeval/testdata/TreeValTinyData/gene_alignment_data/
  common_name: "" # For future implementation (adding bee, wasp, ant etc)
  geneset: "LaetiporusSulphureus.gfLaeSulp1"
  #Path should end up looking like "{data_dir}{classT}/{common_name}/csv_data/{geneset}-data.csv"
self_comp:
  motif_len: 0
  mummer_chunk: 10
synteny:
  synteny_genome_path: /lustre/scratch123/tol/teams/tolit/users/gq2/git_test/treeval/testdata/TreeValTinyData/synteny/
outdir: "NEEDS TESTING"
intron:
  size: "50k"
telomere:
  teloseq: TTAGGG
busco:
  lineages_path: /lustre/scratch123/tol/teams/tolit/users/gq2/git_test/treeval/testdata/TreeValTinyData/busco/subset/
  lineage: fungi_odb10
Not sure why both `test_github` tests failed or were skipped here. I re-triggered them.
Thanks, I've restarted them twice now. Both times they failed at steps late in the hic_mapping subworkflow. I think, for some reason, that it's hitting resource limits.
Your yaml is fine, but there's another line that has to be changed. The file `TreeValTinyData/gene_alignment_data/fungi/csv_data/LaetiporusSulphureus.gfLaeSulp1-data.csv` contains the true path for the data. That needs to be updated and then it'll run.
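The path update described here can be scripted. The following is a minimal sketch, assuming the CSV stores the data location as a plain absolute path so that a text-level prefix replacement is enough; the function name `rewrite_prefix` is hypothetical and not part of the pipeline.

```python
from pathlib import Path


def rewrite_prefix(file_path: str, old_prefix: str, new_prefix: str) -> int:
    """Replace every occurrence of old_prefix with new_prefix in a text file.

    Returns the number of replacements made, so a result of 0 signals that
    the file did not contain the expected prefix.
    """
    path = Path(file_path)
    text = path.read_text()
    count = text.count(old_prefix)
    if count:
        # Only rewrite the file when something actually changes.
        path.write_text(text.replace(old_prefix, new_prefix))
    return count


# Example call (not executed here). The old prefix is the one visible in the
# mount error; the new prefix is a placeholder for the local download location.
#
# rewrite_prefix(
#     "TreeValTinyData/gene_alignment_data/fungi/csv_data/"
#     "LaetiporusSulphureus.gfLaeSulp1-data.csv",
#     "/home/runner/work/treeval/treeval",
#     "/path/to/local/copy",
# )
```

A return value of 0 is a quick hint that the file still points somewhere other than the prefix you expected.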
But it's the same test data as before, so why do we need more resources now?
Can we define this path in the parent yaml file?
It seems too complicated if we add one more step in the usage.
I've been watching the job execution. I think the order in which some of the jobs were executed caused it to hit the resource limit.
We have added instructions to the usage docs on how to set it all up, and there are some scripts to autogenerate the csv files, which should only need to be generated once.
I agree, though, that it is a weak spot and something we could look at changing in V2.
If it's OK with you both, I'd like to merge.
I just pushed an update of the test usage docs to mention that the path in `TreeValTinyData/gene_alignment_data/fungi/csv_data/LaetiporusSulphureus.gfLaeSulp1-data.csv` needs to be updated too.
In summary, the tests are a bit flaky but with retries and some luck, they can pass? It's definitely not ideal, but I'm OK with it and with merging at this point.
@DLBPointon: if you think the resources are the issue, you could maybe set `maxForks` to 1 to force just one process to run at a time? Also, maybe reduce `max_memory` on GitHub. The VM has 7 GB and the main Nextflow job will take some RAM, which is why the config says `max_memory = 6.GB`, but maybe Nextflow itself needs more than 1 GB?
@muffato has suggested adding `maxForks`; we have also just spoken about adding `queueSize` to the config. I've just tested this profile on the farm and it works fine. Looking through the resource logs. It doesn't really
---RUN_DATA---
Pipeline_version: v1.0.0
Pipeline_runname: desperate_hawking
Pipeline_session: 9a366331-ccc9-4476-932b-7d59c1d9f3bb
Pipeline_duration: 1391
Pipeline_datastrt: 2023-09-27T09:58:05.122221993+01:00
Pipeline_datecomp: 2023-09-27T10:21:16.957106317+01:00
Pipeline_entrypnt: RAPID
---INPUT_DATA---
InputSampleID: grTriPseu1_1
InputYamlFile: /nfs/treeoflife-01/teams/tola/users/dp24/treeval/assets/github_testing/TreeValTinyTest.yaml
InputAssemblyData: [[id:grTriPseu1_1, sz:33480489, ln:fungi, tk:DTOL], /nfs/treeoflife-01/teams/tola/users/dp24/treeval/TreeValTinyData/assembly/draft/grTriPseu1.fa]
Input_PacBio_Files: [[id:pacbio, sz:144583166], /nfs/treeoflife-01/teams/tola/users/dp24/treeval/work/c9/d3e730c96ea81229526c7a21915464/in/seqkitPacbio50000.fasta.gz]
Input_Cram_Files: [[id:cram, sz:[470158966, 466927250]], [/nfs/treeoflife-01/teams/tola/users/dp24/treeval/work/7c/78e31c6214a50ebbd8393c3d7909bc/in/SUBSET-1000.cram, /nfs/treeoflife-01/teams/tola/users/dp24/treeval/work/7c/78e31c6214a50ebbd8393c3d7909bc/in/SUBSET-2000.cram]]
---RESOURCES---
name status module cpus memory attempt realtime %cpu %mem peak_rss
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:GrabFiles (grTriPseu1_1) COMPLETED - 1 6 GB 1 34ms 100.0% 0.0% 0
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:GrabFiles (grTriPseu1_1) COMPLETED - 1 6 GB 1 34ms 54.3% 0.0% 0
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:GAP_FINDER:SEQTK_CUTN (grTriPseu1_1) COMPLETED - 2 6 GB 1 0ms 42.5% 0.0% 5.2 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:TELO_FINDER:FIND_TELOMERE_REGIONS (grTriPseu1_1) COMPLETED - 2 6 GB 1 431ms 71.3% 0.0% 20 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:GAP_FINDER:GAP_LENGTH (grTriPseu1_1) COMPLETED - 2 6 GB 1 14ms 46.0% 0.0% 0
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:TELO_FINDER:FIND_TELOMERE_WINDOWS (grTriPseu1_1) COMPLETED - 2 6 GB 1 0ms 94.0% 0.0% 2.8 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:TELO_FINDER:EXTRACT_TELO (grTriPseu1_1) COMPLETED - 2 6 GB 1 18ms 82.1% 0.0% 0
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:GENERATE_GENOME:CUSTOM_GETCHROMSIZES (grTriPseu1_1) COMPLETED - 1 6 GB 1 0ms 71.2% 0.0% 3.1 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:MINIMAP2_INDEX (1) COMPLETED - 2 6 GB 1 3s 76.9% 0.1% 264 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:BWAMEM2_INDEX (grTriPseu1.fa) COMPLETED - 1 6 GB 1 20s 84.9% 0.2% 717.4 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:GENERATE_CRAM_CSV (grTriPseu1_1) COMPLETED - 2 6 GB 1 0ms 48.0% 0.0% 3.1 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:REPEAT_DENSITY:WINDOWMASKER_MKCOUNTS (grTriPseu1_1) COMPLETED - 2 6 GB 1 7s 96.8% 0.0% 114.1 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:GENERATE_GENOME:GNU_SORT (grTriPseu1_1) COMPLETED - 2 6 GB 1 23ms 29.9% 0.0% 2.9 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:GENERATE_GENOME:GET_LARGEST_SCAFF (grTriPseu1_1) COMPLETED - 2 6 GB 1 22ms 39.4% 0.0% 0
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:REPEAT_DENSITY:GNU_SORT_B (grTriPseu1_1) COMPLETED - 2 6 GB 1 20ms 39.0% 0.0% 2.8 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:REPEAT_DENSITY:BEDTOOLS_MAKEWINDOWS (grTriPseu1_1) COMPLETED - 1 6 GB 1 0ms 65.3% 0.0% 3.2 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:TELO_FINDER:TABIX_BGZIPTABIX (grTriPseu1_1) COMPLETED - 1 6 GB 1 0ms 37.9% 0.0% 3.1 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:GAP_FINDER:TABIX_BGZIPTABIX (grTriPseu1_1) COMPLETED - 1 6 GB 1 0ms 30.5% 0.0% 3.1 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:REPEAT_DENSITY:GNU_SORT_C (grTriPseu1_1) COMPLETED - 2 6 GB 1 28ms 27.2% 0.0% 2.8 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:REPEAT_DENSITY:WINDOWMASKER_USTAT (grTriPseu1_1) COMPLETED - 2 6 GB 1 7s 97.4% 0.0% 53.1 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:REPEAT_DENSITY:EXTRACT_REPEAT (grTriPseu1_1) COMPLETED - 2 6 GB 1 0ms 68.8% 0.0% 2.7 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:REPEAT_DENSITY:BEDTOOLS_INTERSECT (grTriPseu1_1) COMPLETED - 1 6 GB 1 0ms 64.4% 0.0% 2.9 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:REPEAT_DENSITY:RENAME_IDS (grTriPseu1_1) COMPLETED - 2 6 GB 1 55ms 49.1% 0.0% 2.9 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:REPEAT_DENSITY:GNU_SORT_A (grTriPseu1_1) COMPLETED - 2 6 GB 1 129ms 72.8% 0.0% 2.8 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:REPEAT_DENSITY:REFORMAT_INTERSECT (grTriPseu1_1) COMPLETED - 2 6 GB 1 149ms 81.2% 0.0% 2.9 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:REPEAT_DENSITY:BEDTOOLS_MAP (grTriPseu1_1) COMPLETED - 1 6 GB 1 0ms 63.4% 0.0% 3.1 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:REPEAT_DENSITY:REPLACE_DOTS (grTriPseu1_1) COMPLETED - 2 6 GB 1 19ms 54.7% 0.0% 3 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:REPEAT_DENSITY:UCSC_BEDGRAPHTOBIGWIG (grTriPseu1_1) COMPLETED - 2 6 GB 1 0ms 39.3% 0.0% 2.8 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:MINIMAP2_ALIGN (grTriPseu1_1) COMPLETED - 2 6 GB 1 2m 11s 193.3% 0.2% 694.1 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:SAMTOOLS_MERGE (grTriPseu1) COMPLETED - 2 6 GB 1 5s 111.7% 0.0% 11.6 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:SAMTOOLS_SORT (grTriPseu1) COMPLETED - 2 6 GB 1 6s 148.0% 0.2% 694 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:SAMTOOLS_VIEW (grTriPseu1) COMPLETED - 2 6 GB 1 5s 90.1% 0.0% 7.7 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:BEDTOOLS_BAMTOBED (grTriPseu1) COMPLETED - 2 6 GB 1 3s 96.8% 0.0% 7.4 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:BEDTOOLS_GENOMECOV (grTriPseu1) COMPLETED - 1 6 GB 1 0ms 85.8% 0.0% 3.1 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:GNU_SORT (grTriPseu1) COMPLETED - 2 6 GB 1 102ms 63.9% 0.0% 2.9 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:GRAPHOVERALLCOVERAGE (grTriPseu1) COMPLETED - 1 6 GB 1 0ms 80.7% 0.0% 2.9 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:GETMINMAXPUNCHES (grTriPseu1) COMPLETED - 1 6 GB 1 89ms 44.5% 0.0% 2.8 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:FINDHALFCOVERAGE (grTriPseu1) COMPLETED - 1 6 GB 1 1s 84.3% 0.0% 2.9 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:BEDTOOLS_MERGE_MIN (grTriPseu1) COMPLETED - 1 6 GB 1 0ms 56.3% 0.0% 3 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:LONGREAD_COVERAGE:UCSC_BEDGRAPHTOBIGWIG (grTriPseu1_1) COMPLETED - 2 6 GB 1 0ms 78.3% 0.0% 2.8 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT (grTriPseu1_1) COMPLETED - 2 6 GB 1 6m 55s 192.0% 0.6% 2.2 GB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT (grTriPseu1_1) COMPLETED - 2 6 GB 1 6m 37s 191.9% 0.5% 2.3 GB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:SAMTOOLS_MERGE (grTriPseu1_1) COMPLETED - 2 6 GB 1 1m 41s 112.2% 0.0% 13.8 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:SAMTOOLS_MARKDUP (grTriPseu1_1) COMPLETED - 2 6 GB 1 1m 21s 171.6% 0.0% 71.5 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:BAMTOBED_SORT (grTriPseu1_1) COMPLETED - 2 6 GB 1 28.3s 144.7% 0.0% 23.7 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:GET_PAIRED_CONTACT_BED (grTriPseu1_1) COMPLETED - 2 6 GB 1 311ms 100.6% 0.0% 16.1 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:COOLER_CLOAD (grTriPseu1_1) COMPLETED - 2 6 GB 1 4s 88.6% 0.0% 141.7 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:COOLER_ZOOMIFY (grTriPseu1_1) COMPLETED - 2 6 GB 1 5s 94.8% 0.0% 312.5 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:PRETEXTMAP_STANDRD (grTriPseu1_1) COMPLETED - 2 3 GB 1 5m 7s 146.8% 0.5% 1.9 GB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:SNAPSHOT_SRES (grTriPseu1_1) COMPLETED - 1 6 GB 1 1s 95.2% 0.0% 48.9 MB
RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:CUSTOM_DUMPSOFTWAREVERSIONS (1) COMPLETED - 1 6 GB 1 226ms 55.7% 0.0% 2.4 MB
As you can see, only 3 jobs need more than 1 GB of memory. Those that do, however, are also very CPU-heavy, so maybe it was the combination of CPU, memory and the Nextflow process itself that was killing it.
Maybe those two random failures were just where the jobs lined up well enough to kill it?
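The count of memory-heavy jobs can be checked mechanically from the RESOURCES rows. This is a minimal sketch, assuming every row ends with its peak_rss value (either a number with a KB/MB/GB unit or a bare `0`), as in the log above; the helper names are hypothetical, and only four rows are copied from the log for brevity.

```python
import re

# Conversion factors to MB for the units that appear in the peak_rss column.
UNIT_TO_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0}
# peak_rss is the last field of a row, e.g. "717.4 MB" or "2.2 GB".
PEAK_RSS = re.compile(r"(\d+(?:\.\d+)?) (KB|MB|GB)$")


def peak_rss_mb(row: str) -> float:
    """Return the trailing peak_rss value of a resource row in MB."""
    match = PEAK_RSS.search(row.strip())
    if match is None:
        return 0.0  # rows ending in a bare "0" report no measurable peak
    value, unit = match.groups()
    return float(value) * UNIT_TO_MB[unit]


def heavy_processes(rows, threshold_mb=1024.0):
    """Return (process name, peak_rss in MB) for rows above the threshold."""
    result = []
    for row in rows:
        peak = peak_rss_mb(row)
        if peak > threshold_mb:
            # The process name is the first whitespace-delimited field.
            result.append((row.split()[0], peak))
    return result


# A few rows copied verbatim from the RESOURCES section.
rows = [
    "RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:BWAMEM2_INDEX (grTriPseu1.fa) COMPLETED - 1 6 GB 1 20s 84.9% 0.2% 717.4 MB",
    "RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:GENERATE_GENOME:GET_LARGEST_SCAFF (grTriPseu1_1) COMPLETED - 2 6 GB 1 22ms 39.4% 0.0% 0",
    "RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT (grTriPseu1_1) COMPLETED - 2 6 GB 1 6m 55s 192.0% 0.6% 2.2 GB",
    "RAPID:SANGERTOL_TREEVAL_RAPID:TREEVAL_RAPID:HIC_MAPPING:PRETEXTMAP_STANDRD (grTriPseu1_1) COMPLETED - 2 3 GB 1 5m 7s 146.8% 0.5% 1.9 GB",
]

for name, peak in heavy_processes(rows):
    print(f"{name}: {peak:.0f} MB")
```

Run over the full log, the same filter should flag the two `CRAM_FILTER_ALIGN_BWAMEM2_FIXMATE_SORT` tasks and `PRETEXTMAP_STANDRD`, which matches the three jobs mentioned in the comment.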
As I told you on Slack, @DLBPointon, here is a change for reinstating a `test` profile that runs quickly locally. The current `test`, which only works on GitHub, is now renamed `test_github`.
I've also added instructions for people to run that test locally, because all they have to do really is download the data and update the path.
I've added the missing citations you asked me about.
PR checklist
- `nf-core lint`
- `nextflow run . -profile test,docker --outdir <OUTDIR>`
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).