Closed alxndrdiaz closed 1 year ago
The subworflow stops at the BUSCO step:
busco \
--cpu 2 \
--in "$INPUT_SEQS" \
--out GCA_922984935.2.subset.unmasked-[carnivora_odb10, laurasiatheria_odb10, eutheria_odb10, mammalia_odb10, tetrapoda_odb10, vertebrata_odb10, metazoa_odb10, eukaryota_odb10, bacteria_odb10, archaea_odb10]-busco \
--lineage_dataset [carnivora_odb10, laurasiatheria_odb10, eutheria_odb10, mammalia_odb10, tetrapoda_odb10, vertebrata_odb10, metazoa_odb10, eukaryota_odb10, bacteria_odb10, archaea_odb10] \
\
\
--mode genome
Command error:
usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]
busco: error: unrecognized arguments: laurasiatheria_odb10, eutheria_odb10, mammalia_odb10, tetrapoda_odb10, vertebrata_odb10, metazoa_odb10, eukaryota_odb10, bacteria_odb10, archaea_odb10]-busco laurasiatheria_odb10, eutheria_odb10, mammalia_odb10, tetrapoda_odb10, vertebrata_odb10, metazoa_odb10, eukaryota_odb10, bacteria_odb10, archaea_odb10]
From the error description it seems that the method used to convert the text file input to a list of values is not working as expected see line 57 in the subworkflow script.
Had to restore the branch using git-repair then push a single commit that includes all changes (git interruption due to laptop power off), so these are the two commits associated to the recovery that were pushed:
fbef6f - alxndrdiaz, 3 minutes ago : fixed conflicts
0e1b1df - alxndrdiaz, 15 minutes ago : restored busco_subworkflow branch
Old single commits associated to requested changes:
f391868 - alxndrdiaz, 53 seconds ago : added BUSCO lineages path
1813171 - alxndrdiaz, 4 minutes ago : updated BUSCO input uusing single channel
809a1e2 - alxndrdiaz, 22 minutes ago : removed goat_taxosearch parameterd from subworkflow input
f17ba13 - alxndrdiaz, 26 minutes ago : fasta updated to tuple containing [meta, fasta]
9252c48 - alxndrdiaz, 32 minutes ago : added input tuple forBUSCO_DIAMOND which includes meta
af0fffa - alxndrdiaz, 38 minutes ago : restored meta to generate prefix
afb0cd4 - alxndrdiaz, 81 minutes ago : restored tuple input
4f9d553 - alxndrdiaz, 2 hours ago : removed SAMTOOLS_VIEW module
a70da26 - alxndrdiaz, 2 hours ago : removed samtools_view.nf module
9530327 - alxndrdiaz, 2 hours ago : moved module parameters from test.config to modules.config
The subworflow stops at the BUSCO step:
busco \ --cpu 2 \ --in "$INPUT_SEQS" \ --out GCA_922984935.2.subset.unmasked-[carnivora_odb10, laurasiatheria_odb10, eutheria_odb10, mammalia_odb10, tetrapoda_odb10, vertebrata_odb10, metazoa_odb10, eukaryota_odb10, bacteria_odb10, archaea_odb10]-busco \ --lineage_dataset [carnivora_odb10, laurasiatheria_odb10, eutheria_odb10, mammalia_odb10, tetrapoda_odb10, vertebrata_odb10, metazoa_odb10, eukaryota_odb10, bacteria_odb10, archaea_odb10] \ \ \ --mode genome Command error: usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS] busco: error: unrecognized arguments: laurasiatheria_odb10, eutheria_odb10, mammalia_odb10, tetrapoda_odb10, vertebrata_odb10, metazoa_odb10, eukaryota_odb10, bacteria_odb10, archaea_odb10]-busco laurasiatheria_odb10, eutheria_odb10, mammalia_odb10, tetrapoda_odb10, vertebrata_odb10, metazoa_odb10, eukaryota_odb10, bacteria_odb10, archaea_odb10]
From the error description it seems that the method used to convert the text file input to a list of values is not working as expected see line 57 in the subworkflow script.
Have you tried fromList
or something similar. Not sure if it already solved or not.
The subworflow stops at the BUSCO step:
busco \ --cpu 2 \ --in "$INPUT_SEQS" \ --out GCA_922984935.2.subset.unmasked-[carnivora_odb10, laurasiatheria_odb10, eutheria_odb10, mammalia_odb10, tetrapoda_odb10, vertebrata_odb10, metazoa_odb10, eukaryota_odb10, bacteria_odb10, archaea_odb10]-busco \ --lineage_dataset [carnivora_odb10, laurasiatheria_odb10, eutheria_odb10, mammalia_odb10, tetrapoda_odb10, vertebrata_odb10, metazoa_odb10, eukaryota_odb10, bacteria_odb10, archaea_odb10] \ \ \ --mode genome Command error: usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS] busco: error: unrecognized arguments: laurasiatheria_odb10, eutheria_odb10, mammalia_odb10, tetrapoda_odb10, vertebrata_odb10, metazoa_odb10, eukaryota_odb10, bacteria_odb10, archaea_odb10]-busco laurasiatheria_odb10, eutheria_odb10, mammalia_odb10, tetrapoda_odb10, vertebrata_odb10, metazoa_odb10, eukaryota_odb10, bacteria_odb10, archaea_odb10]
From the error description it seems that the method used to convert the text file input to a list of values is not working as expected see line 57 in the subworkflow script.
Have you tried
fromList
or something similar. Not sure if it already solved or not.
I tried it but got another error. As far as I understand I should pass a list of lineages in this case. So, still unsolved.
I managed to Busco work. I think the problem was that each
, which is used in BUSCO to unroll the list of lineages https://github.com/nf-core/modules/blob/master/modules/busco/main.nf#L12
doesn't work on dynamic lists, cf https://github.com/nextflow-io/nextflow/issues/1566 and the green box at https://www.nextflow.io/docs/latest/process.html#input-repeaters
But also, when given a channel, each
will run each element in it against the other inputs, which is actually not great, because it could potentially match a fasta file against a lineage coming from another fasta file (if the sub-workflow is called multiple times).
My solution, which I have pushed in the commit 178a8f2, is to turn the each
into a regular val
in the busco module, make goat_taxon_search output meta
in the lineages
output channel, and some Nextflow channel manipulation.
I'll raise an issue on nf-core/modules about this each
in the Busco module.
✌🏼
Input_file Dataset Complete Single Duplicated Fragmented Missing n_markers Scaffold N50 Contigs N50 Percent gaps Number of scaffolds
GCA_927399515.1.fa agaricomycetes_odb10 96.4 95.3 1.1 0.5 3.1 2898 2582225 2582225 0.002% 14
GCA_927399515.1.fa eukaryota_odb10 95.7 93.3 2.4 3.1 1.2 255 2582225 2582225 0.002% 14
GCA_927399515.1.fa polyporales_odb10 95.8 94.8 1.0 0.8 3.4 4464 2582225 2582225 0.002% 14
GCA_927399515.1.fa bacteria_odb10 7.3 7.3 0.0 16.9 75.8 124 2582225 2582225 0.002% 14
GCA_927399515.1.fa archaea_odb10 5.7 5.2 0.5 13.4 80.9 194 2582225 2582225 0.002% 14
GCA_927399515.1.fa fungi_odb10 97.7 96.2 1.5 0.3 2.0 758 2582225 2582225 0.002% 14
GCA_927399515.1.fa basidiomycota_odb10 96.4 94.9 1.5 0.7 2.9 1764 2582225 2582225 0.002% 14
✌🏼
Input_file Dataset Complete Single Duplicated Fragmented Missing n_markers Scaffold N50 Contigs N50 Percent gaps Number of scaffolds GCA_927399515.1.fa agaricomycetes_odb10 96.4 95.3 1.1 0.5 3.1 2898 2582225 2582225 0.002% 14 GCA_927399515.1.fa eukaryota_odb10 95.7 93.3 2.4 3.1 1.2 255 2582225 2582225 0.002% 14 GCA_927399515.1.fa polyporales_odb10 95.8 94.8 1.0 0.8 3.4 4464 2582225 2582225 0.002% 14 GCA_927399515.1.fa bacteria_odb10 7.3 7.3 0.0 16.9 75.8 124 2582225 2582225 0.002% 14 GCA_927399515.1.fa archaea_odb10 5.7 5.2 0.5 13.4 80.9 194 2582225 2582225 0.002% 14 GCA_927399515.1.fa fungi_odb10 97.7 96.2 1.5 0.3 2.0 758 2582225 2582225 0.002% 14 GCA_927399515.1.fa basidiomycota_odb10 96.4 94.9 1.5 0.7 2.9 1764 2582225 2582225 0.002% 14
I tried to reproduce the results with another dataset
(mMelMel3)
, but it seems it only runs for the first lineage:
GCA_922984935.2.subset.unmasked-carnivora_odb10-busco/
GCA_922984935.2.subset.unmasked-carnivora_odb10-busco.batch_summary.txt
So, I will try again. I did not change anything except for removing samtools/view module
and also updated lineages_path to a channel:
// Channel of busco ch_lineages
ch_lineages_path = Channel.fromPath(params.busco_lineages_path)
This is the command that I run
nextflow run src/blobtoolkit -profile singularity --fasta ~mm49/nfs/scratch123/nextflow/GCA_927399515.1.fa --outdir btk_out --busco_lineages_path /lustre/scratch123/tol/resources/busco/v5/ --taxon 'Laetiporus sulphureus'
As you can see, I'm using a Fasta file directly. This is because I've locally removed the input_check subworkflow (I only wanted to test BUSCO). You can see my changes at https://github.com/sanger-tol/blobtoolkit/commit/52196f103411d1597983857e56ff75e1ea80fbd8 . You can check this branch out, it's exactly what I used (and it worked). Note: this commit is based off https://github.com/sanger-tol/blobtoolkit/pull/28/commits/afd328b6082db6e4639321f7b9a73cb55a5bda19 and doesn't include https://github.com/sanger-tol/blobtoolkit/pull/28/commits/905139b7f0beff99cd4df8002ab1ae05c3e246eb
This is the command that I run
nextflow run src/blobtoolkit -profile singularity --fasta ~mm49/nfs/scratch123/nextflow/GCA_927399515.1.fa --outdir btk_out --busco_lineages_path /lustre/scratch123/tol/resources/busco/v5/ --taxon 'Laetiporus sulphureus'
As you can see, I'm using a Fasta file directly. This is because I've locally removed the input_check subworkflow (I only wanted to test BUSCO). You can see my changes at 52196f1 . You can check this branch out, it's exactly what I used (and it worked). Note: this commit is based off afd328b and doesn't include 905139b
I think the issue was that I was using an incorrect busco_lineages_path = './'
, I will try again using:
busco_lineages_path = '/lustre/scratch123/tol/resources/busco/v5/'
Besides making this subworkflow work, there are two outstanding tasks:
$input
parameter (taxon names may contain whitespace)The BUSCO module has been fixed !
I am a bit concerned about the current commit history. Seems like the rebase did not happen correctly. @zb32 is working on it again.
I am a bit concerned about the current commit history. Seems like the rebase did not happen correctly. @zb32 is working on it again.
I was not sure about to push the most recent changes, but I created a local branch and will include the changes that I have made once the rebase is fixed.
I am a bit concerned about the current commit history. Seems like the rebase did not happen correctly. @zb32 is working on it again.
Steps for busco_ar34 branch rebase, following git-rebase documentation.
git checkout -b busco_rebase
git pull
git branch --set-upstream-to=origin/busco_rebase busco_rebase
git checkout --track origin/fix_dev
git checkout busco_rebase
git rebase fix_dev busco_rebase
error: Failed to merge in the changes.
Patch failed at 0001 restored busco_subworkflow branch
hint: Use 'git am --show-current-patch' to see the failed patch
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
git am --show-current-patch
commit 0e1b1df030e6849d95325ba186233b2083a722b5
Author: alxndrdiaz <ra.ramos.diaz@gmail.com>
Date: Thu Sep 8 15:34:56 2022 -0600
restored busco_subworkflow branch
.....
git status
rebase in progress; onto da56a84
You are currently rebasing branch 'busco_rebase' on 'da56a84'.
(fix conflicts and then run "git rebase --continue")
(use "git rebase --skip" to skip this patch)
(use "git rebase --abort" to check out the original branch)
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: modules/local/goat_taxon_search.nf
new file: modules/nf-core/modules/busco/main.nf
new file: modules/nf-core/modules/busco/meta.yml
new file: modules/nf-core/modules/custom/dumpsoftwareversions/main.nf
new file: modules/nf-core/modules/custom/dumpsoftwareversions/meta.yml
new file: modules/nf-core/modules/custom/dumpsoftwareversions/templates/dumpsoftwareversions.py
new file: modules/nf-core/modules/multiqc/main.nf
new file: modules/nf-core/modules/multiqc/meta.yml
new file: modules/nf-core/modules/samtools/view/main.nf
new file: modules/nf-core/modules/samtools/view/meta.yml
new file: subworkflows/local/busco_diamond_blastp.nf
Unmerged paths:
(use "git restore --staged <file>..." to unstage)
(use "git add <file>..." to mark resolution)
both added: .github/PULL_REQUEST_TEMPLATE.md
both added: .github/workflows/linting.yml
both added: .prettierignore
both added: README.md
both added: assets/email_template.txt
both added: assets/samplesheet.csv
both added: assets/schema_input.json
both added: bin/tol_input.sh
both added: conf/modules.config
both added: conf/test.config
both added: docs/usage.md
both added: lib/NfcoreTemplate.groovy
both added: modules.json
both added: nextflow_schema.json
both added: subworkflows/local/input_check.nf
both added: workflows/blobtoolkit.nf
Then used git rebase --continue
to go through each failing patch and fix conflict files, until git status output is:
On branch busco_rebase
Your branch and 'origin/busco_rebase' have diverged,
and have 122 and 116 different commits each, respectively.
(use "git pull" to merge the remote branch into yours)
nothing to commit, working tree clean
Then used git pull
, but found more conflicts:
On branch busco_rebase
Your branch and 'origin/busco_rebase' have diverged,
and have 122 and 116 different commits each, respectively.
(use "git pull" to merge the remote branch into yours)
You have unmerged paths.
(fix conflicts and run "git commit")
(use "git merge --abort" to abort the merge)
Unmerged paths:
(use "git add <file>..." to mark resolution)
both modified: conf/test.config
added by us: modules/nf-core/samtools/view/main.nf
added by us: modules/nf-core/samtools/view/meta.yml
both added: subworkflows/local/busco_diamond_blastp.nf
no changes added to commit (use "git add" and/or "git commit -a")
Fixed all conflicts then used git push
to update the changes in the busco_rebase remote branch:
Enumerating objects: 464, done.
Counting objects: 100% (464/464), done.
Delta compression using up to 4 threads
Compressing objects: 100% (382/382), done.
Writing objects: 100% (437/437), 45.75 KiB | 1.48 MiB/s, done.
Total 437 (delta 204), reused 1 (delta 0)
remote: Resolving deltas: 100% (204/204), completed with 10 local objects.
To https://github.com/sanger-tol/blobtoolkit.git
ccccdd6..de9facd busco_rebase -> busco_rebase
At the first glance, the rebase looks mostly OK. But why did you then merge busco_ar34
in it ? The whole point is to have a single, clean, branch, with all the commits and no merge in it.
But more importantly, the busco_rebase
branch doesn't pass the nextflow run
CI. Even locally, the test profile doesn't start with its default parameters. Can you fix it ?
Note: you rebased on top of fix_dev
, which doesn't pass the linting CI. I could perhaps fix the latter and merge #35 first ? Then you'd have to rebase on top of the new dev
.
At the first glance, the rebase looks mostly OK. But why did you then merge
busco_ar34
in it ? The whole point is to have a single, clean, branch, with all the commits and no merge in it.But more importantly, the
busco_rebase
branch doesn't pass thenextflow run
CI. Even locally, the test profile doesn't start with its default parameters. Can you fix it ?Note: you rebased on top of
fix_dev
, which doesn't pass the linting CI. I could perhaps fix the latter and merge #35 first ? Then you'd have to rebase on top of the newdev
.
busco_ar34
. fix_dev
then try to rebase or merge busco_ar34
. I will work on it, if you have any other suggestions please let me know. Like Priyanka said, ignore Zaynab's for now, and just rebase your own changes onto dev
.
The key is that nextflow run . -profile test,singularity
without any other command-line arguments must still run without any problem on your branch. That'll be the first condition for merging the new branch.
Like Priyanka said, ignore Zaynab's for now, and just rebase your own changes onto
dev
. The key is thatnextflow run . -profile test,singularity
without any other command-line arguments must still run without any problem on your branch. That'll be the first condition for merging the new branch.
I will close this PR and will open a new one with the rebased branch.
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).This PR is part of the Google Summer of Code 2022 project: Conversion of the BlobToolKit pipeline to Nextflow, GitHub repository: https://github.com/sanger-tol/blobtoolkit