Closed priyanka-surana closed 9 months ago
nf-core lint
overall result: Passed :white_check_mark: :warning:
Posted for pipeline commit f0a74af
+| ✅ 135 tests passed |+
#| ❔ 22 tests were ignored |#
!| ❗ 1 tests had warnings |!
BUSCO runs in the test and test_raw profiles, but it fails for test_full. I tried multiple busco input folder values and settings. Here is the error:
Command error:
Traceback (most recent call last):
File "/usr/local/bin/busco", line 42, in <module>
run_BUSCO.main()
File "/usr/local/lib/python3.7/site-packages/busco/run_BUSCO.py", line 420, in main
busco_run.run()
File "/usr/local/lib/python3.7/site-packages/busco/run_BUSCO.py", line 70, in run
self.load_config()
File "/usr/local/lib/python3.7/site-packages/busco/run_BUSCO.py", line 62, in load_config
self.config_manager.load_busco_config_main()
File "/usr/local/lib/python3.7/site-packages/busco/BuscoLogger.py", line 62, in wrapped_func
self.retval = func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/busco/ConfigManager.py", line 58, in load_busco_config_main
self.config_main.validate()
File "/usr/local/lib/python3.7/site-packages/busco/BuscoConfig.py", line 550, in validate
self._init_downloader()
File "/usr/local/lib/python3.7/site-packages/busco/BuscoConfig.py", line 392, in _init_downloader
self.downloader = BuscoDownloadManager(self)
File "/usr/local/lib/python3.7/site-packages/busco/BuscoDownloadManager.py", line 53, in __init__
self._obtain_versions_file()
File "/usr/local/lib/python3.7/site-packages/busco/BuscoLogger.py", line 62, in wrapped_func
self.retval = func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/busco/BuscoDownloadManager.py", line 75, in _obtain_versions_file
urllib.request.urlretrieve(remote_filepath, local_filepath)
File "/usr/local/lib/python3.7/urllib/request.py", line 257, in urlretrieve
tfp = open(filename, 'wb')
PermissionError: [Errno 13] Permission denied: '/tmp/nxf.mWMrsqoKa2/lineages/file_versions.tsv'
Work dir:
/lustre/scratch123/tol/teams/tolit/users/ps22/pipelines/blobtoolkit/work/a5/0dd84f642656f5b7ffc5655393c25c
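The traceback ends in an open(filename, 'wb') call on file_versions.tsv inside the staged lineages directory, so the failure comes down to that directory not being writable. A minimal Python sketch of a pre-flight check for this condition follows. The helper name can_write_versions_file is hypothetical and not part of BUSCO; it only assumes the download directory is known before BUSCO starts.

```python
import tempfile


def can_write_versions_file(download_dir: str) -> bool:
    """Return True if a file can be created inside download_dir.

    Hypothetical helper, not part of BUSCO: it imitates the
    open(filename, 'wb') step from the traceback by creating and
    removing a scratch file in the same directory.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=download_dir):
            return True
    except OSError:
        # Covers PermissionError (read-only directory) and a missing directory.
        return False
```

If this returns False for the lineages directory, BUSCO cannot refresh file_versions.tsv there and will fail in the same way unless it is told not to download anything.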
@priyanka-surana: I just remembered that the Busco module lost its mandatory --offline (https://github.com/nf-core/modules/pull/4360). Did you update the module recently? If so, you need to add --offline in your conf/modules.config. It didn't cause any issue in the genome-note pipeline yet because its Busco module hasn't been updated.
I added --offline only if the lineages path is provided, like this:
path busco_lineages_path // Recommended: path to busco lineages - downloads if not set
...
def busco_lineage_dir = busco_lineages_path ? "--download_path ${busco_lineages_path} --offline" : ''
This gives a new error:
Command output:
New AUGUSTUS_CONFIG_PATH=/tmp/nxf.5PnCMeZR0Z/tmp.3lEJgFGY09
2023-12-19 15:36:43 INFO: ***** Start a BUSCO v5.5.0 analysis, current time: 12/19/2023 15:36:43 *****
2023-12-19 15:36:43 INFO: Configuring BUSCO with local environment
2023-12-19 15:36:43 INFO: Mode is genome
2023-12-19 15:36:43 INFO: Running in batch mode. 1 input files found in /tmp/nxf.5PnCMeZR0Z/input_seqs
2023-12-19 15:36:43 INFO: Input file is /tmp/nxf.5PnCMeZR0Z/input_seqs/GCA_927399515.1.fasta
Short summaries were not available: No genes were found.
Command error:
2023-12-19 15:36:43 ERROR: Unable to run BUSCO in offline mode. Dataset /tmp/nxf.5PnCMeZR0Z/lineages/lineages/eukaryota_odb10 does not exist.
mv: cannot stat 'GCA_927399515.1-eukaryota_odb10-busco/*/short_summary.*.json': No such file or directory
mv: cannot stat 'GCA_927399515.1-eukaryota_odb10-busco/*/short_summary.*.txt': No such file or directory
Work dir:
/lustre/scratch123/tol/teams/tolit/users/ps22/pipelines/blobtoolkit/work/e2/dd5a4ba76c8b173b9bed14921de01a
Removing the busco_lineages_path makes the pipeline work just fine. I think the issue must be with that dataset or how we are passing it. This is the version I am testing: /lustre/scratch123/tol/teams/tolit/users/ps22/pipelines/blobtoolkit (branch blast). Thanks :)
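The offline error above names the directory BUSCO looked for: the download path followed by lineages/ and the lineage name. A small Python sketch of checking that directory before launching BUSCO is given here. Both function names are hypothetical, and the layout is inferred only from the error message in this thread, so treat it as an assumption rather than a description of BUSCO internals.

```python
from pathlib import Path


def offline_dataset_dir(download_path: str, lineage: str) -> Path:
    """Directory that the offline error message reported as missing.

    Assumed layout, taken from the error text in this thread:
    <download_path>/lineages/<lineage>
    """
    return Path(download_path) / "lineages" / lineage


def missing_offline_dataset(download_path: str, lineage: str) -> bool:
    """Return True if the expected lineage directory is absent."""
    return not offline_dataset_dir(download_path, lineage).is_dir()
```

Calling offline_dataset_dir on the staged path from the log, with eukaryota_odb10 as the lineage, reproduces the doubled lineages/lineages component seen in the error, which makes it easier to see which directory has to exist.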
The whole BUSCO issue is sorted by removing the trailing slash (/) from the end of the location. The test profile also has a trailing slash, but that works. This error is still bizarre to me.
// profile test
busco = "/lustre/scratch123/tol/resources/nextflow/busco_2021_06_reduced/"
// profile test full
busco = "/lustre/scratch123/tol/resources/busco/latest"
// profile test raw
busco = "/lustre/scratch123/tol/resources/nextflow/busco/blobtoolkit.GCA_922984935.2.2023-08-03"
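The thread does not establish why the slash at the end of the path mattered for one profile and not another. As a hedged illustration only, the Python sketch below normalises such a path and shows one well-known way a trailing slash changes behaviour: the last path component becomes empty. The helper name strip_trailing_slash is hypothetical, and nothing here is claimed to be what Nextflow or BUSCO does internally.

```python
import os.path


def strip_trailing_slash(path: str) -> str:
    """Drop trailing slashes but keep a bare root ("/") intact."""
    stripped = path.rstrip("/")
    return stripped or "/"


# With a trailing slash the final component is empty, without it the
# directory name is returned.
with_slash = os.path.basename("/lustre/scratch123/tol/resources/busco/latest/")
without_slash = os.path.basename("/lustre/scratch123/tol/resources/busco/latest")
```

Normalising the configured path once, before it is passed to any process, would make the three profiles behave the same way regardless of how the value was typed.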
But now there is a new error, which should be more straightforward. The first line of one of the DIAMOND databases is empty. I guess I need to recreate them, unless I am missing something.
Database: ./gfLaeSulp1.1.buscoregions (type: Diamond database, sequences: 246, letters: 338784)
Block size = 2000000000
Current RSS: 20.4 MB, Peak RSS: 20.4 MB
Opening the input file... Error: Error detecting input file format. First line seems to be blank.
Error: Error detecting input file format. First line seems to be blank.
The gfLaeSulp1.1*dmnd files are identical to the ones Rich has in https://github.com/blobtoolkit/blobtoolkit/tree/main/databases/uniprot
I would rather think that "input file" refers to the FASTA file. Can you check that the FASTA file has sequences in it?
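One quick way to run the check suggested here is to count header lines in the FASTA file before handing it to DIAMOND. The following is a minimal Python sketch, assuming an uncompressed plain-text FASTA; the function name count_fasta_records is hypothetical.

```python
def count_fasta_records(path: str) -> int:
    """Count header lines (those starting with '>') in a plain-text FASTA file.

    Assumes the file is not gzip-compressed. An empty file, or one with
    no headers, yields 0.
    """
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

A result of 0 for the chunks file would be consistent with the "First line seems to be blank" message, since there would be no header line for the format detection to read.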
Yes right again. The chunks file is empty. The input fasta is created with this command:
btk pipeline chunk-fasta \
--in GCA_927399515.1.fasta \
--busco full_table.tsv \
--out GCA_927399515.1.chunks.fasta \
--chunk 100000 --overlap 0 --max-chunks 10 --min-length 1000
I am looking more into this now.
Rich is also looking into this. For both of us, local tests also produce blank files which Rich does not think should happen.
Please let me know if any other changes not related to this step are needed. We will work on sorting the errors in parallel to other changes.
The error was:
OK, there were 2 problems. BUSCO 5.5 changed the full_table format a little, so the parsing wasn't working, and then I found a bug in a check for runs of unmasked bases that was checking for masked bases instead. I've just pushed the changes, so a fixed version 4.3.1 should be building now.
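To illustrate the second problem, here is a minimal Python sketch of a check for runs of unmasked bases. It is not the blobtoolkit code. It assumes soft-masking, where lowercase letters and N count as masked, and the function name unmasked_runs is hypothetical. The inverted version of this check would match lowercase letters instead, which would select the masked regions.

```python
import re
from typing import List, Tuple


def unmasked_runs(sequence: str, min_length: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans of unmasked bases at least min_length long.

    Assumes soft-masking: only uppercase A, C, G and T count as unmasked.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # Greedy match of consecutive uppercase bases, min_length or longer.
    pattern = re.compile(r"[ACGT]{%d,}" % min_length)
    return [match.span() for match in pattern.finditer(sequence)]
```

For example, unmasked_runs("acgtACGTACnnACGgt", 3) returns [(4, 10), (12, 15)], while raising min_length to 4 keeps only the first span.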
It is resolved now, the blast steps have all completed successfully.
WARN: There's no process matching config selector: BLOBTOOLKIT_DEPTH -- Did you mean: BLOBTOOLKIT_CHUNK?
modules/local/blastn.nf (BLASTN) seems not used?
The full test completed successfully. You can view the results here: http://grit-btk.tol-dev.sanger.ac.uk/view/btk_prod_test_full_mosdepth/dataset/btk_prod_test_full_mosdepth/blob
I am only concerned about this part in the results (meta.json):
"settings": {
"blast_chunk":100000,
"blast_max_chunks":10,
"blast_min_length":1000,
"blast_overlap":0,
"pipeline":"https://github.com/blobtoolkit/blobtoolkit","release":"blobtoolkit-pipeline v4.1.6",
"software_versions": {
"blastn":"2.12.0+",
"blobtk":"0.2.4",
"blobtools":"4.3.1",
"busco":"5.3.2",
"diamond":"2.0.15",
"minimap2":"2.24-r1122",
"python":"3.9.13",
"samtools":"1.15.1",
"seqtk":"1.3-r106",
"snakemake":"7.19.1"},
"stats_chunk":1000,
"stats_windows":[0.1,0.01,100000,1000000],
"taxdump":"./taxdump","tmp":"/tmp"},
"similarity": {
"diamond_blastp":{
"evalue":1e-10,
"import_evalue":1e25,
"import_max_target_seqs":100000,
"max_target_seqs":10,
"name":"reference_proteomes",
"path":"./uniprot",
"taxrule":"blastp=buscogenes"},
"diamond_blastx": {
"evalue":1e-10,
"import_evalue":1e-25,
"max_target_seqs":10,
"name":"reference_proteomes",
"path":"./uniprot",
"taxrule":"buscogenes"}}
Is this going to cause issues for production?
@gq1 Can you please take a look at the tests for this pipeline? Thanks.
Can you run the test locally?
The log is not much help. Did it run out of disk space?
2023-12-20T14:34:17.7624742Z touch: cannot touch '.command.trace': Permission denied
2023-12-20T14:34:17.7625306Z
2023-12-20T14:34:17.7625598Z Work dir:
2023-12-20T14:34:17.7626480Z /home/runner/work/blobtoolkit/blobtoolkit/work/d9/ce32b60e9dba5d624bc89e3bcf10c8
I looked at /lustre/scratch123/tol/teams/tolit/users/ps22/pipelines/blobtoolkit/results/GCA_927399515.1 and browsed the data on the BTK viewer.
These seem fine:
- GC%
- Read coverage
- Lengths
- Gaps (Ns)
- "position" and "proportion" files
- Summary files GCA*.summary.json (incl. the BUSCO score), identifiers.json, meta.json
These seem weird:
- ${lineage}_odb10_count*.json only have 0s for the related eukaryote lineages (fungi and below) and non-0s for bacteria/archaea! Maybe there's some filtering happening.
- I also don't know / understand what's supposed to be in the buscogenes* and buscoregions* JSON files, so can't really tell whether the counts are correct
Python linting (black) is failing
To keep the code consistent with lots of contributors, we run automated code consistency checks. To fix this CI test, please run:
- Install black: pip install black
- Fix formatting errors in your pipeline: black .
Once you push these changes the test should pass, and you can hide this comment :+1:
We highly recommend setting up Black in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!
Thanks again for your contribution!
This is fixed.
This is a bit long and messy. I am running a full test here: /lustre/scratch123/tol/teams/tolit/users/ps22/pipelines/blobtoolkit. Will update once it completes, but please start reviewing if you can. Would be great to get it merged this week. Thanks :)