Open gitcruz opened 3 days ago
Hi @gitcruz. Can you try with /scratch_isilon/groups/assembly/data/databases/BUSCO_2024_11/v5/data/, i.e. without the trailing lineages?
The way we run the pipeline, --busco gets the path to a directory that contains lineages, cf:
├── information
├── lineages
│   ├── acidobacteria_odb10
│   │   ├── hmms
│   │   └── info
│   ├── aconoidasida_odb10
│   │   ├── hmms
│   │   ├── info
│   │   └── prfl
│   (...)
│   ├── viridiplantae_odb10
│   │   ├── hmms
│   │   ├── info
│   │   └── prfl
│   └── xanthomonadales_odb10
│       ├── hmms
│       └── info
└── placement_files
All the refseq_db.faa.gz have been decompressed already (like you did). I should mention that in the doc.
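The path fix can be sketched in a few lines of Python. This is only an illustration: busco_root and dataset_dir are hypothetical helpers, not part of the pipeline or of BUSCO, and the <root>/lineages/<lineage> layout is an assumption inferred from the directory tree and from the "lineages/lineages" path in the reported error.

```python
from pathlib import PurePosixPath


def busco_root(path: str) -> PurePosixPath:
    """Return the directory that should be passed to --busco.

    Hypothetical helper: if the given path ends in "lineages", step up
    one level so that the result *contains* the lineages directory.
    """
    p = PurePosixPath(path)  # PurePosixPath drops any trailing slash
    return p.parent if p.name == "lineages" else p


def dataset_dir(root: PurePosixPath, lineage: str) -> PurePosixPath:
    """Where a lineage is expected, assuming <root>/lineages/<lineage>."""
    return root / "lineages" / lineage


given = "/scratch_isilon/groups/assembly/data/databases/BUSCO_2024_11/v5/data/lineages/"

# Path as given: "lineages" ends up twice, as in the reported error.
print(dataset_dir(PurePosixPath(given), "viridiplantae_odb10"))
# Path after stepping up one level: matches the directory tree.
print(dataset_dir(busco_root(given), "viridiplantae_odb10"))
```

The same helper also shows that, at the path level, a trailing slash makes no difference: it is dropped when the string is parsed.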
Matthieu
Thanks for the quick response Matthieu,
I'm trying it that way. So far the nextflow job has been running for more than 2 hours.
WRT the database paths, is it necessary to add the final slash or not (i.e. --blastn /scratch_isilon/groups/assembly/data/databases/nt_2024_10_03/)?
Also, I don't understand the guide's examples for the diamond databases. You show two: --blastp /path/to/buscogenes.dmnd --blastx /path/to/buscoregions.dmnd, while I have just one.
I am using the same file for both: --blastp /scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd --blastx /scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd
Is this correct? I followed the guide and built just one diamond db...
Regards, Fernando
> WRT the databases path is it necessary to add the final slash or not (i.e. --blastn /scratch_isilon/groups/assembly/data/databases/nt_2024_10_03/)?
I think it should work the same with and without.
> And also I don't understand the guide examples for the diamond databases I just have one. While you show two: --blastp /path/to/buscogenes.dmnd --blastx /path/to/buscoregions.dmnd
> I am using only one: --blastp /scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd --blastx /scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd
> Is this correct? I followed the guide and built just one diamond db...
Yes, it's correct. They are separate parameters in case people want to use different databases. I could imagine someone optimising the pipeline by using a more restricted database for the blastp search (which happens first) so that the blastp jobs finish quicker, while using the complete database for the blastx search (which happens after). In practice, the way we run it on all our assembled genomes, we use the same, complete database for both.
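A minimal sketch of the two setups, assuming a hypothetical helper diamond_args (not part of the pipeline) that assembles the two flags. By default both searches get the complete database; a smaller blastp database is optional, and /path/to/restricted.dmnd is a placeholder.

```python
from typing import List, Optional


def diamond_args(full_db: str, blastp_db: Optional[str] = None) -> List[str]:
    """Build the --blastp/--blastx arguments.

    full_db is always used for blastx; blastp falls back to it unless a
    (hypothetical) more restricted database is supplied.
    """
    return ["--blastp", blastp_db or full_db, "--blastx", full_db]


db = "/scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd"

# Same complete database for both searches.
print(" ".join(diamond_args(db)))
# Restricted database for blastp only (placeholder path).
print(" ".join(diamond_args(db, blastp_db="/path/to/restricted.dmnd")))
```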
Best, Matthieu
Description of the bug
Dear developers,
I downloaded and installed the pipeline v0.6.0.
As described in the usage documentation, I downloaded the entire set of BUSCO v5 databases and untarred them. Because I kept getting a recurrent error with BUSCO, I then also decompressed refseq_db.faa.gz for all databases. However, the error persists and looks like this:
2024-11-14 12:32:41 ERROR: Unable to run BUSCO in offline mode. Dataset /scratch_tmp/32318106/nxf.LMCX5N46Un/lineages/lineages/viridiplantae_odb10 does not exist.
mv: cannot stat 'tnRamLact8_Nhpy_mq10-viridiplantae_odb10-busco//short_summary..json': No such file or directory
mv: cannot stat 'tnRamLact8_Nhpy_mq10-viridiplantae_odb10-busco//short_summary..txt': No such file or directory
Work dir: /scratch_isilon/groups/assembly/data/projects/BGE/tnRamLact/assembly/curation/nextdenovo.hypo1.purged.yahs_mq10/1_blobtoolkit/blobtoolkitnextflow/work/e3/910c9ec4ebbab08e742510c3a50ee8
I don't know why it is not finding the BUSCO databases. All of them are stored here: /scratch_isilon/groups/assembly/data/databases/BUSCO_2024_11/v5/data/lineages/
This is my nextflow command:
nextflow \
  run /software/assembly/pipelines/nf-core-pipelines/blobtoolkit_sanger-tol/blobtoolkit-0.6.0/main.nf \
  -c /software/assembly/pipelines/nf-core-pipelines/cluster_config/cnag_nextflow_queue.config \
  -profile singularity \
  --input tnRamLact8_samplesheet_s3.csv \
  --outdir out \
  --fasta tnRamLact8_Nhpy_mq10.fasta \
  --taxon 947578 \
  --align true \
  --taxdump /scratch_isilon/groups/assembly/data/databases/taxdump_2024_10_01 \
  --blastp /scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd \
  --blastx /scratch_isilon/groups/assembly/data/databases/uniprot_2024_10_03/reference_proteomes.dmnd \
  --blastn /scratch_isilon/groups/assembly/data/databases/nt_2024_10_03 \
  --busco /scratch_isilon/groups/assembly/data/databases/BUSCO_2024_11/v5/data/lineages/ \
  --busco_lineages metazoa_odb10,viridiplantae_odb10,fungi_odb10,apicomplexa_odb10,euglenozoa_odb10,diptera_odb10,alphaproteobacteria_odb10,mycoplasmatales_odb10,proteobacteria_odb10,nematoda_odb10,rickettsialesodb10
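As it appears in this report, the last entry of --busco_lineages reads rickettsialesodb10, without the underscore that every other name has. Whether that was in the actual command or crept in when pasting is not clear from the report. A quick pre-flight check along these lines would flag it either way. It is a sketch only: suspicious_lineages is a hypothetical helper, and the <name>_odb<number> pattern is an assumption based on the names in this thread.

```python
import re
from typing import List

# Assumption: lineage names look like <name>_odb<number>, as in this thread.
LINEAGE_RE = re.compile(r"^[a-z0-9_]+_odb\d+$")


def suspicious_lineages(arg: str) -> List[str]:
    """Return the entries of a comma-separated lineage list that do not
    match the assumed <name>_odb<number> pattern."""
    names = [n.strip() for n in arg.split(",") if n.strip()]
    return [n for n in names if not LINEAGE_RE.match(n)]


lineages = (
    "metazoa_odb10,viridiplantae_odb10,fungi_odb10,apicomplexa_odb10,"
    "euglenozoa_odb10,diptera_odb10,alphaproteobacteria_odb10,"
    "mycoplasmatales_odb10,proteobacteria_odb10,nematoda_odb10,"
    "rickettsialesodb10"
)

print(suspicious_lineages(lineages))  # ['rickettsialesodb10']
```

A stricter variant could also check that each name exists as a subdirectory of the lineages directory before the job is submitted.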
I am attaching the full log and the sbatch script so you can check them in full. I would really appreciate it if you could help me overcome this error and get the pipeline running.
Thanks.
btk_v0.6.0_nextflow.log
run_blobtoolkit_v060_on_tnRamLact8.sbatch.txt
Command used and terminal output
No response
Relevant files
No response
System information
No response