stajichlab / AAFTF

Automatic Assembly For The Fungi
MIT License

spades assembly failed #27

Open Gian77 opened 2 months ago

Gian77 commented 2 months ago

​Hello,

I am trying to run the AAFTF pipeline to assemble several Pleurotus genomes (testing it on one genome only for now), and I thought I would run it as a single pipeline at first. I am getting the error below. SPAdes seems to fail, but I cannot find a SPAdes .log file anywhere. What do you think? I am running it on an HPC cluster that uses SLURM; please see the attached SLURM output and the submitted sbatch file.

Additionally: 1) I am not totally sure of the difference between these two options (see the AAFTF pipeline -h output below): --tmpdir TMPDIR (Assembler temporary dir) and -w WORKDIR, --workdir WORKDIR (temp directory); and 2) I would like to know how to pass parameters to SPAdes using --assembler_args ASSEMBLER_ARGS (Additional SPAdes/Megahit arguments), if that is possible, for example different k-mer sizes. Please let me know if you want me to put these in a different issue ticket. Thanks much! Gian

[benucci@dev-amd20 code]$ conda activate aaftf
(aaftf) [benucci@dev-amd20 code]$ AAFTF pipeline -h
usage: AAFTF pipeline [-h] [-q] [--tmpdir TMPDIR] [--assembler_args ASSEMBLER_ARGS] [--method METHOD] -l LEFT [-r RIGHT] -o BASENAME [-c cpus]
                      [-m MEMORY] [-ml MINLEN] [-a [SCREEN_ACCESSIONS ...]] [-u [SCREEN_URLS ...]] [-it ITERATIONS] [-mc MINCONTIGLEN]
                      [--AAFTF_DB AAFTF_DB] [-w WORKDIR] [-v] -p PHYLUM [PHYLUM ...] [--sourdb SOURDB] [--mincovpct MINCOVPCT]

Run entire AAFTF pipeline automagically

options:
  -h, --help            show this help message and exit
  -q, --quiet           Do not output warnings to stderr
  --tmpdir TMPDIR       Assembler temporary dir
  --assembler_args ASSEMBLER_ARGS
                        Additional SPAdes/Megahit arguments
  --method METHOD       Assembly method: spades, dipspades, megahit
  -l LEFT, --left LEFT  left/forward reads of paired-end FASTQ or single-end FASTQ.
  -r RIGHT, --right RIGHT
                        right/reverse reads of paired-end FASTQ.
  -o BASENAME, --out BASENAME
                        Output basename, default to base name of --left reads
  -c cpus, --cpus cpus  Number of CPUs/threads to use.
  -m MEMORY, --memory MEMORY
                        Memory (in GB) setting for SPAdes. Default is Auto
  -ml MINLEN, --minlen MINLEN
                        Minimum read length after trimming, default: 75
  -a [SCREEN_ACCESSIONS ...], --screen_accessions [SCREEN_ACCESSIONS ...]
                        Genbank accession number(s) to screen out from initial reads.
  -u [SCREEN_URLS ...], --screen_urls [SCREEN_URLS ...]
                        URLs to download and screen out initial reads.
  -it ITERATIONS, --iterations ITERATIONS
                        Number of Pilon Polishing iterations to run
  -mc MINCONTIGLEN, --mincontiglen MINCONTIGLEN
                        Minimum length of contigs to keep
  --AAFTF_DB AAFTF_DB   Path to AAFTF resources, defaults to $AAFTF_DB
  -w WORKDIR, --workdir WORKDIR
                        temp directory
  -v, --debug           Provide debugging messages
  -p PHYLUM [PHYLUM ...], --phylum PHYLUM [PHYLUM ...]
                        Phylum or Phyla to keep matches, i.e. Ascomycota
  --sourdb SOURDB       SourMash LCA k-31 taxonomy database
  --mincovpct MINCOVPCT
                        Minimum percent of N50 coverage to remove

aaftf_piperun.zip

hyphaltip commented 2 months ago

--workdir should be where the trimmed read files go, while --tmpdir is where the SPAdes temporary files are written during assembly.

the error message from spades is: "== Warning == output dir is not empty! Please, clean output directory before run."

So maybe you need to make sure the output directory is not still there? Check $project_dir/outputs/test_genome.
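The SPAdes warning quoted above is raised when its output directory already contains files. As a minimal pre-flight sketch (the function name `dir_is_clean` is hypothetical and not part of AAFTF or SPAdes, and it is an assumption that the leftover directory is the one derived from --out), a job script could check the directory before launching:

```shell
# Hypothetical helper, not part of AAFTF or SPAdes.
# Returns 0 if the directory is missing or empty (safe to launch),
# and 1 if it already contains anything, including hidden files.
dir_is_clean() {
    # A directory that does not exist yet counts as clean.
    [ -d "$1" ] || return 0
    # ls -A lists every entry except . and .. ; any output means leftovers.
    if [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
        return 1
    fi
    return 0
}
```

For example, `dir_is_clean "$project_dir/outputs/test_genome" || echo "output dir has leftover files" >&2` would flag a stale directory before the pipeline starts. The check only reports; deciding whether to delete the old contents is left to the user.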

Gian77 commented 2 months ago

Hello Jason,

Thank you for the email. I still get the same error after following your suggestions.

This is the error

== Warning ==  output dir is not empty! Please, clean output directory before run.

SPAdes genome assembler v3.15.5

Usage: spades.py [options] -o <output_dir>
spades.py: error: Please specify option (e.g. -1, -2, -s, etc)) for the following paths: --restart-from last

This is how I included the output directories

    AAFTF pipeline \
        ... 
    --tmpdir /mnt/scratch/benucci/aaftf_temporary/ \
    --workdir $project_dir/filtered/ \
    --out $project_dir/outputs/test_genome

and this is what I have in the directories

[benucci@dev-amd20 project_PleurotusMartina24]$ ll outputs/
total 2.1G
-rw-r----- 1 benucci ShadeLab  184 Apr 19 17:01 spades.list
-rw-r----- 1 benucci ShadeLab 544M Apr 19 17:01 test_genome_1P.fastq.gz
-rw-r----- 1 benucci ShadeLab 568M Apr 19 17:01 test_genome_2P.fastq.gz
-rw-r----- 1 benucci ShadeLab 497M Apr 19 17:12 test_genome_filtered_1.fastq.gz
-rw-r----- 1 benucci ShadeLab 525M Apr 19 17:12 test_genome_filtered_2.fastq.gz
-rw-r----- 1 benucci ShadeLab  74K Apr 19 17:11 test_genome.mito.fasta
[benucci@dev-amd20 project_PleurotusMartina24]$ ll filtered/
total 2.0M
-rw-r----- 1 benucci ShadeLab 1.8M Apr 19 17:11 contamdb.fa
-rw-r----- 1 benucci ShadeLab 1.9K Apr 19 17:11 GCF_000819615.1_ViralProj14015_genomic.fna.gz
-rw-r----- 1 benucci ShadeLab 1.7M Apr 19 17:11 UniVec
[benucci@dev-amd20 benucci]$ ll /mnt/scratch/benucci/aaftf_temporary/
total 0

It seems like it is writing the filtered reads to the --out location instead of the --workdir. Thank you,

Gian

hyphaltip commented 2 months ago

Is outputs/test_dir there already? Is outputs already made?

These are SPAdes errors, possibly because a folder already exists.

Just leave workdir off, I guess.

I don’t use the pipeline function. I run steps individually so maybe you hit an untested parameter option?

Gian77 commented 2 months ago

Hello @hyphaltip

It seems to be working now using just the two parameters below:

--tmpdir /mnt/scratch/benucci/aaftf_temporary \
--out test_genome

It has been running for 2 days; we'll see what I get... Thanks, Gian
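For anyone reproducing this, here is a hedged dry-run sketch of an invocation that keeps only --tmpdir and --out from the directory-related options and leaves --workdir off. The read file names and the Basidiomycota phylum value are illustrative assumptions (the thread does not show them), and `aaftf_dry_run` is a hypothetical helper that only prints the command rather than running it:

```shell
# Hypothetical helper: print the AAFTF pipeline command instead of running it.
# Arguments: $1 left reads, $2 right reads, $3 tmpdir, $4 output basename, $5 phylum.
# Note: values are printed as-is, so paths containing spaces would need quoting.
aaftf_dry_run() {
    printf 'AAFTF pipeline --left %s --right %s --phylum %s --tmpdir %s --out %s\n' \
        "$1" "$2" "$5" "$3" "$4"
}

# Read file names and phylum are placeholders, not taken from the thread.
aaftf_dry_run reads_R1.fastq.gz reads_R2.fastq.gz \
    /mnt/scratch/benucci/aaftf_temporary test_genome Basidiomycota
```

Printing the command first makes it easy to confirm in the SLURM log which options were actually passed before committing to a multi-day assembly.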