stajichlab / AAFTF

Automatic Assembly For The Fungi
MIT License

Memory Insufficient for Samtools in Polish #30

Closed by bpeacock44 2 months ago

bpeacock44 commented 2 months ago

Here is the command I used, which I understand means I am allocating 100 GB of memory to the process:

AAFTF polish --method polca -i DS1bio.rmdup.fasta -o DS1bio.polca.fasta -c 256 --left DS1bio_filtered_1.fastq.gz --right DS1bio_filtered_2.fastq.gz --memory 100

After a few days, the process failed. The samtools.err file says this: [bam_sort] -m setting (500K bytes) is less than the minimum required (1M). This is confusing, as I allocated more than 500k bytes in my command.
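For anyone checking a setting before a multi-day run: below is a minimal sketch of comparing a samtools-style memory string against the 1M minimum quoted in the error above. The function names are hypothetical, and the binary (1024-based) reading of the K/M/G suffixes is an assumption of this sketch; it is not code from AAFTF or polca.sh.

```python
import re

# Binary multipliers for the K/M/G suffixes. Treating them as powers of 1024
# is an assumption of this sketch.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

# Minimum per-thread value quoted in the samtools.err message (1M).
MIN_SORT_BYTES = _UNITS["M"]


def parse_mem(value: str) -> int:
    """Convert a string such as '500K' or '1G' into a byte count."""
    match = re.fullmatch(r"(\d+)([KMG]?)", value.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised memory value: {value!r}")
    number, suffix = match.groups()
    return int(number) * _UNITS[suffix]


def sort_memory_ok(value: str) -> bool:
    """Return True if the value meets the 1M minimum from the error message."""
    return parse_mem(value) >= MIN_SORT_BYTES


for setting in ("500K", "1G"):
    status = "ok" if sort_memory_ok(setting) else "below the 1M minimum"
    print(setting, status)
```

The two values checked in the loop are the `-m` values that appear in the printed polca.sh commands in this thread.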

Other logs:

I didn't see an error in the bwa.err file - the last line is "[main] Real time: 350255.988 sec; CPU: 305420.456 sec" which I take to mean it ended successfully.

Polca.log indicates the same:

/opt/linux/rocky/8.x/x86_64/pkgs/AAFTF/0.5.0/bin/bwa
/opt/linux/rocky/8.x/x86_64/pkgs/AAFTF/0.5.0/bin/freebayes
/opt/linux/rocky/8.x/x86_64/pkgs/AAFTF/0.5.0/bin/samtools
[Wed Aug 7 05:31:21 PDT 2024] Creating BWA index for DS1bio.rmdup.fasta
[Wed Aug 7 05:31:33 PDT 2024] Aligning reads to DS1bio.rmdup.fasta
[Sun Aug 11 06:49:10 PDT 2024] Sorting and indexing alignment file
[Sun Aug 11 06:49:10 PDT 2024] Sorting and indexing alignment file failed

bpeacock44 commented 2 months ago

I didn't notice this before either, but the pipeline printed the command after starting:

[bpeacock@r28] genomes$ AAFTF polish --method polca -i DS1bio.rmdup.fasta -o DS1bio.polca.fasta -c 256 --left DS1bio_filtered_1.fastq.gz --right DS1bio_filtered_2.fastq.gz -m 100
[Aug 13 07:58 AM] Running AAFTF v0.5.0
CMD: polca.sh -a DS1bio.rmdup.fasta -r /path/to/genomes/DS1bio_filtered_1.fastq.gz /path/to/genomes/DS1bio_filtered_2.fastq.gz -t 256 -m 500K

Seems like it is ignoring the -m flag?

hyphaltip commented 2 months ago

I don't think the memory flag is applied to polca; at least, I don't recall implementing that. You may be running into problems with 256 CPUs, as it will be doing 256 x 500kb for separate sorting processes in samtools, which will mean it will need at least 128gb for the job. How much memory did you allocate to the slurm job?



bpeacock44 commented 2 months ago

I just allocated 100gb - I can try 128gb instead or decrease the cpus if 256 is unnecessary. It took about 4 days to do the alignment.


hyphaltip commented 2 months ago

You might try subsampling your dataset. If you did a de novo assembly with the Illumina data, polishing is unlikely to give much improvement anyway.
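One possible way to do the subsampling, as a minimal sketch: keep each read pair with a fixed probability while reading the two gzipped FASTQ files in lockstep, so the mates stay in sync. The function name `subsample_pairs` is hypothetical, and the sketch assumes standard four-line FASTQ records with the pairs in the same order in both files. It is not part of AAFTF.

```python
import gzip
import random


def subsample_pairs(left_in, right_in, left_out, right_out, fraction, seed=42):
    """Keep each read pair with probability `fraction`; return the pairs kept."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    kept = 0
    with gzip.open(left_in, "rt") as l_in, \
         gzip.open(right_in, "rt") as r_in, \
         gzip.open(left_out, "wt") as l_out, \
         gzip.open(right_out, "wt") as r_out:
        while True:
            # Assumes four lines per FASTQ record (header, sequence, '+', quality).
            left_rec = [l_in.readline() for _ in range(4)]
            right_rec = [r_in.readline() for _ in range(4)]
            if not left_rec[0] or not right_rec[0]:
                break  # end of either input file
            # One random draw per pair keeps both mates or drops both.
            if rng.random() < fraction:
                l_out.writelines(left_rec)
                r_out.writelines(right_rec)
                kept += 1
    return kept
```

For the files in this thread, a call might look like `subsample_pairs("DS1bio_filtered_1.fastq.gz", "DS1bio_filtered_2.fastq.gz", "sub_1.fastq.gz", "sub_2.fastq.gz", fraction=0.25)`, where the output names and the fraction are arbitrary placeholders.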

bpeacock44 commented 2 months ago

Just for future users: when I dropped the CPU count to 60, the generated polca.sh command was given more memory (-m 1G instead of -m 500K). This was the printout after starting:

[Aug 26 12:07 PM] Running AAFTF v0.5.0
CMD: polca.sh -a DS1bio.rmdup.fasta -r /rhome/bpeacock/bigdata/PN106_regenome_fungi/working_AAFTF/DS1bio_filtered_1.fastq.gz /rhome/bpeacock/bigdata/PN106_regenome_fungi/working_AAFTF/DS1bio_filtered_2.fastq.gz -t 60 -m 1G

Working great!