mrvollger / StainedGlass

Make colorful identity heatmaps of genomic sequence
https://mrvollger.github.io/StainedGlass/
MIT License

samtools sort: truncated file. Aborting #37

Closed molecule53 closed 6 months ago

molecule53 commented 6 months ago

Hello,

I am trying to run a test with Col-CEN_v1.2.fasta. I installed snakemake 8.2.1:

    (snakemake) ubuntu@ip-172-31-21-137:/Data1$ snakemake --version
    8.2.1

The run completed 8 of 18 steps before failing with the error below:

    8 of 18 steps (44%) done
    Select jobs to execute...
    Execute 2 jobs...

    [Fri Feb 23 14:04:11 2024]
    localrule aln:
        input: temp/arabidopsis.2000.10000.ref_0.fasta.mmi, temp/arabidopsis.2000.0.query.fasta
        output: temp/arabidopsis.2000.10000.0.ref_0.bam
        log: logs/aln.arabidopsis.2000.10000.0.ref_0.log
        jobid: 4
        reason: Missing output files: temp/arabidopsis.2000.10000.0.ref_0.bam; Input files updated by another job: temp/arabidopsis.2000.0.query.fasta, temp/arabidopsis.2000.10000.ref_0.fasta.mmi
        wildcards: SM=arabidopsis, W=2000, F=10000, ID=0, REF_ID=ref_0
        threads: 4
        resources: mem_mb=16096, mem_mib=15351, disk_mb=1395, disk_mib=1331, tmpdir=/tmp, runtime=120, mem=4

    Activating conda environment: .snakemake/conda/b2cd0897a16d0f01d2fdff5b68582316_

    [Fri Feb 23 14:04:11 2024]
    localrule aln:
        input: temp/arabidopsis.2000.10000.ref_0.fasta.mmi, temp/arabidopsis.2000.2.query.fasta
        output: temp/arabidopsis.2000.10000.2.ref_0.bam
        log: logs/aln.arabidopsis.2000.10000.2.ref_0.log
        jobid: 12
        reason: Missing output files: temp/arabidopsis.2000.10000.2.ref_0.bam; Input files updated by another job: temp/arabidopsis.2000.2.query.fasta, temp/arabidopsis.2000.10000.ref_0.fasta.mmi
        wildcards: SM=arabidopsis, W=2000, F=10000, ID=2, REF_ID=ref_0
        threads: 4
        resources: mem_mb=16096, mem_mib=15351, disk_mb=1395, disk_mib=1331, tmpdir=/tmp, runtime=120, mem=4

    Activating conda environment: .snakemake/conda/b2cd0897a16d0f01d2fdff5b68582316_

    [Fri Feb 23 15:24:49 2024]
    Error in rule aln:
        jobid: 4
        input: temp/arabidopsis.2000.10000.ref_0.fasta.mmi, temp/arabidopsis.2000.0.query.fasta
        output: temp/arabidopsis.2000.10000.0.ref_0.bam
        log: logs/aln.arabidopsis.2000.10000.0.ref_0.log (check log file(s) for error details)
        conda-env: /Data1/StainedGlass/.snakemake/conda/b2cd0897a16d0f01d2fdff5b68582316_
        shell:
            ( minimap2 -t 4 -f 10000 -s 400 -ax ava-ont --dual=yes --eqx temp/arabidopsis.2000.10000.ref_0.fasta.mmi temp/arabidopsis.2000.0.query.fasta | samtools sort -m 4G -o temp/arabidopsis.2000.10000.0.ref_0.bam ) 2> logs/aln.arabidopsis.2000.10000.0.ref_0.log
            (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

    Logfile logs/aln.arabidopsis.2000.10000.0.ref_0.log:

    [M::main::3.442*0.99] loaded/built the index for 66045 target sequence(s)
    [M::mm_mapopt_update::3.442*0.99] mid_occ = 10000
    [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 66045
    [M::mm_idx_stat::3.949*0.99] distinct minimizers: 27997222 (78.45% are singletons); average occurrences: 1.599; average spacing: 2.951; total length: 132081078
    [W::sam_read1_sam] Parse error at line 601148
    samtools sort: truncated file. Aborting

    [Fri Feb 23 16:01:34 2024]
    Finished job 12.
    9 of 18 steps (50%) done
    Removing temporary output temp/arabidopsis.2000.2.query.fasta.
    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
    Complete log: .snakemake/log/2024-02-23T135956.510384.snakemake.log
    WorkflowError: At least one job did not complete successfully.

    real 121m57.517s
    user 761m39.147s
    sys 3m31.843s
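
For readers unfamiliar with the "bash strict mode" note in the log: Snakemake runs shell commands with bash's `pipefail` option set, so a pipeline like `minimap2 ... | samtools sort ...` is reported as failed if any stage exits non-zero, not only the last one. The sketch below illustrates that behaviour with two stand-in functions. `producer` and `consumer` are hypothetical names, not part of minimap2, samtools, or StainedGlass, and the exit status 137 is arbitrary.

```shell
#!/usr/bin/env bash
# Stand-in for an upstream tool that stops early: emits one line, then fails.
# (137 is an arbitrary non-zero status chosen for illustration.)
producer() { printf 'partial record\n'; return 137; }

# Stand-in for a downstream tool that reads stdin and exits successfully.
consumer() { cat >/dev/null; }

# Default bash behaviour: the pipeline's status is that of the last command.
# The function body is a subshell, so the option change does not leak out.
run_without_pipefail() ( set +o pipefail; producer | consumer )

# With pipefail, the status is the last non-zero status in the pipeline.
run_with_pipefail() ( set -o pipefail; producer | consumer )

rc_default=0; run_without_pipefail || rc_default=$?
rc_strict=0;  run_with_pipefail    || rc_strict=$?
echo "without pipefail: $rc_default"
echo "with pipefail:    $rc_strict"
```

This only shows how the failure is surfaced. It does not say why the upstream stage stopped in the run above.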

mrvollger commented 6 months ago

Sorry, but I cannot recreate this with this input. Did you perhaps run out of disk space during the run? Have you tried rerunning?
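
To answer the disk-space question concretely, one could check the free space on the filesystems the run writes to. This is a minimal sketch, assuming a POSIX `df` and `awk`. `free_gib` is a hypothetical helper name, and the two directories come from the log above (outputs under the working directory, `tmpdir=/tmp`).

```shell
# Print free space, in whole GiB, on the filesystem that holds a directory.
# df -P guarantees one line per filesystem; -k reports 1024-byte blocks,
# and column 4 is the "Available" count.
free_gib() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Assumed layout: run from the StainedGlass checkout, temp files in /tmp.
for dir in . /tmp; do
    echo "$dir: $(free_gib "$dir") GiB free"
done
```

Checking this both before a rerun and while the `aln` jobs are active would show whether the disk fills during sorting.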

molecule53 commented 6 months ago

I was using 8 vCPU and 32GiB memory for the run.

I will try to rerun with different settings if you have any recommendations (I am using AWS and can request a different EC2 instance for the run).

Do I understand correctly that the Col-CEN_v1.2 test data should finish running in about 1 hour?
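
For context on these numbers, the log above shows two `aln` jobs running at once, each with `threads: 4` and `mem_mib=15351`. The arithmetic below is a rough sketch of how much of a 32 GiB instance those declared resources account for. It uses only figures quoted in this thread and says nothing about what the jobs actually consumed.

```shell
# Figures taken from the log and the instance description in this thread.
cores=8                    # 8 vCPU
threads_per_job=4          # "threads: 4" for rule aln
mem_mib_per_job=15351      # "mem_mib=15351" for rule aln
total_mib=$((32 * 1024))   # 32 GiB instance

concurrent_jobs=$((cores / threads_per_job))
declared_mib=$((concurrent_jobs * mem_mib_per_job))
headroom_mib=$((total_mib - declared_mib))

echo "concurrent aln jobs: $concurrent_jobs"
echo "declared memory:     $declared_mib MiB of $total_mib MiB"
echo "headroom:            $headroom_mib MiB"
```

If memory does turn out to be tight, one option to try is Snakemake's `--resources mem_mb=<N>`, which caps the summed `mem_mb` of jobs that run at the same time. Whether that helps with this workflow is untested here.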



mrvollger commented 6 months ago

Yeah, about 1 hour is right, but it varies between systems. Just now, for me, it took 1.5 hours.

I think I would try requesting more disk space for your instance, but I am not an AWS person...

molecule53 commented 6 months ago

Hi Mitchell,

I got it working by increasing the memory but still have a couple of problems with visualization:

I successfully finished running

    time snakemake --cores 8 --config sample=arabidopsis fasta=Col-CEN_v1.2.fasta

Then, I ran:

    snakemake --cores 24 cooler_density --config window=32 cooler_window=100
    snakemake --cores 24 make_figures

and got these files in the results folder:

    (snakemake) @.***:/Data1/StainedGlass/results$ ls
    arabidopsis.2000.10000.bed.gz
    arabidopsis.2000.10000.sorted.bam.csi
    output_small_32.sam.gz
    small.2000.10000.sorted.bam
    small.2000.fasta
    small.32.fasta
    arabidopsis.2000.10000.full.tbl.gz
    arabidopsis.2000.fasta
    small.2000.10000.bed.gz
    small.2000.10000.sorted.bam.csi
    small.32.100.density.cool
    arabidopsis.2000.10000.sorted.bam
    contacts_small_32.gz
    small.2000.10000.full.tbl.gz
    small.2000.10000_figures
    small.32.100.density.mcool

In the "small.2000.10000_figures" folder there are no .pdf or .png figures that look like the Arabidopsis examples provided in the "images" folder; the figures there look more like dot plots.

Also, I tried to visualize small.32.100.density.mcool on resgen, but it also looks like a dot plot. I would like to get heatmap-like images.

How can I get that type of image?

Thank you!
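
One observation from the listing above: the cooler and figure outputs are prefixed `small` rather than `arabidopsis`, and the second and third commands were run without `sample=` or `fasta=`. A possible explanation, which is an assumption and not confirmed in this thread, is that those runs fell back to the workflow's default test sample. The sketch below only prints candidate command lines that carry the same `--config` values as the first run. It does not invoke Snakemake. `with_sample` is a hypothetical helper, and the `window` and `cooler_window` values are copied unchanged from the commands above and may not suit a whole genome.

```shell
# Config values from the first, successful command in this thread.
sample_config="sample=arabidopsis fasta=Col-CEN_v1.2.fasta"

# Print a snakemake command line for a target, appending any extra
# config pairs after the sample config. Nothing is executed.
with_sample() {
    target=$1
    shift
    extra="$*"
    echo "snakemake --cores 24 $target --config $sample_config${extra:+ $extra}"
}

with_sample cooler_density window=32 cooler_window=100
with_sample make_figures
```

If the assumption holds, outputs from such runs would be expected to carry the `arabidopsis` prefix instead of `small`.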

