t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0

map function fails part way through samples #125

Closed tiffge closed 1 year ago

tiffge commented 1 year ago

Hi, I'm running SLAM-DUNK in a Docker container that I've been using for some time, and it has mostly run smoothly on some test data. I finally got my sequencing run back and started processing it with slamdunk all. It got through 33 out of 40 samples and then failed. I tried manually running slamdunk map on samples 34 and 35, but I keep getting this error:

slamdunk map -r genome_s288c_fordunks.fsa -o slamout_230228 -ss ../Reads/B44-TG35_S35_R1_001.fastq.gz

Running slamDunk map for 1 files (1 threads)
Traceback (most recent call last):
  File "/opt/conda/envs/slamdunk/bin/slamdunk", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/slamdunk.py", line 436, in run
    runMap(tid, bam, referenceFile, n, args.trim5, args.maxPolyA, args.quantseq, args.endtoend, args.topn, sampleInfo, outputDirectory, args.skipSAM)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/slamdunk.py", line 148, in runMap
    mapper.Map(inputBAM, referenceFile, outputSAM, getLogFile(outputLOG), quantseqMapping, endtoendMapping, threads=threads, trim5p=trim5p, maxPolyA=maxPolyA, topn=topn, sampleId=tid, sampleName=sampleName, sampleType=sampleType, sampleTime=sampleTime, printOnly=printOnly, verbose=verbose)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/dunks/mapper.py", line 107, in Map
    run("ngm -b -r " + inputReference + " -q " + inputBAM + " -t " + str(threads) + " " + parameter + " -o " + outputSAM, log, verbose=verbose, dry=printOnly)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/utils/misc.py", line 196, in run
    raise RuntimeError("Error while executing command: \"" + cmd + "\"")
RuntimeError: Error while executing command: "ngm -b -r genome_s288c_fordunks.fsa -q ../Reads/B44-TG35_S35_R1_001.fastq.gz -t 1 --no-progress --slam-seq 2 -5 12 --max-polya 4 -l --rg-id 0 --rg-sm sample_0:NA:-1 -o slamout_230228/B44-TG35_S35_R1_001.fastq_slamdunk_mapped.bam"

I tried looking at the log file, but it was blank. However, the mapped.bam file size for sample 34 was not zero.

I'm just confused because the pipeline was working just yesterday, and I ran FastQC and MultiQC locally on the fastq files and they look fine. Where is this error coming from, and why did it occur partway through?

t-neumann commented 1 year ago

Hm, could you try simply running the indicated command standalone and see what happens?

ngm -b -r genome_s288c_fordunks.fsa -q ../Reads/B44-TG35_S35_R1_001.fastq.gz -t 1 --no-progress --slam-seq 2 -5 12 --max-polya 4 -l --rg-id 0 --rg-sm sample_0:NA:-1 -o slamout_230228/B44-TG35_S35_R1_001.fastq_slamdunk_mapped.bam
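A generic way to run such a step standalone is to capture all of its output into a log and check the exit status explicitly. This is just a debugging sketch; the helper name is made up, and false stands in for the real ngm invocation above:

```shell
# Sketch: run any pipeline step standalone, send all output to a log file,
# and report the exit status. "false" below is only a stand-in for the
# real ngm command; substitute the full invocation from the traceback.
run_logged() {
    logfile="$1"; shift
    "$@" >"$logfile" 2>&1
    status=$?
    echo "exit status: $status (log: $logfile)"
    return "$status"
}

run_logged /tmp/ngm_standalone.log false || echo "command failed; inspect the log"
```

A nonzero exit status plus the captured log usually shows whether the mapper itself crashed or something in the environment (paths, disk, permissions) broke.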

tiffge commented 1 year ago

That gives me this error:

[MAIN] NextGenMap 0.5.5
[MAIN] Startup : x64 (build Mar 2 2019 21:24:17)
[MAIN] Starting time: 2023-03-01.15:59:13
[CONFIG] Parameter: --affine 0 --argos_min_score 0 --bam 1 --bin_size 2 --block_multiplier 2 --broken_pairs 0 --bs_cutoff 6 --bs_mapping 0 --cpu_threads 1 --dualstrand 1 --fast 0 --fast_pairing 0 --force_rlength_check 0 --format 2 --gap_extend_penalty 5 --gap_read_penalty 20 --gap_ref_penalty 20 --hard_clip 0 --keep_tags 0 --kmer 13 --kmer_min 0 --kmer_skip 2 --local 1 --match_bonus 10 --match_bonus_tc 2 --match_bonus_tt 10 --max_cmrs 2147483647 --max_equal 1 --max_insert_size 1000 --max_polya 4 --max_read_length 0 --min_identity 0.650000 --min_insert_size 0 --min_mq 0 --min_residues 0.500000 --min_score 0.000000 --mismatch_penalty 15 --mode 0 --no_progress 1 --no_unal 0 --ocl_threads 1 --output slamout_230228/B44-TG35_S35_R1_001.fastq_slamdunk_mapped.bam --overwrite 1 --pair_score_cutoff 0.900000 --paired 0 --parse_all 1 --pe_delimiter / --qry ../Reads/B44-TG35_S35_R1_001.fastq.gz --qry_count -1 --qry_start 0 --ref genome_s288c_fordunks.fsa --ref_mode -1 --rg_id 0 --rg_sm sample_0:NA:-1 --sensitive 0 --silent_clip 0 --skip_mate_check 0 --skip_save 0 --slam_seq 2 --step_count 4 --strata 0 --topn 1 --trim5 12 --update_check 0 --very_fast 0 --very_sensitive 0
[NGM] Opening for output (BAM): slamout_230228/B44-TG35_S35_R1_001.fastq_slamdunk_mapped.bam
[SEQPROV] Reading encoded reference from genome_s288c_fordunks.fsa-enc.2.ngm
[SEQPROV] Reading 12 Mbp from disk took 0.00s
[PREPROCESS] Reading RefTable from genome_s288c_fordunks.fsa-ht-13-2.3.ngm
[PREPROCESS] Reading from disk took 0.23s
[PREPROCESS] Max. k-mer frequency set to 100!
[INPUT] Input is single end data.
[INPUT] Opening file ../Reads/B44-TG35_S35_R1_001.fastq.gz for reading
[INPUT] Input is Fastq
[INPUT] Estimating parameter from data
[INPUT] Reads found in files: 8203618
[INPUT] Average read length: 38 (min: 38, max: 40)
[INPUT] Corridor width: 10
[INPUT] Average kmer hits pro read: 4.440725
[INPUT] Max possible kmer hit: 8
[INPUT] Estimated sensitivity: 0.555091
[INPUT] Estimating parameter took 8.727s
[INPUT] Input is Fastq
[OPENCL] Available platforms: 1
[OPENCL] AMD Accelerated Parallel Processing
[OPENCL] Selecting OpenCl platform: AMD Accelerated Parallel Processing
[OPENCL] Platform: OpenCL 1.2 AMD-APP (1214.3)
[OPENCL] 1 CPU device found.
[OPENCL] Device 0: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz (Driver: 1214.3 (sse2,avx))
[OPENCL] 2 CPU cores available.
[OPENCL] Build failed: Program build failure
[OPENCL] Build status: build failed
[OPENCL] Build log:
[OPENCL] Internal Error: as failed
[OPENCL] Codegen phase failed compilation.
[OPENCL] Unable to build program end.
Error: Program build failure (-11)
terminate called without an active exception
Aborted

Does this mean I need to build a new Docker container because the old one failed?

t-neumann commented 1 year ago

That is weird, because you said it worked on other samples, correct? So if you were to run the command on one of the samples that went through, would that work?

tiffge commented 1 year ago

It doesn't work anymore for some reason. The code I tried running was:

slamdunk map -r genome_s288c_fordunks.fsa -o slamout_230228/ -ss ../Reads/B44-TG01_S1_R1_001.fastq.gz

but I get the same error:

  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/dunks/mapper.py", line 107, in Map
    run("ngm -b -r " + inputReference + " -q " + inputBAM + " -t " + str(threads) + " " + parameter + " -o " + outputSAM, log, verbose=verbose, dry=printOnly)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/utils/misc.py", line 196, in run
    raise RuntimeError("Error while executing command: \"" + cmd + "\"")
RuntimeError: Error while executing command: "ngm -b -r genome_s288c_fordunks.fsa -q ../Reads/B44-TG01_S1_R1_001.fastq.gz -t 1 --no-progress --slam-seq 2 -5 12 --max-polya 4 -l --rg-id 0 --rg-sm sample_0:NA:-1 -o slamout_230228/B44-TG01_S1_R1_001.fastq_slamdunk_mapped.bam"

but I did actually try that last night and got a different error:

slamdunk map -r genome_s288c_fordunks.fsa -o slamout_230228 -ss ../Reads/B44-TG01_S1_R1_001.fastq.gz

Creating output directory: slamout_230228
Traceback (most recent call last):
  File "/opt/conda/envs/slamdunk/bin/slamdunk", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/slamdunk.py", line 423, in run
    createDir(outputDirectory)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/slamdunk.py", line 80, in createDir
    os.makedirs(directory)
  File "/opt/conda/envs/slamdunk/lib/python3.7/os.py", line 221, in makedirs
    mkdir(name, mode)
OSError: [Errno 28] No space left on device: 'slamout_230228'
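The Errno 28 above means the filesystem inside the container filled up. A minimal pre-flight check can catch this before a long mapping run; the helper below is a hypothetical sketch, not part of slamdunk, and the 500 MB threshold is just an example:

```shell
# Hypothetical pre-flight check: refuse to start if the filesystem holding
# the output directory has fewer than the requested megabytes free.
check_space() {
    dir="$1"; need_mb="$2"
    # df -P gives POSIX single-line output; -m reports 1 MB blocks.
    avail_mb=$(df -Pm "$dir" | awk 'NR==2 {print $4}')
    if [ "$avail_mb" -lt "$need_mb" ]; then
        echo "only ${avail_mb} MB free under ${dir}; need ${need_mb} MB" >&2
        return 1
    fi
}

# e.g. require ~500 MB before launching slamdunk map
check_space . 500 && echo "enough space, safe to start" || echo "not enough space"
```

Running something like this (or plain df -h) inside the container would have flagged the problem before sample 34 failed.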

What I ended up doing was deleting some files on my local computer in hopes of freeing up some space, looking into the log file of TG001 (which worked previously), and then running ngm --update-check, because I saw this in the last line:

'[UPDATE_CHECK] Your version of NGM is more than 6 months old - a newer version may be available. (For performing an automatic check use --update-check)\n'

But now I don't get the "no space" error; instead, it's the same as the error from sample 34.

t-neumann commented 1 year ago

Can you send me the full command you are using - including the startup of the Docker container?

tiffge commented 1 year ago

Sure, but I am not really using many scripts to start up the Docker container (I am using Docker Desktop). Here are the steps I am taking:

  1. Open Docker Desktop and start the slam-seq container.
  2. Open a terminal and copy over Reads and references (in the Slam folder) from local to the Docker container:
    docker cp ~/Documents/SlamDunk/Slam_Setup interesting_mcclintock:/Slam 
    docker cp ~/Documents/SlamDunk/Reads interesting_mcclintock:/Reads
  3. Run slamdunk in the Slam folder on Docker Desktop:
    slamdunk all -r genome_s288c_fordunks.fsa -b NM_sort_short_UTR_coord_startone.bed -o slamout_230228 -ss ../Reads/*fastq.gz
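As an aside, an alternative to copying files in with docker cp on every run is to bind-mount the host folders when the container is started, so host and container stay in sync (and large reads don't get duplicated into the container's writable layer). This is only a sketch built from the paths and image name mentioned in this thread; the container name slamdunk_run is made up, and the command is printed rather than executed here:

```shell
# Sketch: bind-mount the Slam setup and Reads folders instead of docker cp.
# The image tobneu/slamdunk and host paths are taken from this thread;
# "slamdunk_run" is a hypothetical container name. The command is built as
# an array (so paths survive quoting) and echoed, not executed.
cmd=(docker run -it --name slamdunk_run
     -v "$HOME/Documents/SlamDunk/Slam_Setup:/Slam"
     -v "$HOME/Documents/SlamDunk/Reads:/Reads"
     tobneu/slamdunk /bin/bash)
printf '%s ' "${cmd[@]}"
echo
```

With bind mounts, output written under /Slam also lands directly on the host, which avoids filling the container's own filesystem.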

Please let me know if there's anything else I can clarify or provide.

t-neumann commented 1 year ago

OK, I see. And with the same setup, if you run it on another sample that previously worked, does it run through without error?

tiffge commented 1 year ago

With the same setup now, I cannot seem to run it on any samples anymore, even those that worked previously.

I'm getting this error now for sample 1, which worked before:

slamdunk map -r genome_s288c_fordunks.fsa -o slamout_230228/ -ss ../Reads/B44-TG01_S1_R1_001.fastq.gz

Running slamDunk map for 1 files (1 threads)
Traceback (most recent call last):
  File "/opt/conda/envs/slamdunk/bin/slamdunk", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/slamdunk.py", line 436, in run
    runMap(tid, bam, referenceFile, n, args.trim5, args.maxPolyA, args.quantseq, args.endtoend, args.topn, sampleInfo, outputDirectory, args.skipSAM)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/slamdunk.py", line 148, in runMap
    mapper.Map(inputBAM, referenceFile, outputSAM, getLogFile(outputLOG), quantseqMapping, endtoendMapping, threads=threads, trim5p=trim5p, maxPolyA=maxPolyA, topn=topn, sampleId=tid, sampleName=sampleName, sampleType=sampleType, sampleTime=sampleTime, printOnly=printOnly, verbose=verbose)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/dunks/mapper.py", line 107, in Map
    run("ngm -b -r " + inputReference + " -q " + inputBAM + " -t " + str(threads) + " " + parameter + " -o " + outputSAM, log, verbose=verbose, dry=printOnly)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/utils/misc.py", line 196, in run
    raise RuntimeError("Error while executing command: \"" + cmd + "\"")
RuntimeError: Error while executing command: "ngm -b -r genome_s288c_fordunks.fsa -q ../Reads/B44-TG01_S1_R1_001.fastq.gz -t 1 --no-progress --slam-seq 2 -5 12 --max-polya 4 -l --rg-id 0 --rg-sm sample_0:NA:-1 -o slamout_230228/B44-TG01_S1_R1_001.fastq_slamdunk_mapped.bam"

I even tried to update the Docker image by pulling again (it was already the latest version), and I re-uploaded my fastq files and SLAM references.

t-neumann commented 1 year ago

Hm, could you maybe try with the nf-core/slamseq container to rule out a container issue?

t-neumann commented 1 year ago

Sorry, this one: nfcore/slamseq:1.0.0

tiffge commented 1 year ago

I'm getting an error when I try to copy the startup folders (Slam_Setup and Reads) into the nfcore container:

Error response from daemon: Error processing tar file(exit status 1): Error setting up pivot dir: mkdir /var/lib/docker/overlay2/1f178af8e039a07ee406468b991b1b7fef422fb65f2d4dd94e27b07c3cd32e77/merged/.pivot_root1284882696: no space left on device

I looked on Stack Exchange to see what to do, and the recommendation is to run docker system prune, but it says:

WARNING! This will remove:

  • all stopped containers
  • all networks not used by at least one container
  • all dangling images
  • all dangling build cache

Are you sure you want to continue? [y/N]

But I don't want to lose the slamseq container. Should I keep both containers (nfcore and slamseq) running before the prune step? And will this solve the issue?
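For what it's worth, docker system prune only removes stopped containers, so anything that is running (or restarted before pruning) survives. A tiny sketch of that selection rule, fed with made-up name/status pairs instead of real docker output:

```shell
# Illustration of the prune selection rule for containers: stopped ones are
# removed, running ones are kept. Input lines are hypothetical "name status"
# pairs, not real docker output.
would_prune() {
    awk '$2 != "running" { print $1 }'
}

printf '%s\n' \
    "interesting_mcclintock running" \
    "old_nfcore_test exited" | would_prune
# prints: old_nfcore_test
```

In practice, docker ps -a and docker system df show what exists and how much space each piece uses before committing to a prune.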

t-neumann commented 1 year ago

I would stop all containers and delete them. It seems like you are running into a space problem, which might be the cause of the initial error.

tiffge commented 1 year ago

Thanks, I did just that and it seems to be running. Will let you know how it goes!

tiffge commented 1 year ago

It worked! I'm currently running alleyoop on the finished samples. Would I have to delete the container before processing a large batch of samples every time? Also, a slightly tangential question: what is considered too low and too high of a conversion rate? I think I have samples ranging from 0.12% to 10% T>C conversions, and I saw in the SLAM-seq paper that 15% was too high.

t-neumann commented 1 year ago

Did you also get it working with tobneu/slamdunk, then? I don't think you have to delete the container, just terminate it properly. Regarding conversion rates, we saw between 2% and 5% in our cells, but depending on labelling times and cell types this could potentially be more.
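To make the percentages concrete: a T>C conversion rate is the fraction of covered T positions that read as C. A back-of-envelope check with invented counts (1,200 conversions over 60,000 covered Ts) lands squarely in the 2-5% range mentioned above:

```shell
# Back-of-envelope conversion rate; the counts are invented for illustration,
# not taken from any real alleyoop output.
awk 'BEGIN { tc = 1200; total_t = 60000; printf "%.2f%%\n", 100 * tc / total_t }'
# prints: 2.00%
```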

tiffge commented 1 year ago

Yes, I used tobneu/slamdunk! I thought terminating the container doesn't reset the storage (my outputs are still visible in the terminal). But thanks for the help and for letting me know about the conversion rates. Mine seem to be a bit variable for the same cell type (yeast), so I'll have to figure out why they're so different.

varma-shivani98 commented 1 year ago

Hey, I am facing the same issue and have not been able to solve it.

Command :

slamdunk all -r /home/ubuntu/hg38.fa -b /home/ubuntu/hg38_UCSC_3UTR.bed -o /home/ubuntu/AT -t 8 /media/volume/230224_SLAMseq_LNCaPMYC/L_A_1_S10_R1_001.fastq /media/volume/230224_SLAMseq_LNCaPMYC/L_A_2_S21_R1_001.fastq

Error:

slamdunk all
Running slamDunk map for 2 files (8 threads)
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/bin/slamdunk", line 8, in <module>
    sys.exit(run())
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 520, in run
    runAll(args)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 245, in runAll
    runMap(tid, bam, referenceFile, n, args.trim5, args.maxPolyA, args.quantseq, args.endtoend, args.topn, sampleInfo, dunkPath, args.skipSAM)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 149, in runMap
    mapper.Map(inputBAM, referenceFile, outputSAM, getLogFile(outputLOG), quantseqMapping, endtoendMapping, threads=threads, trim5p=trim5p, maxPolyA=maxPolyA, topn=topn, sampleId=tid, sampleName=sampleName, sampleType=sampleType, sampleTime=sampleTime, printOnly=printOnly, verbose=verbose)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/slamdunk/dunks/mapper.py", line 104, in Map
    run("ngm -r " + inputReference + " -q " + inputBAM + " -t " + str(threads) + " " + parameter + " -o " + outputSAM, log, verbose=verbose, dry=printOnly)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/slamdunk/utils/misc.py", line 196, in run
    raise RuntimeError("Error while executing command: \"" + cmd + "\"")
RuntimeError: Error while executing command: "ngm -r /home/ubuntu/hg38.fa -q /media/volume/230224_SLAMseq_LNCaPMYC/L_A_1_S10_R1_001.fastq -t 8 --no-progress --slam-seq 2 -5 12 --max-polya 4 -l --rg-id 0 --rg-sm L_A_1_S10_R1_001:pulse:0 -o /home/ubuntu/AT/map/L_A_1_S10_R1_001_slamdunk_mapped.sam"

Can you please suggest what I can do?

t-neumann commented 1 year ago

Hi - like above, I would run the command standalone in the Docker container and see if that works:

ngm -r /home/ubuntu/hg38.fa -q /media/volume/230224_SLAMseq_LNCaPMYC/L_A_1_S10_R1_001.fastq -t 8 --no-progress --slam-seq 2 -5 12 --max-polya 4 -l --rg-id 0 --rg-sm L_A_1_S10_R1_001:pulse:0 -o /home/ubuntu/AT/map/L_A_1_S10_R1_001_slamdunk_mapped.sam