t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0

Trimmed reads caused slamdunk to stop at count #129

Closed: tiffge closed this issue 1 year ago

tiffge commented 1 year ago

Hi!

Sorry, I'm back again with a different issue. I was trying to run slamdunk on an old SLAM-seq run someone in the lab performed a few years ago, but they trimmed their reads, and I think that is what caused the pipeline to throw this error:

Difference between minimum and maximum read length is > 10. Please specify --max-read-length parameter.

I went through the filtered BAM files and used the GenomicAlignments R package to figure out the maximum read length in each sample (which was 38), and then tried running the following line in Docker Desktop:

slamdunk count -o slamout_230412 -r genome_s288c_fordunks.fsa -b NM_sort_short_UTR_coord_startone.bed -l 38 -s ./slamout_230412/snp bam ./slamout_230412/filter/*.bam

This didn't work (with or without the "./" before slamout), so I tried running it on an individual file like so:

slamdunk count -o slamout_230412 -r genome_s288c_fordunks.fsa -b NM_sort_short_UTR_coord_startone.bed -l 38 -s ./slamout_230412/snp bam ./slamout_230412/filter/B29-NM01_S1_trimmed_R1.fastq_slamdunk_mapped_filtered.bam

but that also didn't work, and I received this error:

Running slamDunk tcount for 2 files (1 threads)
[E::hts_open_format] Failed to open file bam
Traceback (most recent call last):
  File "/opt/conda/envs/slamdunk/bin/slamdunk", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/slamdunk.py", line 474, in run
    results = Parallel(n_jobs=n, verbose=verbose)(delayed(runCount)(tid, args.bam[tid], args.ref, args.bed, args.maxLength, args.minQual, args.conversionThreshold, outputDirectory, snpDirectory) for tid in range(0, len(args.bam)))
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/joblib/parallel.py", line 1003, in __call__
    if self.dispatch_one_batch(iterator):
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/joblib/parallel.py", line 834, in dispatch_one_batch
    self._dispatch(tasks)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/joblib/parallel.py", line 753, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 201, in apply_async
    result = ImmediateResult(func)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 582, in __init__
    self.results = batch()
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/joblib/parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/joblib/parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/slamdunk.py", line 198, in runCount
    tcounter.computeTconversions(ref, bed, inputSNP, bam, maxLength, minQual, outputCSV, outputBedgraphPlus, outputBedgraphMinus, conversionThreshold, log)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/dunks/tcounter.py", line 129, in computeTconversions
    sampleInfo = getSampleInfo(bam)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/utils/misc.py", line 239, in getSampleInfo
    sampleInfo = getReadGroup(bam)
  File "/opt/conda/envs/slamdunk/lib/python3.7/site-packages/slamdunk/utils/misc.py", line 230, in getReadGroup
    bamFile = pysam.AlignmentFile(bam)
  File "pysam/libcalignmentfile.pyx", line 741, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 940, in pysam.libcalignmentfile.AlignmentFile._open
FileNotFoundError: [Errno 2] could not open alignment file `bam`: No such file or directory
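One thing I do notice re-reading my command: there is a stray bam token sitting between the SNP directory and the file path, and the traceback says it is trying to open a file literally called bam (and counts "2 files" for a single input). If that is the culprit, the single-file call would presumably just be this instead (untested):

slamdunk count -o slamout_230412 -r genome_s288c_fordunks.fsa -b NM_sort_short_UTR_coord_startone.bed -l 38 -s ./slamout_230412/snp ./slamout_230412/filter/B29-NM01_S1_trimmed_R1.fastq_slamdunk_mapped_filtered.bam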

I looked through some past issues on your GitHub to see how others solved this, but could only find that specifying the max read length was the solution. Do you have any idea why this is failing?
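In case it helps anyone else: the 38 came from checking the filtered BAMs in R with GenomicAlignments, but the same read-length distribution can be dumped straight from the shell, assuming samtools is available in the container, with something like:

# count how many reads there are of each length (field 10 of a SAM record is the read sequence)
samtools view ./slamout_230412/filter/B29-NM01_S1_trimmed_R1.fastq_slamdunk_mapped_filtered.bam | awk '{ print length($10) }' | sort -n | uniq -c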

t-neumann commented 1 year ago

Hi - yes, that parameter is a bit annoying. It works perfectly fine if you set it to the raw Illumina read length from before trimming (which should be the same across all reads) and proceed with that; it doesn't matter if the actual supplied reads are shorter. I would try a quick slamdunk all on it with -rl set to your read length from the sequencer and see if that solves the problem.
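Something along these lines should do it; the fastq name here is just guessed from your filtered BAM's name, and <raw read length> is whatever the sequencer delivered before trimming (e.g. 50 or 75):

slamdunk all -o slamout_230412 -r genome_s288c_fordunks.fsa -b NM_sort_short_UTR_coord_startone.bed -rl <raw read length> B29-NM01_S1_trimmed_R1.fastq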

tiffge commented 1 year ago

Sure, I'll give it a try! Not sure it will be quick, though, because it will have to go through all the steps again (unless I can just run it with the same output folders and it will know to move on to the next step?)

tiffge commented 1 year ago

That worked! Thanks for the help 😄