t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0

mapping skipped sorting and indexing #140

Closed Error-fre closed 7 months ago

Error-fre commented 9 months ago

Hello,

I ran slamDunk for my file using this command: slamdunk all -r /mnt/dv/wid/projects1/Sridharan-Templates/Gencode_Mouse_mm10_Ref/GRCm38.p6.genome.fa -b /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/SLAMDUNK_mESC_UTR_regions_mm10.bed -o /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq -5 12 -t 1 -rl 50 /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/XZ-10_S1_R1_001.fastq

But ran into this error:

Running slamDunk map for 1 files (1 threads) .
Running slamDunk sam2bam for 1 files (1 threads) .
Running slamDunk filter for 1 files (1 threads) .

Creating output directory: /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/snp

Running slamDunk SNP for 1 files (1 threads) .
Creating output directory: /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/count
Running slamDunk tcount for 1 files (1 threads)
[E::idx_find_and_load] Could not retrieve index file for '/mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/filter/XZ-10_S1_R1_001_slamdunk_mapped_filtered.bam'
[E::idx_find_and_load] Could not retrieve index file for '/mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/filter/XZ-10_S1_R1_001_slamdunk_mapped_filtered.bam'
Traceback (most recent call last):
  File "/mnt/ws/home/xzhang/.conda/envs/py3/bin/slamdunk", line 8, in <module>
    sys.exit(run())
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 520, in run
    runAll(args)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 328, in runAll
    results = Parallel(n_jobs=n, verbose=verbose)(delayed(runCount)(tid, dunkbufferIn[tid], referenceFile, args.bed, args.maxLength, args.minQual, args.conversionThreshold, dunkPath, snpDirectory, vcfFile) for tid in range(0, len(samples)))
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/joblib/parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 202, in runCount
    tcounter.computeTconversions(ref, bed, inputSNP, bam, maxLength, minQual, outputCSV, outputBedgraphPlus, outputBedgraphMinus, conversionThreshold, log)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/dunks/tcounter.py", line 131, in computeTconversions
    slamseqInfo = SlamSeqInfo(bam)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/utils/misc.py", line 67, in __init__
    DS = ast.literal_eval(getReadGroup(bam)['DS'])
KeyError: 'DS'
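For reference, the `KeyError: 'DS'` comes from the tcount step, which expects the map step to have stored run metadata as a Python-literal dict in the `DS` field of the BAM's `@RG` header line. Below is a minimal sketch of that lookup, reconstructed from the traceback; the header line shown is a hypothetical example of the shape slamdunk writes, not taken from this run:

```python
import ast

def read_group_ds(rg_line):
    """Parse an @RG header line into a tag dict and decode the DS field
    the way the traceback shows slamdunk doing (ast.literal_eval on RG['DS'])."""
    tags = dict(field.split(":", 1) for field in rg_line.rstrip("\n").split("\t")[1:])
    # This is where KeyError: 'DS' fires if the map step never wrote the field
    return ast.literal_eval(tags["DS"])

# Hypothetical @RG line; SM mirrors the --rg_sm value seen in the NGM log
rg = "@RG\tID:0\tSM:XZ-10_S1_R1_001:pulse:0\tDS:{'sequenced': 100, 'mapped': 96}"
print(read_group_ds(rg)["mapped"])  # 96
```

So if `samtools view -H` on the mapped BAM shows an `@RG` line with no `DS:` field, tcount will fail exactly like this.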

Here's the log file:

[MAIN] NextGenMap 0.5.5
[MAIN] Startup : x64 (build Jul 15 2018 19:15:59)
[MAIN] Starting time: 2024-01-30.11:02:03
[CONFIG] Parameter: --affine 0 --argos_min_score 0 --bam 1 --bin_size 2 --block_multiplier 2 --broken_pairs 0 --bs_cutoff 6 --bs_mapping 0 --cpu_threads 1 --dualstrand 1 --fast 0 --fast_pairing 0 --force_rlength_check 0 --format 2 --gap_extend_penalty 5 --gap_read_penalty 20 --gap_ref_penalty 20 --hard_clip 0 --keep_tags 0 --kmer 13 --kmer_min 0 --kmer_skip 2 --local 1 --match_bonus 10 --match_bonus_tc 2 --match_bonus_tt 10 --max_cmrs 2147483647 --max_equal 1 --max_insert_size 1000 --max_polya 4 --max_read_length 0 --min_identity 0.650000 --min_insert_size 0 --min_mq 0 --min_residues 0.500000 --min_score 0.000000 --mismatch_penalty 15 --mode 0 --no_progress 1 --no_unal 0 --ocl_threads 1 --output /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/map/XZ-10_S1_R1_001_slamdunk_mapped.bam --overwrite 1 --pair_score_cutoff 0.900000 --paired 0 --parse_all 1 --pe_delimiter / --qry /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/XZ-10_S1_R1_001.fastq --qry_count -1 --qry_start 0 --ref /mnt/dv/wid/projects1/Sridharan-Templates/Gencode_Mouse_mm10_Ref/GRCm38.p6.genome.fa --ref_mode -1 --rg_id 0 --rg_sm XZ-10_S1_R1_001:pulse:0 --sensitive 0 --silent_clip 0 --skip_mate_check 0 --skip_save 0 --slam_seq 2 --step_count 4 --strata 0 --topn 1 --trim5 12 --update_check 0 --very_fast 0 --very_sensitive 0
[NGM] Opening for output (BAM): /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/map/XZ-10_S1_R1_001_slamdunk_mapped.bam
[SEQPROV] Reading encoded reference from /mnt/dv/wid/projects1/Sridharan-Templates/Gencode_Mouse_mm10_Ref/GRCm38.p6.genome.fa-enc.2.ngm
[SEQPROV] Reading 2755 Mbp from disk took 23.51s
[PREPROCESS] Reading RefTable from /mnt/dv/wid/projects1/Sridharan-Templates/Gencode_Mouse_mm10_Ref/GRCm38.p6.genome.fa-ht-13-2.3.ngm
[PREPROCESS] Reading from disk took 51.66s
[PREPROCESS] Max. k-mer frequency set to 1014!
[INPUT] Input is single end data.
[INPUT] Opening file /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/XZ-10_S1_R1_001.fastq for reading
[INPUT] Input is Fastq
[INPUT] Estimating parameter from data
[INPUT] Reads found in files: 3516886
[INPUT] Average read length: 38 (min: 38, max: 40)
[INPUT] Corridor width: 10
[INPUT] Average kmer hits pro read: 5.223640
[INPUT] Max possible kmer hit: 8
[INPUT] Estimated sensitivity: 0.652955
[INPUT] Estimating parameter took 10.647s
[INPUT] Input is Fastq
[OPENCL] Available platforms: 1
[OPENCL] AMD Accelerated Parallel Processing
[OPENCL] Selecting OpenCl platform: AMD Accelerated Parallel Processing
[OPENCL] Platform: OpenCL 1.2 AMD-APP (1214.3)
[OPENCL] 1 CPU device found.
[OPENCL] Device 0: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz (Driver: 1214.3 (sse2))
[OPENCL] 24 CPU cores available.
[MAIN] Alignments computed: 3422640
[MAIN] Done (3392341 reads mapped (96.46%), 124545 reads not mapped, 3516886 lines written)(elapsed: 2893.492432s)
[UPDATE_CHECK] Your version of NGM is more than 6 months old - a newer version may be available. (For performing an automatic check use --update-check)
Skipped mapping for /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/XZ-10_S1_R1_001.fastq
Skipped sorting for /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/map/XZ-10_S1_R1_001_slamdunk_mapped.sam
Skipped mapping for /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/XZ-10_S1_R1_001.fastq

I checked my ngm version and it is using NextGenMap 0.5.5 which is the latest.

Please let me know if you need further information.

Thanks, Daria

Error-fre commented 9 months ago

There's only one bam file and one log file in the ./map folder. No .bai index file was generated.

Error-fre commented 9 months ago

I ran again using a sample sheet to provide the file input and got this in the error file:

slamdunk all
Running slamDunk map for 2 files (1 threads) ..
Running slamDunk filter for 2 files (1 threads)
[E::idx_find_and_load] Could not retrieve index file for '/mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/map/XZ-12_S3_R1_001_slamdunk_mapped.bam'
[E::idx_find_and_load] Could not retrieve index file for '/mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/filter/XZ-12_S3_R1_001_slamdunk_mapped_filtered.bam'
[E::hts_idx_push] Unsorted positions on sequence #4: 150868257 followed by 18093560
[E::sam_index] Read 'K00408:362:HWNLCBBXY:8:1101:21531:1384' with ref_name='chr4', ref_length=156508116, flags=16, pos=18093560 cannot be indexed
Traceback (most recent call last):
  File "/mnt/ws/home/xzhang/.conda/envs/py3/bin/slamdunk", line 8, in <module>
    sys.exit(run())
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 520, in run
    runAll(args)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 274, in runAll
    results = Parallel(n_jobs=n, verbose=verbose)(delayed(runFilter)(tid, dunkbufferIn[tid], bed, args.mq, args.identity, args.nm, dunkPath) for tid in range(0, len(samples)))
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/joblib/parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 170, in runFilter
    filter.Filter(bam, outputBAM, getLogFile(outputLOG), bed, mq, minIdentity, maxNM, printOnly, verbose)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/dunks/filter.py", line 310, in Filter
    pysamIndex(outputBAM)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/utils/misc.py", line 214, in pysamIndex
    pysam.index(outputBam)  # @UndefinedVariable
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/pysam/utils.py", line 69, in __call__
    raise SamtoolsError(
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools index: failed to create index for "/mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/filter/XZ-12_S3_R1_001_slamdunk_mapped_filtered.bam"\n'
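For context, the `[E::hts_idx_push] Unsorted positions` error means an attempt was made to index a BAM that is not coordinate-sorted. A manual workaround sketch (file name taken from the log above; this assumes samtools is on your PATH):

```shell
# BAI indexing requires positions to be non-decreasing within each
# reference sequence, so coordinate-sort first, then index.
samtools sort -o XZ-12_S3_R1_001_slamdunk_mapped_filtered.sorted.bam \
  XZ-12_S3_R1_001_slamdunk_mapped_filtered.bam
samtools index XZ-12_S3_R1_001_slamdunk_mapped_filtered.sorted.bam
```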

When I sorted and indexed the filtered bam file with samtools separately, I could generate the .vcf file, but when I run count I still get the following error:

Running slamDunk tcount for 1 files (1 threads)
Traceback (most recent call last):
  File "/mnt/ws/home/xzhang/.conda/envs/py3/bin/slamdunk", line 8, in <module>
    sys.exit(run())
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 516, in run
    results = Parallel(n_jobs=n, verbose=verbose)(delayed(runCount)(tid, args.bam[tid], args.ref, args.bed, args.maxLength, args.minQual, args.conversionThreshold, outputDirectory, snpDirectory, vcfFile) for tid in range(0, len(args.bam)))
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/joblib/parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 202, in runCount
    tcounter.computeTconversions(ref, bed, inputSNP, bam, maxLength, minQual, outputCSV, outputBedgraphPlus, outputBedgraphMinus, conversionThreshold, log)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/dunks/tcounter.py", line 131, in computeTconversions
    slamseqInfo = SlamSeqInfo(bam)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/utils/misc.py", line 67, in __init__
    DS = ast.literal_eval(getReadGroup(bam)['DS'])
KeyError: 'DS'

t-neumann commented 9 months ago

Hi - sorry for the long silence, I was on retreat this week.

Hm, I think that's a bug that needs fixing when simply supplying multiple fastq files via the "*" wildcard. What should fix it is supplying the files in the sample sheet format (see https://t-neumann.github.io/slamdunk/docs.html#document-Quickstart and https://t-neumann.github.io/slamdunk/docs.html#sample-file). Let me know if this works for you.
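For reference, the sample file described in those docs is a tab-separated text file with one sample per line: fastq path, sample name, sample type (pulse or chase), and timepoint in minutes. A hypothetical example for the files in this thread (sample names and timepoints are placeholders):

```
/mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/XZ-10_S1_R1_001.fastq	XZ-10	pulse	0
/mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/XZ-12_S3_R1_001.fastq	XZ-12	pulse	0
```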

Error-fre commented 9 months ago

I tried using a .csv sample sheet (MEF.csv) and got through mapping and filtering, then had the following error:

slamdunk all
Running slamDunk map for 2 files (1 threads) ..
Running slamDunk filter for 2 files (1 threads)
[E::idx_find_and_load] Could not retrieve index file for '/mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/map/XZ-12_S3_R1_001_slamdunk_mapped.bam'
[E::idx_find_and_load] Could not retrieve index file for '/mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/filter/XZ-12_S3_R1_001_slamdunk_mapped_filtered.bam'
[E::hts_idx_push] Unsorted positions on sequence #4: 150868257 followed by 18093560
[E::sam_index] Read 'K00408:362:HWNLCBBXY:8:1101:21531:1384' with ref_name='chr4', ref_length=156508116, flags=16, pos=18093560 cannot be indexed
Traceback (most recent call last):
  File "/mnt/ws/home/xzhang/.conda/envs/py3/bin/slamdunk", line 8, in <module>
    sys.exit(run())
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 520, in run
    runAll(args)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 274, in runAll
    results = Parallel(n_jobs=n, verbose=verbose)(delayed(runFilter)(tid, dunkbufferIn[tid], bed, args.mq, args.identity, args.nm, dunkPath) for tid in range(0, len(samples)))
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/joblib/parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 170, in runFilter
    filter.Filter(bam, outputBAM, getLogFile(outputLOG), bed, mq, minIdentity, maxNM, printOnly, verbose)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/dunks/filter.py", line 310, in Filter
    pysamIndex(outputBAM)
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/slamdunk/utils/misc.py", line 214, in pysamIndex
    pysam.index(outputBam)  # @UndefinedVariable
  File "/mnt/ws/home/xzhang/.conda/envs/py3/lib/python3.9/site-packages/pysam/utils.py", line 69, in __call__
    raise SamtoolsError(
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools index: failed to create index for "/mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/filter/XZ-12_S3_R1_001_slamdunk_mapped_filtered.bam"\n'

In my map folder, I have only the mapped bam file and the log file as:

[MAIN] NextGenMap 0.5.5
[MAIN] Startup : x64 (build Jul 15 2018 19:15:59)
[MAIN] Starting time: 2024-01-31.21:31:21
[CONFIG] Parameter: --affine 0 --argos_min_score 0 --bam 1 --bin_size 2 --block_multiplier 2 --broken_pairs 0 --bs_cutoff 6 --bs_mapping 0 --cpu_threads 1 --dualstrand 1 --fast 0 --fast_pairing 0 --force_rlength_check 0 --format 2 --gap_extend_penalty 5 --gap_read_penalty 20 --gap_ref_penalty 20 --hard_clip 0 --keep_tags 0 --kmer 13 --kmer_min 0 --kmer_skip 2 --local 1 --match_bonus 10 --match_bonus_tc 2 --match_bonus_tt 10 --max_cmrs 2147483647 --max_equal 1 --max_insert_size 1000 --max_polya 4 --max_read_length 0 --min_identity 0.650000 --min_insert_size 0 --min_mq 0 --min_residues 0.500000 --min_score 0.000000 --mismatch_penalty 15 --mode 0 --no_progress 1 --no_unal 0 --ocl_threads 1 --output /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/map/XZ-12_S3_R1_001_slamdunk_mapped.bam --overwrite 1 --pair_score_cutoff 0.900000 --paired 0 --parse_all 1 --pe_delimiter / --qry /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/XZ-12_S3_R1_001.fastq --qry_count -1 --qry_start 0 --ref /mnt/dv/wid/projects1/Sridharan-Templates/Gencode_Mouse_mm10_Ref/GRCm38.p6.genome.fa --ref_mode -1 --rg_id 0 --rg_sm MEF_NL:0:0 --sensitive 0 --silent_clip 0 --skip_mate_check 0 --skip_save 0 --slam_seq 2 --step_count 4 --strata 0 --topn 1 --trim5 12 --update_check 0 --very_fast 0 --very_sensitive 0
[NGM] Opening for output (BAM): /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/map/XZ-12_S3_R1_001_slamdunk_mapped.bam
[SEQPROV] Reading encoded reference from /mnt/dv/wid/projects1/Sridharan-Templates/Gencode_Mouse_mm10_Ref/GRCm38.p6.genome.fa-enc.2.ngm
[SEQPROV] Reading 2755 Mbp from disk took 27.45s
[PREPROCESS] Reading RefTable from /mnt/dv/wid/projects1/Sridharan-Templates/Gencode_Mouse_mm10_Ref/GRCm38.p6.genome.fa-ht-13-2.3.ngm
[PREPROCESS] Reading from disk took 62.74s
[PREPROCESS] Max. k-mer frequency set to 1014!
[INPUT] Input is single end data.
[INPUT] Opening file /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/XZ-12_S3_R1_001.fastq for reading
[INPUT] Input is Fastq
[INPUT] Estimating parameter from data
[INPUT] Reads found in files: 1710876
[INPUT] Average read length: 38 (min: 38, max: 40)
[INPUT] Corridor width: 10
[INPUT] Average kmer hits pro read: 5.700788
[INPUT] Max possible kmer hit: 8
[INPUT] Estimated sensitivity: 0.712598
[INPUT] Estimating parameter took 7.657s
[INPUT] Input is Fastq
[OPENCL] Available platforms: 1
[OPENCL] AMD Accelerated Parallel Processing
[OPENCL] Selecting OpenCl platform: AMD Accelerated Parallel Processing
[OPENCL] Platform: OpenCL 1.2 AMD-APP (1214.3)
[OPENCL] 1 CPU device found.
[OPENCL] Device 0: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz (Driver: 1214.3 (sse2))
[OPENCL] 24 CPU cores available.
[MAIN] Alignments computed: 1681816
[MAIN] Done (1667251 reads mapped (97.45%), 43625 reads not mapped, 1710876 lines written)(elapsed: 455.199646s)
[UPDATE_CHECK] Your version of NGM is more than 6 months old - a newer version may be available. (For performing an automatic check use --update-check)

In my filter folder, I have the filtered bam file but the log file is double the size of the bam file. It starts with this:

No bed-file supplied. Running default filtering on /mnt/dv/wid/projects1/Sridharan-Dot1l/2023SLAM/first_seq/map/XZ-12_S3_R1_001_slamdunk_mapped.bam.

Criterion    Filtered reads
MQ < 2       491673
ID < 0.95    58552
NM > -1      0
MM           0
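For context, those counts correspond to the filter step's default per-read criteria: mapping quality below 2, alignment identity below 0.95, and an edit-distance (NM) cutoff that is disabled at its default of -1. A simplified, self-contained sketch of that decision (not slamdunk's actual code; the thresholds match the log above):

```python
def passes_filter(mq, identity, nm, min_mq=2, min_identity=0.95, max_nm=-1):
    """Keep a read only if it clears all three cutoffs.
    max_nm == -1 disables the mismatch-count criterion, which is why
    the 'NM > -1' row above filters 0 reads."""
    if mq < min_mq:
        return False
    if identity < min_identity:
        return False
    if max_nm >= 0 and nm > max_nm:
        return False
    return True

# (mq, identity, nm) triples for three hypothetical reads
reads = [(0, 0.99, 1), (30, 0.90, 4), (30, 0.99, 1)]
kept = [r for r in reads if passes_filter(*r)]
print(len(kept))  # 1
```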

t-neumann commented 9 months ago

Hm how did you install slamdunk - in a conda env or are you running a Docker container? Are the bam index files created in the filter folder? And if not, what happens if you run samtools index on the filtered bam files?

Error-fre commented 9 months ago

Sorry for the late reply. I thought I had figured it out, but I ran into other problems. I originally installed with pip, then tried installing with conda. The conda install can now finish the pipeline on my test sample, which is 300MB. But when I ran it on my actual data, which is more than 10GB per sample, it gave me the following error:

slamdunk all
Running slamDunk map for 4 files (4 threads) ....
Running slamDunk filter for 4 files (4 threads)
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
    r = call_item()
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 170, in runFilter
    filter.Filter(bam, outputBAM, getLogFile(outputLOG), bed, mq, minIdentity, maxNM, printOnly, verbose)
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/site-packages/slamdunk/dunks/filter.py", line 225, in Filter
    infile = pysam.AlignmentFile(inputBAM, "rb")
  File "pysam/libcalignmentfile.pyx", line 748, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 958, in pysam.libcalignmentfile.AlignmentFile._open
  File "pysam/libchtslib.pyx", line 361, in pysam.libchtslib.HTSFile.check_truncation
OSError: no BGZF EOF marker; file may be truncated
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/home/xzhang/miniforge3/envs/slam/bin/slamdunk", line 10, in <module>
    sys.exit(run())
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 520, in run
    runAll(args)
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/site-packages/slamdunk/slamdunk.py", line 274, in runAll
    results = Parallel(n_jobs=n, verbose=verbose)(delayed(runFilter)(tid, dunkbufferIn[tid], bed, args.mq, args.identity, args.nm, dunkPath) for tid in range(0, len(samples)))
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/site-packages/joblib/parallel.py", line 1098, in __call__
    self.retrieve()
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/site-packages/joblib/parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 567, in wrap_future_result
    return future.result(timeout=timeout)
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/mnt/home/xzhang/miniforge3/envs/slam/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
OSError: no BGZF EOF marker; file may be truncated
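For context, "no BGZF EOF marker" means the BAM file does not end with the fixed 28-byte empty BGZF block that the SAM/BAM specification requires at the end of every intact file - the classic signature of a writer killed mid-run (out of memory, out of disk, or a scheduler kill). A standalone check, similar in spirit to `samtools quickcheck`:

```python
# The 28-byte empty BGZF block that terminates every intact BAM file,
# as defined in the SAM/BAM specification.
BGZF_EOF = bytes.fromhex(
    "1f8b08040000000000ff0600424302001b0003000000000000000000"
)

def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF EOF marker."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)                      # jump to end of file
        if fh.tell() < len(BGZF_EOF):
            return False                   # too short to be a complete BAM
        fh.seek(-len(BGZF_EOF), 2)
        return fh.read() == BGZF_EOF
```

Running this (or `samtools quickcheck -v`) over the bam files in the map/ and filter/ folders would show which ones were truncated.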

Thanks,

t-neumann commented 9 months ago

Hm, and are you sure you have enough space to fit all of this? The fact that the 300MB test sample ran successfully kind of hints at this being the issue.

Error-fre commented 9 months ago

I have tried different sample sizes and input formats several times now. Even with the small sample, I ran it successfully twice, but repeating the same process can later give me the same error; the corresponding log files say "skipped mapping". Even for runs where mapping completed, when I try to filter, the log file says "skipped filtering".

t-neumann commented 9 months ago

How much memory does your machine have? Maybe it's running OOM - this sounds very weird.

Error-fre commented 9 months ago

I tried some other solutions. I can now run the command separately for 3 out of 4 of my big samples; only one file could not be mapped properly, and it is not the largest one.
I ran it on our department server, which uses an mnt environment. Even requesting 50GB of memory and 8GB of disk, it still hits the same error and quits within 5 minutes. When I subset the first 1/10th of the fastq file with head, it runs successfully.
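One caveat when subsetting a fastq with head: every record is exactly four lines, so the line count passed to head must be a multiple of 4, or the last read is cut in half (which itself can crash a mapper). A tiny self-contained demo (synthetic reads, hypothetical file names):

```shell
# Build a 2-read fastq, then keep only the first read (4 lines).
printf '@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n' > demo.fastq
head -n 4 demo.fastq > subset.fastq
wc -l < subset.fastq
```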

t-neumann commented 9 months ago

Hm is there maybe a problem with the fastq file? Is there a way you could send it over so I test it on my end?

It's tough to pinpoint, given that file size doesn't seem to be the problem and all the other files work OK.

Error-fre commented 9 months ago

I finally got it running by using an interactive session on the server. I think it still has something to do with resource allocation, but I can't know for sure what happened. Thank you so much for your kind support.