uzh-dqbm-cmi / fragmentstein

Create a BAM file out of insensitive fragments data (i.e. FinaleDB frag.tsv.bgz file) including sequences extracted from a reference genome.
GNU General Public License v3.0
2 stars 0 forks source link

[E::bgzf_flush] File write failed (wrong size) terminate called after throwing an instance of 'std::runtime_error' what(): can't write alignment record Aborted (core dumped) #1

Open brmprnk opened 9 months ago

brmprnk commented 9 months ago

Dear Fragmentstein team,

Thanks for the useful tool! I feel like a lot of people could really use your tool, hopefully FinaleDB could have a link to your tool for more recognition. However, I have now run ~300 samples using your tool (Cristiano et al study, different ctypes), and I consistently the following crash message.

Log:

terminal$ $ fragmentstein -i EE87876.hg38.frag.tsv.bgz -g ../../Desktop/GRCh38_full_analysis_set_plus_decoy_hla.fa -c ../../Desktop/hg38.chrom.sizes -s -t 8 -r 100 -o EE87876.bam

cmd: bash /home/ubartu/miniforge3/envs/fragmentstein/lib/python3.10/site-packages/scripts/fragmentstein.sh -i EE87876.hg38.frag.tsv.bgz -g ../../Desktop/GRCh38_full_analysis_set_plus_decoy_hla.fa -c ../../Desktop/hg38.chrom.sizes -s -t 8 -r 100 -o EE87876.bam

Unzipping, converting FinaleDB file EE87876.hg38.frag.tsv.bgz and accepting mapping quality>30 to BED file format ...
Creating paired reads and add fragments length into a temp BED file EE87876.bam.tmp.bed ...
Create intermediate SAM file EE87876.bam.tmp.sam from EE87876.bam.tmp.bed for chromosome_sizes:../../Desktop/hg38.chrom.sizes and default map quality: 60 ...

[E::bgzf_flush] File write failed (wrong size)
terminate called after throwing an instance of 'std::runtime_error'
  what():  can't write alignment record
Aborted (core dumped)

Get sequences from reference fasta ../../Desktop/GRCh38_full_analysis_set_plus_decoy_hla.fa with read length 100 for intermediate files EE87876.bam.tmp.bed and EE87876.bam.tmp.sam ... 
if read fwd->flag=99, if read rev -> flag=147, set mate start, set mate chr to same as this read and set base quality to F ... 
[bam_sort_core] merging from 2 files and 8 in-memory blocks...
Finished EE87876.bam

The result of the program is that in about ~30% of files, the coverage profile is either completely or partially missing for large sections of the genome. Debugging the bgzf flush command has not lead to meaningful results, but most likely it is related to a mismatch in file size allocation for the tmp files, could that be true? I hope you can look in to this as this has happened for every finaleDB file thus far and requires manual inspection for each created .bam file.

Thanks and all the best.

togop commented 9 months ago

Thank you for reporting that. Which versions of samtools, bedtools, awk, and gunzip are you using?

togop commented 9 months ago

Another suspicion is that you run out of memory, as, unfortunately, the bedToBam doesn't support multithreading.