sandberg-lab / dataprivacy

GNU General Public License v3.0

TypeError when running BAMboozle on bam files outputted by Salmon #6

Open matmu opened 12 months ago

matmu commented 12 months ago

When I run BAMboozle 0.5.0 on BAM files generated with Salmon 1.6.0, I get the following error message:

BAMboozle --bam sample01.bam --out sample01.anonym.bam --fa /mnt/data/reference_files/gencode_v40/gencode.v40.transcripts.modified.fa

BAMboozle.py v0.5.0 
Working... 
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/lib/python3.10/site-packages/BAMboozle/BAMboozle.py", line 206, in clean_bam
    if len(final_outseq) != len(qual): #the sanitized output sequence cannot be longer than a contig (reason:deletions)
TypeError: object of type 'NoneType' has no len()
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/bin/BAMboozle", line 8, in <module>
    sys.exit(main())
  File "/home/lib/python3.10/site-packages/BAMboozle/BAMboozle.py", line 286, in main
    x = [r.get() for r in results]
  File "/home/lib/python3.10/site-packages/BAMboozle/BAMboozle.py", line 286, in <listcomp>
    x = [r.get() for r in results]
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
TypeError: object of type 'NoneType' has no len()

I'm not sure whether this is a bug or whether BAMboozle is simply not applicable to BAM files created by Salmon. Thanks a lot for your help.
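For anyone trying to narrow this down: the traceback only says that either final_outseq or qual was None when len() was called on line 206 of BAMboozle.py. One possible cause, which is an unconfirmed assumption and not something established in this thread, is that some records carry no SEQ or QUAL (stored as `*` in SAM text). A minimal sketch for counting such records in SAM text, where the function name and the demo records are hypothetical:

```python
from collections import Counter


def count_missing_seq_qual(sam_lines):
    """Count alignment records whose SEQ or QUAL column is '*' (absent).

    `sam_lines` is any iterable of SAM text lines; header lines are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (header lines start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A SAM alignment line has at least 11 mandatory columns.
        if len(fields) < 11:
            counts["malformed"] += 1
            continue
        counts["records"] += 1
        if fields[9] == "*":   # column 10: SEQ
            counts["missing_seq"] += 1
        if fields[10] == "*":  # column 11: QUAL
            counts["missing_qual"] += 1
    return counts


# Hypothetical demo records, not taken from the reporter's data.
demo = [
    "@SQ\tSN:tx_demo\tLN:100",
    "read1\t0\ttx_demo\t1\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\ttx_demo\t5\t255\t4M\t*\t0\t0\t*\t*",
]
demo_counts = count_missing_seq_qual(demo)
```

The lines could come from the text output of `samtools view` on the Salmon BAM. A nonzero missing_seq or missing_qual count would be consistent with this guess but would not prove it is what triggers the error.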

matmu commented 12 months ago

I used the genome instead of the transcriptome reference. My fault.

matmu commented 11 months ago

The error persists with the transcriptome reference.

cziegenhain commented 11 months ago

Hi,

Sorry for the slow reply! I have not validated BAMboozle on alignments to a transcriptome, nor was it intended for that use. It sounds like there is an issue with reads sitting at the ends of transcripts, potentially with some clipping or indel resolution that fails. I'm assuming these are RNA-seq samples of some type; could you consider using a spliced genome aligner such as STAR?
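To check whether reads at transcript ends are actually involved, something like the following could flag alignments that touch the first or last base of a transcript and also carry clipping or indels. This is only a rough sketch working on SAM text fields (RNAME, POS, CIGAR, and the @SQ header lines); the function names are made up for illustration and none of this is BAMboozle code.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def parse_reference_lengths(header_lines):
    """Map each reference name to its length, using the @SQ header lines."""
    lengths = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        fields = line.rstrip("\n").split("\t")[1:]
        tags = dict(field.split(":", 1) for field in fields)
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def edge_alignment_flags(rname, pos, cigar, ref_lengths):
    """Describe one alignment: does it touch a transcript end, is it
    clipped, and does it contain an insertion or deletion?

    `pos` is the 1-based leftmost position from the SAM POS column.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    ref_span = sum(n for n, op in ops if op in REF_CONSUMING)
    ref_end = pos - 1 + ref_span  # 1-based, inclusive end on the reference
    return {
        "at_edge": pos == 1 or ref_end >= ref_lengths[rname],
        "clipped": any(op in "SH" for _, op in ops),
        "indel": any(op in "ID" for _, op in ops),
        "ref_end": ref_end,
    }


# Hypothetical 100-base transcript, for illustration only.
ref_lengths = parse_reference_lengths(["@HD\tVN:1.6", "@SQ\tSN:tx_demo\tLN:100"])
start_clipped = edge_alignment_flags("tx_demo", 1, "5S45M", ref_lengths)
end_with_deletion = edge_alignment_flags("tx_demo", 51, "20M2D28M5S", ref_lengths)
interior = edge_alignment_flags("tx_demo", 10, "50M", ref_lengths)
```

If the failing reads turn out to be mostly those with at_edge set together with clipped or indel, that would fit the explanation above; if not, the cause is probably elsewhere.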