pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License

squigualiser reform: object of type 'NoneType' has no len() #1228

Closed Marjan-Hosseini closed 11 months ago

Marjan-Hosseini commented 11 months ago

I have a bam file basecalled using dorado with the following:

dorado basecaller /dorado_models/dna_r9.4.1_e8_sup@v3.3/ /pod5_files/ --reference GCF_009914755.1_T2T-CHM13v2.0_genomic.fna --recursive --emit-moves --modified-bases 5mCG_5hmCG > aligned.bam
samtools sort aligned.bam -o sorted.bam

Then I am trying to re-squiggle to change the move table produced by dorado, since dorado uses a fixed stride of 5. To use the squigualiser commands, I first need a .paf file, which I generate with:

squigualiser reform --sig_move_offset 0 -c -b sorted.bam -o reform_output.paf

It gives me the following error:

reform.py", line 77, in run
    len_seq = len(sam_record.get_forward_sequence()) - kmer_length + 1 # to get the number of kmers
TypeError: object of type 'NoneType' has no len()

I even tried basecalling with slow5-dorado, but no reads are basecalled:

Samples/s: 0.000000e+00

jmarshall commented 11 months ago

When get_forward_sequence() returns None, it is because the SAM/BAM record omits the sequence (so e.g. in SAM the SEQ field is *). It would appear that sorted.bam contains such records but this reform.py code does not handle them.

You should probably report this as a squigualiser issue.
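
To confirm the diagnosis above, the records without a sequence can be counted with pysam. The following is a minimal sketch, not squigualiser's actual code: the helper names `kmer_count`, `count_missing_sequences`, and `count_missing_in_bam` are hypothetical, and `kmer_length` is borrowed from the traceback. One possible source of such records is secondary alignments, for which some aligners omit SEQ; whether that applies to this file is an assumption.

```python
def kmer_count(record, kmer_length):
    """Return the number of k-mers in the read, or None if SEQ is absent."""
    seq = record.get_forward_sequence()  # None when the record stores no sequence
    if seq is None:
        return None  # caller decides whether to skip or report the record
    return len(seq) - kmer_length + 1


def count_missing_sequences(records):
    """Count records for which get_forward_sequence() returns None."""
    return sum(1 for r in records if r.get_forward_sequence() is None)


def count_missing_in_bam(path):
    """Open a BAM file and count the records that omit their sequence."""
    import pysam  # imported here so the helpers above do not require pysam

    # check_sq=False also allows files whose header has no @SQ lines.
    with pysam.AlignmentFile(path, "rb", check_sq=False) as bam:
        # until_eof=True reads every record in file order, no index needed.
        return count_missing_sequences(bam.fetch(until_eof=True))
```

Calling `count_missing_in_bam("sorted.bam")` would give the number of records that trigger the TypeError; a guard like the one in `kmer_count` is one way a tool could skip them instead of failing.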