liguowang / CrossMap

CrossMap is a python program to lift over genome coordinates from one genome version to another.
https://crossmap.readthedocs.io/en/latest/

AttributeError: 'NoneType' object has no attribute 'replace' #58

Open hermes1130 opened 1 year ago

hermes1130 commented 1 year ago

Hi,

I've been using CrossMap for a while to convert chimpanzee and orangutan gene coordinates to human coordinates, thank you! It has worked very well so far, but I have now run into the error in the title of this issue.

CrossMap.py version: 0.6.5

My input: chimpanzee and orangutan BAM files generated from ChIP-seq

My command line: CrossMap.py bam -a panTro6ToHg38.over.chain.gz Chimp.bam HumanizedChimp

Log:

True
Insert size = 200.000000
Insert size stdev = 30.000000
Number of stdev from the mean = 3.000000
Add tags to each alignment = True
2023-03-25 11:05:59 [INFO]  Read the chain file "panTro6ToHg38.over.chain.gz" 
2023-03-25 11:06:26 [INFO]  Liftover BAM file "Chimp.bam" to "HumanizedChimp.bam"
Traceback (most recent call last):
  File "/Users/je.lee/opt/anaconda3/envs/CrossMap/bin/CrossMap.py", line 281, in <module>
    crossmap_bam_file(mapping = mapTree, chainfile = chain_file, infile = in_file, outfile_prefix = out_file, chrom_size = targetChromSizes, IS_size = args.insert_size, IS_std = args.insert_size_stdev, fold = args.insert_size_fold, addtag = args.add_tags, cstyle = args.cstyle)
  File "/Users/je.lee/opt/anaconda3/envs/CrossMap/lib/python3.7/site-packages/cmmodule/mapbam.py", line 546, in crossmap_bam_file
    new_alignment.query_sequence = revcomp_DNA(old_alignment.query_sequence)        #reverse complement read sequence
  File "/Users/je.lee/opt/anaconda3/envs/CrossMap/lib/python3.7/site-packages/cmmodule/utils.py", line 92, in revcomp_DNA
    seq = dna.replace(' ','').upper()
AttributeError: 'NoneType' object has no attribute 'replace' 

This is the head of my input bam file:

HWI-H225:248:C3JPEACXX:6:1308:1935:47172    256 chr1    103 0   51M *   0   0   *   *   MD:Z:51 PG:Z:MarkDuplicates NM:i:0  AS:i:51
HWI-H225:248:C3JPEACXX:6:1205:9756:37551    0   chr1    254 60  51M *   0   0   CCAGCACGAGGCCAAGCCAGTGAGAGCTCAGAGACAGCATGGGTGGAAGGG CCCFFFFFDHHGHJJIJJGIFIIIIJJJJEGIIJCGIJJJJJJGHIIE7DH MD:Z:51 PG:Z:MarkDuplicates NM:i:0  AS:i:51 XS:i:19
HWI-H225:248:C3JPEACXX:6:1207:18795:41438   16  chr1    378 60  51M *   0   0   CATTTCCGGAAGATCTGCAGGGACTGCCCAGCGTGCAGCATTCCTGGCGTG CGFGIGGHEHF@DFBGCGGHDC1GIHCBIIGGHDAE>BDBDHBFDBDF@@@ MD:Z:51 PG:Z:MarkDuplicates NM:i:0  AS:i:51 XS:i:0
HWI-H225:248:C3JPEACXX:6:1204:12561:24679   0   chr1    1042    60  51M *   0   0   AGCGGCCCCCCAGGACAGCAGCAAGCAGGGCCAAGATGCCACCGCTACGCT @@@DFFFFHHHHGJJIIJIJIFGJIFGGIJGIJJGIGHGIIJHHHFFDCC@ MD:Z:51 PG:Z:MarkDuplicates NM:i:0  AS:i:51 XS:i:20
HWI-H225:248:C3JPEACXX:6:1304:15831:40296   272 chr1    1441    0   51M *   0   0   *   *   MD:Z:50C0   PG:Z:MarkDuplicates NM:i:1  AS:i:50
HWI-H225:248:C3JPEACXX:6:1304:15831:40296   16  chr1    1463    1   51M *   0   0   GGGAGCCGCATGAGAGACAGAAGGGAGCCGCATGAGAGACAGAAGGGAGCT JJJJIJJJJJIJJJIJIIIGIJJJJIJJJJJJJJJJJJHHHHHDFFFFCC@ MD:Z:51 PG:Z:MarkDuplicates NM:i:0  AS:i:51 XS:i:50
HWI-H225:248:C3JPEACXX:6:1104:7957:90670    0   chr1    1600    60  51M *   0   0   GTTTTCCTCCTCAATGCTGAGCAAATCTTCCTCCCTCCCTGCCTGAAAATG CBCFFFFFHHHHHJJJJIJIJIJIJJJJJJJJJJJJJIJJJJJJIHIJIHI MD:Z:51 PG:Z:MarkDuplicates NM:i:0  AS:i:51 XS:i:21
HWI-H225:248:C3JPEACXX:6:1206:11110:92290   0   chr1    1648    60  51M *   0   0   ATGCAGTACCCCCCACCCTGAGACCCTGACCCATGCCAAGGGCAGCCAAGC CCCFFFFFHFHHHJJJJIIIIJIJJJJJIJJJJIIIJJJIGHIIJGIGGGH MD:Z:51 PG:Z:MarkDuplicates NM:i:0  AS:i:51 XS:i:20
HWI-H225:248:C3JPEACXX:6:1311:5480:97902    0   chr1    1965    60  51M *   0   0   CAAGCTGGACCCCAGTACCACGCCCAGCCGCCTTCCTAGGTCACTCTGGCT BCCFFDFFHFHHGJJIIGHHJGGIGGGGGGFIIEHJJJJGHIEIEHHJGIG MD:Z:51 PG:Z:MarkDuplicates NM:i:0  AS:i:51 XS:i:19
HWI-H225:248:C3JPEACXX:6:1109:12675:57588   0   chr1    2188    60  48M *   0   0   ATTTGGGCCCAGAACGAAGGGGGCTCTCCAGGCCGCACAGGACATGGG    CCCFFFFFHHHHHJJJJJJJJJJJJJIIJJJJJIIJJJHHHHFFFFFD    MD:Z:48 PG:Z:MarkDuplicates NM:i:0  AS:i:48 XS:i:0
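Two of the records above (flags 256 and 272) carry `*` in both the SEQ and QUAL columns. The following is a minimal sketch, not part of CrossMap, for listing such records from SAM text and decoding the two relevant flag bits. The function name `records_without_seq` is hypothetical, and the example lines are the records from this issue with most optional tags trimmed.

```python
# SAM records are tab-separated: FLAG is column 2 and SEQ is column 10.
SECONDARY = 0x100  # SAM flag bit: secondary alignment
REVERSE = 0x10     # SAM flag bit: read mapped to the reverse strand


def records_without_seq(sam_lines):
    """Yield (read_name, flag, is_secondary, is_reverse) for records whose SEQ is '*'.

    Hypothetical helper for illustration; it only inspects SAM text lines.
    """
    for line in sam_lines:
        if line.startswith("@"):  # header lines carry no alignment
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:      # not a complete alignment record
            continue
        if fields[9] == "*":
            flag = int(fields[1])
            yield fields[0], flag, bool(flag & SECONDARY), bool(flag & REVERSE)


# Three records taken from the BAM head above (optional tags shortened).
example = [
    "HWI-H225:248:C3JPEACXX:6:1308:1935:47172\t256\tchr1\t103\t0\t51M\t*\t0\t0\t*\t*\tMD:Z:51",
    "HWI-H225:248:C3JPEACXX:6:1205:9756:37551\t0\tchr1\t254\t60\t51M\t*\t0\t0\t"
    "CCAGCACGAGGCCAAGCCAGTGAGAGCTCAGAGACAGCATGGGTGGAAGGG\t"
    "CCCFFFFFDHHGHJJIJJGIFIIIIJJJJEGIIJCGIJJJJJJGHIIE7DH\tMD:Z:51",
    "HWI-H225:248:C3JPEACXX:6:1304:15831:40296\t272\tchr1\t1441\t0\t51M\t*\t0\t0\t*\t*\tMD:Z:50C0",
]

for name, flag, is_secondary, is_reverse in records_without_seq(example):
    print(name, flag, "secondary" if is_secondary else "primary",
          "reverse" if is_reverse else "forward")
```

Both `*` records in this excerpt have the secondary-alignment bit set (256 = 0x100, 272 = 0x100 + 0x10); whether that holds for every such record in the full file would need to be checked.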

After running the CrossMap command, I do get a "crossmapped" BAM file, but it is only partially converted, e.g. the output file contains only chr1.

Could you please help me out? In case you need more information, I can provide the input/output files.

Thank you in advance!

Dafnaa commented 12 months ago

I have the same error. Did you find a solution yet?

liguowang commented 12 months ago

When the sequence is represented as "*", pysam's AlignedSegment.query_sequence returns None. Could you please remove those alignment records and try again? If this is indeed the case, I can quickly fix this issue.

thanks

Liguo
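The workaround suggested above (removing alignment records whose sequence is "*") could be sketched as follows. This is an assumption-laden illustration rather than CrossMap code: the helper `drop_records_without_seq` is hypothetical and works on SAM text lines, so the BAM file would first have to be converted to SAM text (with its header) and converted back afterwards.

```python
def drop_records_without_seq(sam_lines):
    """Yield header lines unchanged and alignment records whose SEQ is not '*'.

    Hypothetical helper: SEQ is the 10th tab-separated column of a SAM record.
    """
    for line in sam_lines:
        if line.startswith("@"):   # keep header lines as they are
            yield line
            continue
        fields = line.split("\t")
        if len(fields) > 9 and fields[9] == "*":
            continue               # record stores no read sequence: drop it
        yield line


# Small in-memory demonstration with made-up read names r1 and r2.
sam_text = [
    "@HD\tVN:1.6\n",
    "r1\t256\tchr1\t103\t0\t51M\t*\t0\t0\t*\t*\n",
    "r2\t0\tchr1\t254\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]
kept = list(drop_records_without_seq(sam_text))
```

One possible way to use it is as a stream filter, for example by writing `sys.stdout.writelines(drop_records_without_seq(sys.stdin))` in a small script and piping the output of `samtools view -h Chimp.bam` through it before converting back to BAM with `samtools view -b`. Note that this discards the affected records entirely, so it only helps to confirm the cause of the error.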