Closed jiadong324 closed 5 months ago
Are there whitespace or special characters in the FASTA sequence? Other than that, I cannot think of any other reason for this discrepancy. Please check for such characters and remove any you find. If that does not work, please share the sequence so that I can test it.
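The check suggested above can be sketched in a few lines of Python. This is only an illustration, not part of the tool: the function name `find_unexpected_chars` and the set of accepted characters (IUPAC nucleotide codes plus `-` and `*`) are assumptions chosen for the example.

```python
# Assumption: nucleotide FASTA; IUPAC codes plus gap/stop symbols count as valid.
VALID = set("ACGTUNRYKMSWBDHV" + "acgtunrykmswbdhv" + "-*")

def find_unexpected_chars(lines):
    """Return (line_number, sorted unexpected characters) for each offending sequence line."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.startswith(">"):
            continue  # header lines may contain arbitrary text
        bad = sorted(set(text) - VALID)
        if bad:
            problems.append((number, bad))
    return problems

# Small in-memory example: a stray space, a carriage return, and a fused header.
example = [">chr16_hap2\n", "ACGTACGT\n", "ACG TAC\r\n", "GGTTAG>chr17_\n"]
for number, bad in find_unexpected_chars(example):
    print(number, bad)
```

To scan a real file, an open file handle can be passed in place of the list, since the function only iterates over lines.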
Please download the sequence at: https://drive.google.com/file/d/1Cqs1MF8x_WEuy9ZYZqiR2L0UQQbyh93c/view?usp=drive_link
I also have the warning and error below:
Reading Coords - WARNING - Reference chromosome haplotype1-0000038 has high fraction of inverted alignments with its homologous chromosome in the query genome (haplotype1-0000031). Ensure that same chromosome-strands are being compared in the two genomes, as different strand can result in unexpected errors.
...
IndexError: list index out of range
Following the suggestions in existing issues, I first made a dot plot from a minimap2 alignment. It has several short inverted repeats highlighted in red.
After I switched to wfmash, most of the short inverted segments were gone, but I still got the same warning and error.
Is the index error caused by these inverted segments?
Are you sure that the red alignments are inverted and the blue ones are direct? I would guess otherwise. Please recheck. Also, I cannot access the FASTA file using this link.
You are correct, the blue lines are inverted. I found that people reverse-complement the sequence in this situation. What would you suggest for such a case?
Please check the new link https://drive.google.com/file/d/1Cqs1MF8x_WEuy9ZYZqiR2L0UQQbyh93c/view?usp=sharing.
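For readers unfamiliar with the operation mentioned above, here is a minimal sketch of reverse-complementing a sequence in plain Python. The thread does not confirm that this is the maintainer's recommended fix. The helper name `reverse_complement` is hypothetical, and the table handles only A, C, G, T and N (other characters pass through unchanged). Biopython's `Seq.reverse_complement()` offers the same operation with full IUPAC support.

```python
# Translation table for A/C/G/T/N in both cases; anything else is left as-is.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("TTAGGG"))  # CCCTAA
```

Applying the function twice returns the original sequence, which is a quick sanity check before rewriting a whole chromosome.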
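The FASTA files are malformed: as the tail output below shows, the last line has a >chr17_ header fused onto the end of a sequence line. Please fix them.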
10:47 goel@pc-t7-130 netscratch:issue226$ tail hap2_sorange2borange.chr16.fa
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGT
TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTGGTTAGGGTTAGGGTTAGGGTTAG
GGTTAG>chr17_
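The final line above, GGTTAG>chr17_, is 13 characters long, which matches the 13 bp length difference reported in the original post. This suggests, though the thread does not confirm it, that the fused header is what the different parsers count differently. One possible repair is sketched below, assuming the only defect is a `>` appearing after the first column of a sequence line. The function name `split_fused_headers` is hypothetical.

```python
def split_fused_headers(lines):
    """Yield lines, splitting any sequence line that contains '>' after column 0."""
    for line in lines:
        text = line.rstrip("\r\n")
        cut = text.find(">", 1)  # a '>' in column 0 is a normal header, so start at 1
        if cut > 0 and not text.startswith(">"):
            yield text[:cut]   # the sequence part
            yield text[cut:]   # the header that was fused onto it
        else:
            yield text

broken = [">chr16_hap2", "TAGGGTTAG", "GGTTAG>chr17_"]
print(list(split_fused_headers(broken)))
```

Note that the recovered >chr17_ header would have no sequence after it, so whether to keep or drop it depends on how the file was produced.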
Hi,
Thanks for developing this nice tool!
Here is my error message for single chromosome comparison:
However, I checked the maximum alignment coordinate, which is smaller than the length of my query sequence.
Then, I added

record_dict = SeqIO.to_dict(SeqIO.parse(qry_fa, "fasta"))
print(len(record_dict['chr16_hap2']))

for debugging. The query length calculated by my code is the same as the one calculated by samtools faidx, which is 13 bp longer than the query length calculated by your readFasta() function.

Looking forward to your reply!