medvedevgroup / SibeliaZ

A fast whole-genome aligner based on de Bruijn graphs
http://medvedevgroup.com/

Inaccurate coordinates in MAF file #27

(Open) smallfade opened this issue 3 years ago

smallfade commented 3 years ago

After running SibeliaZ on two FASTA files, I found that the coordinates in the output differ from those in the original FASTA files, and the aligned sequences do not match back to the reference or query file. Both input files contain hard-masked regions. Does the output subtract the length of the hard-masked regions from the original positions? How can I get coordinates that match the original files? Otherwise, postprocessing will be difficult when the reference is an established genome with annotations.
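One way to tell whether the coordinates themselves are wrong is to re-extract each MAF row from the FASTA and compare. The sketch below assumes the UCSC MAF convention for `s` lines (0-based `start`, `size` counting only non-gap bases, and `-` strand rows measured on the reverse complement of a source of length `srcSize`) and assumes each MAF `src` field equals the first word of a FASTA header. The helper names (`read_fasta`, `forward_interval`, `check_s_line`) are hypothetical and not part of SibeliaZ.

```python
# Assumes the UCSC MAF convention for 's' lines:
#   s src start size strand srcSize text
# start is 0-based, size counts non-gap bases, and for '-' rows
# start is measured on the reverse complement of the source.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse-complement DNA; IUPAC codes other than N pass through unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(path: str) -> dict:
    """Hypothetical helper: map the first word of each header to its sequence."""
    chunks = {}
    name = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


def forward_interval(start: int, size: int, strand: str, src_size: int) -> tuple:
    """Convert a MAF row to a 0-based, half-open forward-strand interval."""
    if strand == "+":
        return start, start + size
    # '-' coordinates count from the end of the source sequence.
    return src_size - start - size, src_size - start


def check_s_line(line: str, genomes: dict) -> bool:
    """Return True if an 's' line's ungapped text matches the FASTA."""
    _, src, start, size, strand, src_size, text = line.split()
    start, size, src_size = int(start), int(size), int(src_size)
    seq = genomes[src]
    if len(seq) != src_size:
        return False  # different sequence, or not the length the aligner saw
    lo, hi = forward_interval(start, size, strand, src_size)
    piece = seq[lo:hi]
    if strand == "-":
        piece = revcomp(piece)
    return piece.upper() == text.replace("-", "").upper()
```

Under this convention, hard-masked `N` runs are part of the source sequence, so they count toward `start` and `srcSize` rather than being removed.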

iminkin commented 3 years ago

Hi, that is very interesting. Could you please share the input and the parameters you used?

smallfade commented 3 years ago


I did not change the default parameters:

sibeliaz -t 40 -o out_dir GRCm38.primary_assembly.genome.fa xxxx_strain_assembly.fa

I'm sorry, this may not be a problem with SibeliaZ after all. The output I checked came from mafFilter, which might be the culprit here; the original MAF from SibeliaZ does not have the "inaccurate" coordinates. But I wonder how to process the SibeliaZ output into a VCF file so that I can count the differences. Thanks!
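As a starting point for the VCF question, here is a naive sketch that walks each MAF block column by column and reports single-base mismatches between rows from a chosen reference (for example, the GRCm38 sequence names) and all other rows. It assumes the same UCSC MAF coordinate convention, skips gaps, `N`s, and indels entirely, and does not resolve blocks in which a region appears in several copies. The function names (`parse_maf_blocks`, `block_snvs`, `maf_to_vcf_lines`) are hypothetical; this is not a SibeliaZ feature.

```python
from typing import Iterable, Iterator

# Complement table for unambiguous bases only; N, IUPAC codes, and gaps are skipped.
_COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def parse_maf_blocks(lines: Iterable[str]) -> Iterator[list]:
    """Yield each alignment block as a list of (src, start, size, strand, srcSize, text)."""
    block = []
    for line in lines:
        if line.startswith("a"):  # an 'a' line opens a new block
            if block:
                yield block
            block = []
        elif line.startswith("s "):
            _, src, start, size, strand, src_size, text = line.split()
            block.append((src, int(start), int(size), strand, int(src_size), text))
    if block:
        yield block


def block_snvs(block: list, ref_names: set) -> Iterator[tuple]:
    """Yield (chrom, 1-based pos, ref, alt) for single-base mismatches in one block."""
    refs = [row for row in block if row[0] in ref_names]
    queries = [row for row in block if row[0] not in ref_names]
    for src, start, _size, strand, src_size, rtext in refs:
        for query in queries:
            qtext = query[5]
            offset = 0  # non-gap reference bases consumed so far in this row
            for rb, qb in zip(rtext.upper(), qtext.upper()):
                if rb == "-":
                    continue  # insertion in the query: no reference position
                if rb in _COMP and qb in _COMP and rb != qb:
                    if strand == "+":
                        pos0 = start + offset
                        ref_base, alt_base = rb, qb
                    else:
                        # Map back to the forward strand and complement both bases.
                        pos0 = src_size - 1 - (start + offset)
                        ref_base, alt_base = _COMP[rb], _COMP[qb]
                    yield src, pos0 + 1, ref_base, alt_base
                offset += 1


def maf_to_vcf_lines(lines: Iterable[str], ref_names: set) -> list:
    """Collect SNVs from all blocks and format them as minimal VCF lines."""
    records = set()
    for block in parse_maf_blocks(lines):
        records.update(block_snvs(block, ref_names))
    out = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    for chrom, pos, ref, alt in sorted(records):
        out.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return out
```

Because every query row is compared against every reference row, a duplicated region can yield conflicting records at the same position; a real pipeline would need to filter or merge those before counting differences.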