medvedevgroup / SibeliaZ

A fast whole-genome aligner based on de Bruijn graphs
http://medvedevgroup.com/

Unexpected output #45

Open jjoets opened 2 years ago

jjoets commented 2 years ago

Hi,

Thanks a lot for this great tool. I had a very good experience with it a couple of years ago (version 1). I now have other runs to do, so I updated to the latest version. I was a bit surprised by the output: instead of getting one LCB per genome for a unique block ID, I get several. To try to understand what was wrong, I reran my previous analysis with the same settings, but instead of getting on average one block coordinate per genome in blocks_coords.gff, I now get many.

Is this change expected between V1 and the current version? Is it linked to the -a option? What would you advise?
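One way to quantify this observation is to count how many coordinate lines each block ID has per sequence. The following is a minimal sketch, not part of SibeliaZ. It assumes that blocks_coords.gff uses the standard nine tab-separated GFF columns and that the block ID is stored as an `ID=` attribute in the last column. The function names are hypothetical, and the counts are per sequence ID (column 1), so grouping sequences into genomes is left to the reader.

```python
from collections import Counter, defaultdict


def parse_gff_line(line):
    """Return (seqid, start, end, block_id), or None if the line is not a record."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 9:
        return None
    # Assumption: attributes look like "ID=<block id>" separated by ";".
    attrs = dict(item.split("=", 1) for item in fields[8].split(";") if "=" in item)
    return fields[0], int(fields[3]), int(fields[4]), attrs.get("ID")


def copies_per_block(lines):
    """Map block ID -> Counter of sequence ID -> number of coordinate lines."""
    counts = defaultdict(Counter)
    for line in lines:
        record = parse_gff_line(line)
        if record is None or record[3] is None:
            continue
        seqid, _start, _end, block_id = record
        counts[block_id][seqid] += 1
    return counts


def multi_copy_blocks(counts):
    """Return sorted block IDs that occur more than once in at least one sequence."""
    return sorted(b for b, per_seq in counts.items() if max(per_seq.values()) > 1)
```

To use it on real output, pass an open file handle for the blocks_coords.gff of each run to `copies_per_block` and compare the results of `multi_copy_blocks` between the two versions.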

Thanks a lot,

Johann

iminkin commented 2 years ago

Hi,

I am glad that you found SibeliaZ helpful. This is a somewhat tricky question. The representation of a whole-genome alignment is inherently ambiguous. For example, a single alignment block can be chopped into multiple consecutive blocks and still represent an equivalent alignment. Also, the newer version could generate more blocks due to improvements in sensitivity, which would also fragment the blocks; I suspect this is the case here. Does the coverage of the blocks change significantly between the runs of the different versions?

Regarding the -a option, it depends on how many genomes are in the dataset, as well as their repeat content. What kind of genomes (and how many) are you trying to align?
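A rough way to compare coverage between two runs is to take the union of all block intervals on each sequence and count the covered bases. The sketch below is only an illustration under stated assumptions: it assumes blocks_coords.gff follows the standard GFF layout (sequence ID in column 1, 1-based inclusive start and end in columns 4 and 5), and the function names are hypothetical. It also shows why chopping one block into consecutive pieces leaves coverage unchanged.

```python
from collections import defaultdict


def covered_bases(intervals):
    """Length of the union of 1-based inclusive (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlaps the current run: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def coverage_by_sequence(lines):
    """Map sequence ID -> number of bases covered by at least one block."""
    intervals = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue
        intervals[fields[0]].append((int(fields[3]), int(fields[4])))
    return {seqid: covered_bases(ivs) for seqid, ivs in intervals.items()}


# One block of 100 bases and the same region split into two consecutive
# blocks cover exactly the same bases.
assert covered_bases([(1, 100)]) == covered_bases([(1, 40), (41, 100)])
```

Running `coverage_by_sequence` on the blocks_coords.gff of each version and dividing by the sequence lengths would give per-sequence coverage fractions. If those are similar while the number of lines differs, the difference is probably fragmentation rather than a change in what is aligned.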