Open baozg opened 1 week ago
Hi Zhigui,
Nice to see msyd didn't crash on so many genomes :D How long did it run, BTW? The duplication of CORESYN425-454 seems like a bug. I ran into similar issues with the realignment algorithm when minimap2 wasn't finding any alignments, which caused duplicate regions to be inserted, but I never had that occur with coresynteny. Do you have the logs or a sample so I can replicate it and try to find where the bug is happening?
Weird that msyd seems to find quite a lot of coresynteny in Chr5 but little in Chr1. How does it look in the other chromosomes? Absence of coresynteny in a region can be caused by individual large structural variants or misassemblies (we saw this e.g. with the Sha inversion on Chr3 in the AMPRIL population). It might be worth plotting Chr1 using plotsr to see if any samples are highly divergent or misassembled.
More generally, SyRI/msyd does find synteny even in regions containing indels/SNPs (e.g. CORESYN456 in the output above). In our experience, playing around with minimap2 parameters can help to find somewhat more consistent synteny. You could also try changing the INDEL threshold in SyRI here, though it has been working fine for us with the default threshold so far. The threshold msyd uses during the synteny intersection step is defined here as 30 bp (not exposed in the CLI right now); you can also try playing around with that.
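To illustrate why a length threshold in the intersection step matters with many genomes, here is a minimal Python sketch. It is not msyd's actual code: the function names `intersect_two` and `core_intervals`, the `min_len` parameter, and the toy coordinates are all assumptions made up for this example. It intersects per-sample syntenic intervals on the reference and drops any overlap shorter than `min_len`.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on the reference, 1-based inclusive


def intersect_two(a: List[Interval], b: List[Interval], min_len: int = 30) -> List[Interval]:
    """Intersect two sorted, non-overlapping interval lists.

    Overlaps shorter than min_len are discarded (hypothetical threshold).
    """
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end - start + 1 >= min_len:
            out.append((start, end))
        # Advance the list whose current interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def core_intervals(per_sample: List[List[Interval]], min_len: int = 30) -> List[Interval]:
    """Fold the pairwise intersection over all samples."""
    if not per_sample:
        return []
    core = per_sample[0]
    for regions in per_sample[1:]:
        core = intersect_two(core, regions, min_len)
    return core


# Toy data: three samples, each with synteny broken at slightly different places.
samples = [
    [(1, 1000)],
    [(1, 480), (500, 1000)],
    [(1, 470), (475, 495), (520, 1000)],
]
core_default = core_intervals(samples, min_len=30)
core_relaxed = core_intervals(samples, min_len=1)
```

In this toy case the short fragment between two breakpoints survives only with the relaxed threshold. With hundreds of samples, each extra breakpoint can only shrink or split the core further, which is one possible reason the coresynteny ends up small.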
LMK if that helps!
I confirmed with a miniprot alignment that at least Chr1 shares many core genes across the genomes, so I think something weird happened. msyd only finds 11 Mb of sequence across the 5 chromosomes. Taking public genomes from NCBI may be enough to reproduce this. I could also share a link with you.
Hi, @lrauschning
I tried msyd with 279 A. thaliana genomes, but the core synteny region is small. For example, Chr1 only has a few regions, located only in Chr1:1-200000 and Chr1:2700000-32000000. Would it be possible to lower the threshold? For example, could any region with SNPs and indels shorter than 50 bp still count as coresynteny? By the way, why does msyd generate lots of duplicated regions with different synIDs?
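To make the duplication easier to report, a small Python sketch like the one below could list regions that share identical coordinates but carry different IDs. It assumes the msyd output has already been parsed into `(chrom, start, end, region_id)` tuples. That parsing step, the function name `find_duplicate_regions`, and the example records are hypothetical and not part of msyd.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int, int, str]  # (chrom, start, end, region_id)


def find_duplicate_regions(records: Iterable[Record]) -> Dict[Tuple[str, int, int], List[str]]:
    """Group region IDs by exact coordinates; keep coordinates seen more than once."""
    by_coords: Dict[Tuple[str, int, int], List[str]] = defaultdict(list)
    for chrom, start, end, region_id in records:
        by_coords[(chrom, start, end)].append(region_id)
    return {coords: ids for coords, ids in by_coords.items() if len(ids) > 1}


# Made-up records, only to show the expected shape of input and output.
example = [
    ("Chr5", 1000, 2000, "SYN_A"),
    ("Chr5", 1000, 2000, "SYN_B"),
    ("Chr5", 3000, 4000, "SYN_C"),
]
duplicates = find_duplicate_regions(example)
```

Here `duplicates` maps each repeated coordinate triple to the list of IDs assigned to it, which could be attached to the issue together with the logs.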