schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

Synteny differs between same set of assemblies #173

Closed: genmor closed this issue 1 year ago

genmor commented 1 year ago

Hi,

I'm assessing synteny between a reference genome (chromosome-level) and three assemblies. Using SyRi and plotsr, I was able to generate some neat synteny plots, but I noticed something odd: using a different order of genomes leads to different outcomes. That is, if I have my reference and three assemblies (A, B, and C) and run SyRi + plotsr, the synteny from one order (ref + A; A + B; B + C) differs from that of another (ref + C; C + A; A + B). This might be confusing, so I've attached both plots here.
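plotsr draws each band between two adjacent genomes from a separate pairwise SyRi output, so changing the genome order changes which comparisons are actually made, not just how they are stacked. A minimal Python sketch (the function names are hypothetical, for illustration only) listing the pairwise runs implied by the two orders above:

```python
def adjacent_pairs(order):
    """(reference, query) pairs compared when genomes are stacked in this order.

    Each pair corresponds to one SyRi run; the first genome of a pair acts as
    the reference for that run.
    """
    return list(zip(order, order[1:]))


def shared_comparisons(order_a, order_b):
    """Pairwise comparisons present in both orders, ignoring direction."""
    a = {tuple(sorted(p)) for p in adjacent_pairs(order_a)}
    b = {tuple(sorted(p)) for p in adjacent_pairs(order_b)}
    return a & b


order1 = ["ref", "A", "B", "C"]
order2 = ["ref", "C", "A", "B"]
print(adjacent_pairs(order1))              # [('ref', 'A'), ('A', 'B'), ('B', 'C')]
print(adjacent_pairs(order2))              # [('ref', 'C'), ('C', 'A'), ('A', 'B')]
print(shared_comparisons(order1, order2))  # {('A', 'B')}
```

Only the A vs B comparison is common to both orders, so some differences between the two plots are to be expected even when the assemblies themselves are unchanged.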

[Plot 1: Nmex03_synteny]

[Plot 2: Nmex03_syteny2]

Is this expected behavior? I would appreciate any insight on this.

Just to add some detail in case it helps: the assemblies are of the same individual, produced by three different assemblers. The reads we are working with are HiFi. The assemblies have been purged and scaffolded, and those scaffolds were then scaffolded into chromosomes (using my reference genome).

mnshgl0110 commented 1 year ago

Hi. The annotations for chromosomes 1 and 3 look correct. The small variation in translocation and duplication calls is expected. The differences in chromosome 2 are unexpected and weird, but I am not sure what could have caused them. I would suggest checking the alignments (generate a dotplot) and syri's output manually to see how this region is described. That would probably already clarify the source of the error.
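A dotplot of this kind can be built directly from a PAF file. A minimal Python sketch, assuming one reference chromosome vs one query chromosome per PAF file and the standard PAF columns (query start/end in columns 3-4, strand in column 5, target start/end in columns 8-9). The function name and the 10 kb minimum length are illustrative assumptions, not part of syri or minimap2:

```python
def paf_dotplot_segments(lines, min_len=10000):
    """Turn PAF records into (x0, y0, x1, y1) line segments for a dotplot.

    x is the target (reference) coordinate and y the query coordinate.
    Forward-strand hits run bottom-left to top-right; reverse-strand hits,
    which is how inversions show up, are drawn with a negative slope.
    """
    segments = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qstart, qend = int(f[2]), int(f[3])
        strand = f[4]
        tstart, tend = int(f[7]), int(f[8])
        if tend - tstart < min_len:  # skip short (often repetitive) hits
            continue
        if strand == "+":
            segments.append((tstart, qstart, tend, qend))
        else:
            segments.append((tstart, qend, tend, qstart))
    return segments
```

Usage: pass an open PAF file handle, then draw each segment as a line, for example with matplotlib's `plt.plot([x0, x1], [y0, y1])`. Large negative-slope segments on the diagonal would indicate an inversion.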

genmor commented 1 year ago

Hi again. I appreciate your quick response! I isolated Chr 2 from each of my assemblies and my reference, aligned each assembly to the reference using minimap2 -x asm5 to create PAF files, and produced dotplots as you suggested.

[Image: Chr 2 dotplots]

There don't appear to be any inversions based on the dotplots. I also isolated what I think is the offending inverted region from the VCF file, and it appears there are multiple annotations, always paired with regions.

Can you provide some insight on how to proceed?

mnshgl0110 commented 1 year ago

Based on these alignments, the hifiasm chromosome 2 should be syntenic as well. Can you please share the structural annotations from syri? You can get them using grep -P '(SYN|INV|TRANS|INVTR|DUP|INVDP|NOTAL)\t' syri.out
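To see at a glance how much of each chromosome falls into each structural class, the same records can be tallied. A minimal Python sketch, assuming the syri.out layout described in the syri documentation (column 1 = reference chromosome, columns 2-3 = reference start/end, taken here as 1-based inclusive, column 11 = annotation type). The function name is hypothetical:

```python
from collections import defaultdict

# Structural annotation types matched by the grep command above.
STRUCT_TYPES = {"SYN", "INV", "TRANS", "INVTR", "DUP", "INVDP", "NOTAL"}


def summarize_syri_out(lines):
    """Total reference bp per (chromosome, annotation type) from syri.out lines.

    Records whose reference coordinates are '-' (e.g. NOTAL regions that
    exist only in the query) are skipped because they have no reference span.
    """
    totals = defaultdict(int)
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 11 or f[10] not in STRUCT_TYPES:
            continue
        chrom, start, end = f[0], f[1], f[2]
        if "-" in (chrom, start, end):
            continue
        totals[(chrom, f[10])] += int(end) - int(start) + 1
    return dict(totals)
```

Usage: `with open("syri.out") as fh: print(summarize_syri_out(fh))`. A chromosome whose dotplot looks collinear but whose total is dominated by NOTAL would point to alignments being lost before annotation rather than to a real rearrangement.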

Also, try running syri with -f to disable automatic alignment filtering.

genmor commented 1 year ago

Hi Manish,

I appreciate your quick turnaround on my issues, and I apologize for the delay in updating you. As you suggested, I've used grep to pull the annotations. You can find them in the attached file: check.txt

I also re-ran SyRi on just my hifiasm assembly with the -f flag (attached: ref-hifiasm_test).

mnshgl0110 commented 1 year ago

In the check.txt file, the 100-150 Mb region in Chr2 is mostly NOTAL, but the dotplot suggests that these regions are aligned. Is this file from the syri run using -f? Other than this filtering, I cannot think of any other reason for having so many large gaps.
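The mismatch described here can be quantified by measuring how much of the reference window is covered by raw PAF alignments versus by syri's NOTAL calls. A minimal Python sketch, assuming 0-based half-open target coordinates in PAF (columns 6, 8, 9) and 1-based inclusive reference coordinates in syri.out (columns 1-3, type in column 11). All function names are hypothetical:

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(intervals, win_start, win_end):
    """Bases of [win_start, win_end) covered by the union of the intervals."""
    total = 0
    for start, end in merge_intervals(intervals):
        lo, hi = max(start, win_start), min(end, win_end)
        if hi > lo:
            total += hi - lo
    return total


def paf_target_intervals(lines, chrom):
    """Reference (target) intervals on chrom from PAF lines."""
    out = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) >= 9 and f[5] == chrom:
            out.append((int(f[7]), int(f[8])))
    return out


def syri_notal_intervals(lines, chrom):
    """Reference NOTAL intervals on chrom from syri.out, converted to half-open."""
    out = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) >= 11 and f[10] == "NOTAL" and f[0] == chrom and f[1] != "-":
            out.append((int(f[1]) - 1, int(f[2])))
    return out
```

Usage: for the 100-150 Mb window, compare `covered_bp(paf_target_intervals(paf_lines, "Chr2"), 100_000_000, 150_000_000)` with the same call on `syri_notal_intervals(syri_lines, "Chr2")`. If both numbers are high, the region is aligned in the PAF but unannotated in syri.out, which would be consistent with alignments being dropped by filtering.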