schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

Confusion with Position of large insertion and deletion in vcf output from syri #128

Closed Dkyuan closed 2 years ago

Dkyuan commented 2 years ago

Hi, sir:

When checking the large indels (PAVs) in the VCF file, I was confused by the positions of the variants. (I am checking the positions because I want to extract the PAV sequences.) Some examples are listed below. The positions of the second and fourth insertions (IDs "INS2" and "INS13") are what I expected: the start and end positions on the reference genome are equal, and StartB is less than EndB. Can someone help me understand what happened in the following cases?

  1. For "INS1", the end position (42294) is larger than the start position (42293);
  2. For "INS8", "INS14", and "INS15", the end positions are smaller than the start positions.

The same goes for deletions. "DEL18" matches my expectation: the start position (405833) is smaller than the end position (406191), and StartB is equal to EndB (427915). But what happened with "DEL17", where StartB (419797) is greater than EndB (419795)?

  1. 1 42293 INS1 N . PASS END=42294;ChrB=Chr01;StartB=79706;EndB=79890;
  2. 1 56366 INS2 N . PASS END=56366;ChrB=Chr01;StartB=80877038;EndB=80877143;
  3. 1 217192 INS8 N . PASS END=217190;ChrB=Chr01;StartB=243255;EndB=243596;
  4. 1 260808 INS13 N . PASS END=260808;ChrB=Chr01;StartB=287770;EndB=287863;
  5. 1 264818 INS14 N . PASS END=264817;ChrB=Chr01;StartB=291839;EndB=292070;
  6. 1 373677 INS15 N . PASS END=373673;ChrB=Chr01;StartB=395881;EndB=396288;
  7. 1 397489 DEL17 N . PASS END=397731;ChrB=Chr01;StartB=419797;EndB=419795;
  8. 1 405833 DEL18 N . PASS END=406191;ChrB=Chr01;StartB=427915;EndB=427915;

Thanks for your help ~

Xuan.

mnshgl0110 commented 2 years ago

[image] The arrangement of end-points of consecutive alignments in an annotation block (grey alignments) is analysed to find structural variations.

When identifying indels between two neighboring alignments, syri allows some overlap between the alignments, which is what causes this behaviour. The allowed overlap is controlled by the --allow-offset parameter.

This overlap can in turn produce start and end positions like the ones you listed. For practical usage, using the start position should be fine.
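To see which of the records from the question have reversed coordinates, and by how many bases, a small script can help. The following is only a minimal sketch: it assumes the whitespace-separated layout shown in this thread with the INFO field last (a real VCF is tab-separated and has more columns), and `parse_record` and `reversed_by` are hypothetical helper names, not part of syri.

```python
# Records copied from the question above (whitespace-separated, INFO field last).
RECORDS = """\
1 42293 INS1 N . PASS END=42294;ChrB=Chr01;StartB=79706;EndB=79890;
1 56366 INS2 N . PASS END=56366;ChrB=Chr01;StartB=80877038;EndB=80877143;
1 217192 INS8 N . PASS END=217190;ChrB=Chr01;StartB=243255;EndB=243596;
1 260808 INS13 N . PASS END=260808;ChrB=Chr01;StartB=287770;EndB=287863;
1 264818 INS14 N . PASS END=264817;ChrB=Chr01;StartB=291839;EndB=292070;
1 373677 INS15 N . PASS END=373673;ChrB=Chr01;StartB=395881;EndB=396288;
1 397489 DEL17 N . PASS END=397731;ChrB=Chr01;StartB=419797;EndB=419795;
1 405833 DEL18 N . PASS END=406191;ChrB=Chr01;StartB=427915;EndB=427915;
"""


def parse_record(line):
    """Parse one record as printed in this thread (hypothetical helper)."""
    fields = line.split()
    # The last field is the INFO string: key=value pairs separated by ';'.
    info = dict(item.split("=", 1) for item in fields[-1].split(";") if item)
    return {
        "id": fields[2],
        "start": int(fields[1]),
        "end": int(info["END"]),
        "start_b": int(info["StartB"]),
        "end_b": int(info["EndB"]),
    }


def reversed_by(start, end):
    """Bases by which `end` precedes `start`; 0 when the pair is in normal order."""
    return max(0, start - end)


for line in RECORDS.splitlines():
    rec = parse_record(line)
    ref_rev = reversed_by(rec["start"], rec["end"])
    qry_rev = reversed_by(rec["start_b"], rec["end_b"])
    if ref_rev or qry_rev:
        print(f'{rec["id"]}: reference reversed by {ref_rev} bp, query reversed by {qry_rev} bp')
```

For the eight records in the question, this flags INS8, INS14, INS15 (reversed on the reference) and DEL17 (reversed on the query), each by no more than 4 bp.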

Dkyuan commented 2 years ago

Thanks very much for your help.

I now see that the default number of base pairs allowed to overlap is 5. To avoid such start and end positions, I should set parameter --allow-offset to OFFSET, is that right? For now I will use the start position and move forward.

Thanks again!

mnshgl0110 commented 2 years ago

I should set parameter --allow-offset to OFFSET, is that right?

No. You need to set --allow-offset to an integer value. To avoid such start/end positions use --allow-offset 0.

However, when two alignments overlap by more than OFFSET base pairs, they are annotated as copyloss/copygain (see supplementary figure S8 of the syri paper). So, if OFFSET=0, even a 1 bp overlap between alignments would result in a copy-change call. This is generally not desired, which is why an OFFSET value of 5 helps restrict false copy-change calls. You can try different values to suit your requirements.
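The trade-off described above can be illustrated with a toy decision rule. This is a simplified sketch of the rule as stated in this reply, not syri's actual code; `classify_overlap` and its return labels are hypothetical names chosen for illustration.

```python
def classify_overlap(overlap_bp, offset=5):
    """Toy model: an overlap larger than `offset` bp counts as a copy-change.

    Assumption: this mirrors only the rule stated in the thread, i.e. that
    more than OFFSET overlapping base pairs leads to a copyloss/copygain call.
    """
    if overlap_bp > offset:
        return "copy-change"
    return "tolerated"


# Compare a strict setting (0) with the value of 5 discussed in the thread.
for offset in (0, 5):
    for overlap_bp in (0, 1, 5, 6):
        print(f"offset={offset} overlap={overlap_bp} -> {classify_overlap(overlap_bp, offset)}")
```

With `offset=0`, a 1 bp overlap is already classified as a copy-change in this model, whereas with `offset=5` overlaps of up to 5 bp are tolerated.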

Dkyuan commented 2 years ago

@mnshgl0110

I got it. Thank you very much.