tjiangHIT / cuteSV

Long read based human genomic structural variation detection with cuteSV
MIT License

BND representation #102

Open vmukhina opened 1 year ago

vmukhina commented 1 year ago

Hi @tjiangHIT, I used cuteSV to detect translocations in my sample, and it seems that the output format differs from the declared VCF 4.2. According to section 5.4 of the VCF specification (https://samtools.github.io/hts-specs/VCFv4.2.pdf), each translocation event is represented by two (or more) mate entries cross-linked through MATEID in the INFO field. Here is an example from the specification:

CHROM  POS     ID     REF  ALT           QUAL  FILTER  INFO
1      1       bnd_Y  T    ]13:123456]T  6     PASS    SVTYPE=BND;MATEID=bnd_U
13     123456  bnd_U  C    C[1:1[        6     PASS    SVTYPE=BND;MATEID=bnd_Y

However, none of the BND entries I obtained from cuteSV has a clear mate entry, and none carries a MATEID tag:

chr1 178961228 cuteSV.BND.8 N ]chr4:139622129]N . PASS PRECISE;SVTYPE=BND;RE=4;RNAMES=NULL GT:DR:DV:PL:GQ ./.:.:4:.,.,.:.

As a result, some post-processing tools treat these entries as single BNDs. Thanks for checking!

PS. The same problem is present in Sniffles output, so you may find this thread helpful (https://github.com/fritzsedlazeck/Sniffles/issues/121).
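
For anyone who wants to check how many records in a file are affected, here is a rough sketch that lists BND records with no resolvable mate. It assumes plain tab-separated VCF lines and reads only the ID and INFO columns; the function name `unpaired_bnd_ids` is made up for illustration and is not part of cuteSV or Sniffles.

```python
def unpaired_bnd_ids(vcf_lines):
    """Return IDs of SVTYPE=BND records whose mate is absent from the input."""
    mates = {}  # record ID -> list of MATEID values (empty if the tag is missing)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        # Keep only key=value INFO entries; flags such as PRECISE are ignored.
        info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
        if info.get("SVTYPE") == "BND":
            mates[fields[2]] = info["MATEID"].split(",") if "MATEID" in info else []
    # A record is unpaired if it names no mate, or names one that is not in the file.
    return [rid for rid, ids in mates.items()
            if not ids or any(m not in mates for m in ids)]
```

Calling it as `unpaired_bnd_ids(open("calls.vcf"))` (file name hypothetical) on cuteSV output like the record above would be expected to report every BND ID, since none of them carries MATEID.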

Meltpinkg commented 1 year ago

Hi @vmukhina

Sorry for the late reply. For each translocation, cuteSV reports only one record, placed on the smaller chromosome. The form of the translocation is shown in the ALT field, and this single record fully represents the translocation. For the example above, cuteSV reports only the first line (because 1 < 13). The MATEID tag and the information in the second line can be inferred from the first line. In practice, you can write a script to add this information for translocations.

Best regards, Shuqi
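
A minimal sketch of the kind of post-processing script suggested above, assuming tab-separated VCF lines and the four breakend ALT forms from section 5.4 of the VCF 4.2 specification (`t[p[`, `t]p]`, `]p]t`, `[p[t`). The function name `add_mate`, the `_mate` ID suffix, and the use of `N` as the mate's REF base are illustrative assumptions, not cuteSV behaviour; the real REF base would have to be looked up in the reference genome.

```python
import re

# One of: t[p[  t]p]  ]p]t  [p[t   where p is chrom:pos
BND_ALT = re.compile(
    r"^(?P<pre>[ACGTNacgtn.]*)(?P<br>[\[\]])"
    r"(?P<chrom>[^\[\]]+):(?P<pos>\d+)(?P=br)(?P<post>[ACGTNacgtn.]*)$"
)

def add_mate(line, suffix="_mate"):
    """Return [record with MATEID, synthesized mate] for one BND line.

    Lines that are not breakends, or that already have MATEID, come back unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, rec_id, alt, info = fields[0], fields[1], fields[2], fields[4], fields[7]
    m = BND_ALT.match(alt)
    if m is None or "MATEID=" in info:
        return [line.rstrip("\n")]

    base_first = bool(m["pre"])               # True for t[p[ and t]p]
    mate_bracket = "]" if base_first else "["
    target = f"{mate_bracket}{chrom}:{pos}{mate_bracket}"
    # The mate's base precedes the brackets when this record uses ']'.
    mate_alt = "N" + target if m["br"] == "]" else target + "N"

    mate_id = rec_id + suffix                 # hypothetical naming scheme
    first = fields.copy()
    first[7] = f"{info};MATEID={mate_id}"
    mate = fields.copy()                      # QUAL, FILTER, FORMAT, sample copied as-is
    mate[0], mate[1], mate[2] = m["chrom"], m["pos"], mate_id
    mate[3], mate[4] = "N", mate_alt          # REF base unknown without the reference
    mate[7] = f"{info};MATEID={rec_id}"
    return ["\t".join(first), "\t".join(mate)]
```

Applied to the `cuteSV.BND.8` record from the question, this produces a second record at chr4:139622129 with ALT `N[chr1:178961228[`. The combined output would still need to be re-sorted by position, and the header would need an INFO definition for MATEID if it is not already present.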