pangenome / vcfbub

use variant nesting information to filter overlapping sites from vg deconstruct output
MIT License

vcfbub removes about 300 Mb of data from a 2.5 Gb genome #7

Open ld9866 opened 7 months ago

ld9866 commented 7 months ago

Dear developer: We used Minigraph-Cactus to build a pangenome and PanGenie to genotype individuals. We found that large stretches of variation on some chromosomes of the pangenome were lost after quality control. What caused this, and will it affect downstream analysis? We are trying to run a genome-wide association analysis of SVs, so we are puzzled that some chromosome segments lack variants of every type: SNPs, indels, and SVs. Best wishes!

Code: vcfbub -l 0 -r 100000 --input chr2.vcf.gz > chr2.ready.vcf
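To see where the records disappear, one option is to compare the input and output of the command above and count the dropped records per genomic window. The following is a minimal sketch, not part of vcfbub: the function names `record_key` and `dropped_per_window` and the 1 Mb default window are hypothetical, and it assumes tab-separated VCF lines in which CHROM, POS, and ID together identify a record. It makes no assumption about how vcfbub decides what to filter.

```python
from collections import Counter


def record_key(line):
    """Return (CHROM, POS, ID) for one tab-separated VCF data line."""
    chrom, pos, vid = line.split("\t", 3)[:3]
    return chrom, int(pos), vid


def dropped_per_window(input_lines, output_lines, window=1_000_000):
    """Count records present in the input but absent from the output.

    Returns a Counter keyed by (CHROM, window_start), where window_start
    is the 0-based start of the window containing the record's POS.
    """
    kept = {
        record_key(line)
        for line in output_lines
        if line.strip() and not line.startswith("#")
    }
    counts = Counter()
    for line in input_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        key = record_key(line)
        if key not in kept:
            chrom, pos, _ = key
            counts[(chrom, (pos - 1) // window * window)] += 1
    return counts
```

With the files from the command above, the two arguments could be `gzip.open("chr2.vcf.gz", "rt")` and `open("chr2.ready.vcf")`. Windows with unusually high counts point to the regions where filtering removed most of the sites.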

Example:
2 10510824 >18339279>18339282 GT AG 60.0 . GT 0 0 0 0 0 0 1 0 0 0
2 10510828 >18339282>18339285 CA TG 60.0 . GT 0 0 0 0 0 0 1 0 0 0
2 10510833 >18339285>18339288 CA TT 60.0 . GT 0 0 0 0 0 0 1 0 0 0
2 10510838 >18339288>18339291 CT TC 60.0 . GT 0 0 0 0 0 0 1 0 0 0
2 10510842 >18339291>18339294 GC TT 60.0 . GT 0 0 0 0 0 0 1 0 0 0
2 10510848 >18339294>18339297 C G 60.0 . GT 0 0 0 0 0 0 1 0 0 0
2 10510856 >18339297>18339300 GG CT 60.0 . GT 0 0 0 0 0 0 1 0 0 0
2 10510865 >18339300>18339303 A G 60.0 . GT 0 0 0 0 0 0 1 0 0 0
2 10510867 >18339303>18339305 TAAC T 60.0 . GT 0 0 0 0 0 0 1 0 0 0 0
2 10510886 >18339305>18339308 A G 60.0 . GT 0 0 0 0 0 0 1 0 0 0
2 10510889 >18339308>18339311 TACC ATTG 60.0 . GT 0 0 0 0 0 0 1 0 0 0
2 79046966 >21226444>21226446 A AACGAATCCGACTAGGAACCATGAGGTTGCAGGTTCGGTCCCTGCCCTTGCTCAGTGGGTTAACGATCCGGCGTTGCCGTGAGCTGTGGTGTAGATCACAGATGCAGCTTAGATCCTGAGTTGCTGTGGCTGTGGCATATGGTGGCAGCTGCTATCTGATTCGACCCCTAGACTGGGAACCTCCATATACCACGAGTGCAGTCCTA
2 79047027 >21226446>21226449 A G 60.0 . GT 0 0 0 0 0 0 0 0 0 0
2 79047042 >21226449>21226452 CC CT,CCT 60.0 . GT 0 0 0 0
2 79047076 >21226452>21226455 A G 60.0 . GT 1 0 1 1 0 1 0 0 1 1
2 79047081 >21226455>21226457 CC C 60.0 . GT 0 0 0 0 0 0 0 0 0 0 0
2 79047086 >21226457>21226460 T C 60.0 . GT 0 0 0 0 0 0 0 0 0 0
2 79047149 >21226460>21226463 CC CAT 60.0 . GT 0 0 0 0 0 0 0 0 0 0
2 79047156 >21226463>21226466 CCG CA 60.0 . GT 0 0 0 0 0 0 0 0 0 0
2 79047160 >21226466>21226469 G A 60.0 . GT 0 0 0 0 0 0 0 0 0 0
2 79047166 >21226469>21226471 GG G 60.0 . GT 0 0 0 0 0 0 0 0 0 0 0
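Since the question is about SNPs, indels, and SVs all going missing, it may help to label records like the ones in the example by type before tallying them. The sketch below classifies a record from its REF and ALT columns alone. The function name `classify` is hypothetical, and the 50 bp cutoff for calling something an SV is an assumption (a common convention), not something vcfbub or PanGenie defines.

```python
def classify(ref, alt, sv_min=50):
    """Label a record as SNP, MNP, INDEL, or SV from its REF and ALT alleles.

    alt may hold several comma-separated alleles; the largest length
    difference from REF decides the label. sv_min is an assumed cutoff.
    """
    alts = alt.split(",")
    size = max(abs(len(a) - len(ref)) for a in alts)
    if size >= sv_min:
        return "SV"
    if size > 0:
        return "INDEL"
    # All alleles have the same length as REF: a substitution.
    return "SNP" if len(ref) == 1 else "MNP"
```

For instance, `classify("C", "G")` gives `"SNP"`, `classify("TAAC", "T")` gives `"INDEL"`, and the long insertion at position 79046966 is labeled `"SV"` because its ALT allele is more than 50 bp longer than REF.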