nhansen / SVanalyzer

Tools for the analysis of structural variation in genomes
http://svanalyzer.readthedocs.io/

issue in vcf format for SVrefine output #20

Closed Milia1368 closed 5 months ago

Milia1368 commented 5 months ago

Hello, I am using this tool to call SVs from a haplotype assembly, with the commands listed below:

```
mummer-4.0.0rc1/nucmer -p NA24385_mummer.h1 -t 8 --maxmatch -l 100 -c 500 GRCh38_chromosomes.fa 20200628_HHU_assembly-results_CCS_v12/v12_NA24385_hpg_pbsq2-ccs_1000-pereg.h1-un.racon-p2.fasta
gzip NA24385_mummer.h1.delta
SVrefine --delta NA24385_mummer.h1.delta.gz --ref_fasta GRCh38_chromosomes.fa --query_fasta 20200628_HHU_assembly-results_CCS_v12/v12_NA24385_hpg_pbsq2-ccs_1000-pereg.h1-un.racon-p2.fasta --outvcf SVanalyzerhp1/ --refname hg38 --samplename NA24385.h1 --maxsize 100000
```

Everything ran without errors, but there are some formatting issues in the resulting VCF file that I can't figure out: why is every SVLEN either 4 or 0, and why is GT either 1 or 0/1? What does this mean, and under what circumstances would this be the case? Could you give me some advice, so that I can use the output for subsequent evaluation?

image

Thanks!

nhansen commented 5 months ago

Thanks for bringing this to my attention! There was a bug that assigned incorrect sizes when no sequences are included, but that has now been fixed. (The "4" and "0" are the differences between the non-sequence ALT seq "\<DEL>" or "" and the REF seq "N".) I'm not sure why the program was unable to retrieve the REF/ALT alleles from the two fasta files you specified with the query_fasta and ref_fasta options, though. Did it report any errors about that when running (probably to STDERR)? Are the files GRCh38_chromosomes.fa and v12_NA24385_hpg_pbsq2-ccs_1000-pereg.h1-un.racon-p2.fasta properly formatted fasta files?

For the genotypes, SVrefine only reports as many alleles as there are query entries aligned to the reference. In the case of single-allele genotypes (e.g., "1"), only one contig from your assembly aligned to that specific location. Hope that helps!
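To see how often each case occurs in an output file, the genotypes can be tallied by the number of alleles they list. The following is only a minimal sketch, assuming a standard tab-delimited VCF with a single sample column; the helper names `allele_count` and `tally_gt_allele_counts` are made up for illustration and are not part of SVanalyzer.

```python
import re
from collections import Counter


def allele_count(gt: str) -> int:
    """Number of alleles listed in a VCF GT string such as "1", "0/1", or "0|1"."""
    return len(re.split(r"[/|]", gt))


def tally_gt_allele_counts(vcf_lines):
    """Count records by how many alleles the first sample's GT lists.

    Assumption: FORMAT is column 9 and the first sample is column 10,
    as in a standard VCF. Records without a sample column are skipped.
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue
        keys = fields[8].split(":")
        values = fields[9].split(":")
        if "GT" in keys:
            counts[allele_count(values[keys.index("GT")])] += 1
    return counts
```

Passing an open file handle (for example `open("my_output.vcf")`, a hypothetical file name) returns a `Counter` keyed by allele count, so a key of 1 corresponds to sites where a single contig aligned.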

Also, if you can give the most recently committed version of SVrefine a try and see if you get correct SV lengths for this example, that would be a great help. Thank you!
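One quick way to check the lengths is to tally the distinct SVLEN values and count how many records still carry a symbolic ALT allele rather than a sequence. This is a rough sketch, assuming a standard tab-delimited VCF with SVLEN stored as an INFO key; the function name `summarize_svlen` is hypothetical and not part of SVanalyzer.

```python
from collections import Counter


def summarize_svlen(vcf_lines):
    """Return (Counter of SVLEN values as strings, number of symbolic-ALT records).

    Assumptions: ALT is column 5 and INFO is column 8, as in a standard VCF,
    and symbolic alleles are written in angle brackets (e.g. "<DEL>").
    """
    svlen_counts = Counter()
    symbolic_alt_records = 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        if fields[4].startswith("<"):
            symbolic_alt_records += 1
        for entry in fields[7].split(";"):
            key, _, value = entry.partition("=")
            if key == "SVLEN":
                svlen_counts[value] += 1
    return svlen_counts, symbolic_alt_records
```

If nearly all records fall under SVLEN values of 4 or 0, the file most likely still shows the sizing problem described above; a large symbolic-ALT count would suggest the sequences could not be retrieved from the fasta files.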

Milia1368 commented 5 months ago

Thanks for your help! I tried it again and the SVLEN problem is solved. Here is the output:

image

My confusion has been cleared up, thanks a lot!