hewm2008 / LDBlockShow

LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on VCF files
MIT License
18 stars 2 forks source link

Discrepancies in R2 and LD compared to bcftools prune #3

Open Knight-JChev opened 1 week ago

Knight-JChev commented 1 week ago

Dear all,

So I've tried using LDBlockShow on a QTL dataset to visualize the linkage blocks between biallelic SNPs. I tested to observe the blocks before and after pruning to understand how the pruning took place and what it merged.

So I compared the R² and D' value obtained from bcftools +prune : vcf_test.txt vcf_test_annotated.txt

bcftools +prune -w 1 -a r2,LD vcf_test.vcf -O v -o vcf_test_annotated.vcf

and from LDBlockShow : LDBlockShow -InVCF vcf_test.vcf -OutPut test_noprun_2Mb -SeleVar 2 -Region Chr01:1:2000000 ShowLDSVG -InPreFix test_noprun_2Mb -OutPut test_noprun_2Mb_Rsq -ShowNum

Looking at the results I found discrepancies in the value of R2 and D' at some (not all) positions between LDBlockShow and bcftools prune annotation. Scores are shown for one marker with the previous one after annotation in bcftools prune.

I highlighted in blue the place that seemed weird to me :

bcftools_prune_example

LDBlockShow_example

It does kind of the same at the same locations with D'. Could you explain me what happens there ? Is that supposed to occur ? I read that the computations are the same between LDBlockShow and other tools, which seems to be the case here but I wonder why there are such differences at those places.

PS : Also LDBlockShow does not show the blocks with a black outline compared to what's in the documentation.

Kind regards, Knight

hewm2008 commented 5 days ago

1 Please make sure that there is no problem with the program installation, especially the plink file in bin should be placed in the same directory as LDBlockShow 2 No error is reported during operation running 3 The following command can produce normal images 4 Compare with other software. I don’t know how you compare. I think there is no problem with my program.

LDBlockShow -InVCF  vcf_test.txt    -OutPut test_noprun_2Mb -SeleVar    2   -Region Chr01:1:2000000 -NoShowLDist    100000000
 zcat   test_noprun_2Mb.site.gz |  awk '{print $0, $2}' >  Site.info
ShowLDSVG -InPreFix test_noprun_2Mb -OutPut test_noprun_2Mb_Rsq -ShowNum  -NoShowLDist   100000000  -SpeSNPName   Site.info

test_noprun_2Mb_Rsq

Knight-JChev commented 5 days ago

First of all, thanks for your answer.

  1. That is the case, LDBlockShow, plink and ShowLDSVG script files are in the same bin directory.
  2. This might be the case, I'm getting a segmentation fault error which relates in plink not being able to make a specific .blocks.det file I believe. Still the program is able to make the map, even though I suppose this is the reason why the black outline is not showing.

##Start Region Cal... :Chr01 1 2000000; In This Region TotalSNP Number is 15 find blocks... Segmentation fault (core dumped) mv: cannot stat '/mnt/f/QTL/Variant_calling_pipeline/results/results/vcf/pruned/test_noprun_2Mb_return.plink.blocks.det': No such file or directory gzip: /mnt/f/QTL/Variant_calling_pipeline/results/results/vcf/pruned/test_noprun_2Mb_return.blocks: No such file or directory Warining: SVG module in Perl is missing, trying to loading the built-in [SVG.pm]... Loading SVG module done Start draw... SVG info: SNPNumber :15 , SVG (width,height) = (862.5,637.5) ALL done

  1. Indeed it worked except for the black outline and it gives me the same map as I sent in the previous message image

  2. I will definitely try to compare with other software but I was inclined to believe bcftools informations. The command line related to bcftools +prune (see previous message) writes R2 and D' score for a variant compared to another. See the info field with POS_R2 being the variant compared to the variant of the line (which should always be the previous variant) and R2 being the R2 score between those two variants. E.g of the line highlighted in blue : variant at position 272 661 is being compared to variant at position 272 658, the R2 score resulting of this comparison is 1; as shown in the graph made with LDBlockShow. The issue to me is the R2 between 164 353 and 164 773 : bcftools output tells me it is 0.95 whereas LDBlockShow tells me it is 0.03. When I take a look at the vcf file I'm rather inclined to believe bcftools. image

Warm regards, Knight

hewm2008 commented 4 days ago

1 should be the plink 1.9 ,you can rm the other plinks ,only keep the LDBlockshow/plinks Segmentation fault (core dumped) mv: cannot stat '/mnt/f/QTL/Variant_calling_pipeline/results/results/vcf/pruned/test_noprun_2Mb_return.plink.blocks.det': No such file or directory gzip: /mnt/f/QTL/Variant_calling_pipeline/results/results/vcf/pruned/test_noprun_2Mb_return.blocks: No such file or directory

2 you can try phase the vcf first. try the phase vcf