lyl8086 / VCF_filter

A versatile multi-threaded filter for VCF files

Filtering VCF from Stacks pipeline #1

Closed konopinski closed 10 months ago

konopinski commented 10 months ago

Hi, a multithreaded VCF filter is a great idea! Thanks for your work. I ran into a problem while trying it. I have a VCF file produced by the Stacks pipeline. When I tried to filter it with VCF_filter, I received this error: `Thread # terminated abnormally: DP field is requred.` I'm sure the VCF file contains a DP field. Do you have any idea what this could mean? A header and a few lines of genotypes are attached below.

issue.zip

Cheers, Maciek

lyl8086 commented 10 months ago

Hi Maciek,

This happens because Stacks drops all the other genotype information for a sample when its GT is missing. I have uploaded a new version that handles this case; please give it a try.
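To illustrate the situation, here is a minimal Python sketch of reading one FORMAT key from a sample column when trailing subfields have been dropped. It is not the code used in VCF_filter; the function name `get_sample_value` and the example genotype strings are made up for illustration and are not taken from the attached file.

```python
def get_sample_value(format_field, sample_field, key):
    """Return the value of `key` for one sample, or None if it is absent or missing."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if key not in keys:
        return None
    idx = keys.index(key)
    # A sample column may be shorter than FORMAT (e.g. just "./."),
    # so guard against indexing past the end.
    if idx >= len(values):
        return None
    value = values[idx]
    return None if value in (".", "") else value


# Hypothetical FORMAT string and sample columns, for illustration only.
fmt = "GT:DP:AD:GQ:GL"
print(get_sample_value(fmt, "0/1:25:12,13:40:-50,-0,-60", "DP"))  # 25
print(get_sample_value(fmt, "./.", "DP"))  # None
```

A filter written this way can treat a `None` depth as a missing genotype instead of aborting the thread.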

Best, Yulong

konopinski commented 10 months ago

Thank you, Yulong. The DP problem is fixed, but I have two new ones.

  1. When I set a quality filter, the script produces the error "No Quality value in VCF lines." The command was: `VCF_filter_multi.pl --i populations.snps.vcf --o multi.filtered.1.vcf --P ../../popmap2023.txt --m 10 --M 200 --Q 20 --g 0.01 --f --T 21`
  2. When I try to re-filter the VCF produced by your script, there is another problem:
    
    ../VCF_filter/VCF_filter_multi.pl --i multi.filtered.vcf --o multi.filtered.1.vcf --P ../../popmap2023.txt --m 10 --M 200 --Q 20 --g 0.01 --f --T 21

    =================================================================================
    Using 21 threads, 999 per batch...

    CMD: /media/sf_Dane/RADSeq/Barbus/new/R/allDataStacks/../VCF_filter/VCF_filter_multi.pl --i multi.filtered.vcf --o multi.filtered.1.vcf --P ../../popmap2023.txt --m 10 --M 200 --Q 20 --g 0.01 --f --T 21

    Parameters used:
    File name : multi.filtered.vcf
    Out name : multi.filtered.1.vcf
    MinQ : 20
    minDepth : 10
    MaxDP : 200
    Global MAF : 0.01
    Only keep polymorphic sites
    Only keep bi-allelic sites
    Remove Indels

    done.
    Use of uninitialized value $tot_indiv in concatenation (.) or string at ../VCF_filter/VCF_filter_multi.pl line 255.
    =================================================================================
    Use of uninitialized value $tot_indiv in concatenation (.) or string at ../VCF_filter/VCF_filter_multi.pl line 258.

    Final retained : 0 SNPs of individuals Total 0 SNPs


Sorry to trouble you. I think you did a great job writing this script. You might consider writing a short user manual, because not all options are intuitive. For example, it is not clear whether `--H`, `--c`, or `--l` filters out the whole SNP when it exceeds the provided value in a single population, or how Fis filtering works. I guess you wrote it for a particular project, and it is great that you want to share it with the world, but a brief explanation would be very helpful. Consider also adding a short note on how to cite the package. Thanks a lot.

Maciek

lyl8086 commented 10 months ago

Hi Maciek,

Stacks does not output SNP quality in its VCF, so do not set a filter on MinQ. I guess your popmap file is not the correct one for VCF_filter. I have uploaded sample files; please have a look. I will add a user manual in the near future.
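Both points can be checked before running the filter. The Python sketch below is only an illustration, not part of VCF_filter: the function name `preflight` is made up, and it assumes a popmap with one tab-separated `sample<TAB>population` pair per line, which may differ from what VCF_filter expects (the uploaded sample files are the reference for that). It reports whether any record has a QUAL value and which popmap samples are absent from the VCF header.

```python
def preflight(vcf_lines, popmap_lines):
    """Return (has_qual, missing): whether any record carries a QUAL value,
    and the popmap samples that do not appear in the VCF header."""
    vcf_samples = []
    has_qual = False
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            vcf_samples = cols[9:]  # sample names follow the 9 fixed columns
        elif len(cols) > 5 and cols[5] != ".":
            has_qual = True  # QUAL is the 6th column; "." means missing
    # ASSUMPTION: popmap lines look like "sample<TAB>population".
    popmap_samples = [l.split("\t")[0].strip() for l in popmap_lines if l.strip()]
    known = set(vcf_samples)
    missing = [s for s in popmap_samples if s not in known]
    return has_qual, missing


# Tiny in-memory example with made-up sample names.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12\t./.",
]
popmap = ["ind1\tpopA", "ind3\tpopB"]
has_qual, missing = preflight(vcf, popmap)
print(has_qual)  # False
print(missing)   # ['ind3']
```

If `has_qual` is `False`, a quality filter has nothing to work with. A non-empty `missing` list suggests the popmap and the VCF do not describe the same individuals.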

Best, Yulong

lyl8086 commented 10 months ago

Manual added.