Closed JennyCNS closed 5 months ago
Hi Jennifer,
thanks for reporting this. As far as I can tell, you already solved this?! Great!
However, I do have some questions for clarification. Also, for future reference, and maybe to improve the situation for other users, I'll try to recap your approach. The file snippets that you posted look like the following is happening:
Number=R), which is what grenedalf expects. The way your code seems to work, this also works for non-biallelic positions, by simply concatenating the RD and AD fields as reported by PoolSNP?

However, as far as I can tell, the AD FORMAT field is non-standard - it does not appear in the VCF specification. When implementing this in grenedalf, I've hence tried to find out what people mostly do, see here and here for instance. That second link, though, seems to tell us that DP is not exactly the ref + alt allele count - it might differ depending on filter settings... But the way it's implemented in PoolSNP, it's the sum, so we are good here. Also, given that AD is non-standard, it can be used by each program as they see fit - and PoolSNP seems to report it in a non-typical way, as a single number, and you changed that back to the more typical way of using that field.
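For future readers, the conversion described above could be sketched roughly as follows. This is not the script from the original post, just a minimal illustration. It assumes that each sample column carries an RD entry (reference count) and an AD entry (alternative count or counts), and that the FORMAT column names them; the function name `merge_rd_ad` is made up for this example.

```python
def merge_rd_ad(format_keys: str, sample: str) -> str:
    """Rewrite one sample column so that AD becomes "ref,alt[,alt...]".

    Assumption: RD holds the reference count and AD holds the alternative
    count(s), as described in this thread. Columns that do not match the
    FORMAT layout (e.g. a bare "./.") are returned unchanged.
    """
    keys = format_keys.split(":")
    values = sample.split(":")
    if len(values) != len(keys):
        return sample  # missing or truncated sample entry, leave as-is
    fields = dict(zip(keys, values))
    if "RD" not in fields or "AD" not in fields:
        return sample  # nothing to merge
    # Prepend the reference count to the alternative count(s).
    fields["AD"] = fields["RD"] + "," + fields["AD"]
    return ":".join(fields[key] for key in keys)


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(merge_rd_ad("GT:RD:AD:DP:FREQ", "0/1:10:5:15:0.33"))
```

Because the reference count is simply prepended, a position with several alternative alleles (AD already comma-separated) ends up with one count per allele, in the order listed.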
Is my interpretation of your approach correct? Also, do you have any suggestions on how to improve the situation? If time permits, I could implement the PoolSNP way of using the AD field in grenedalf as well, if that helps. Maybe @capoony (the author of PoolSNP) wants to comment on this for clarification as well? In particular to check if I understood the PoolSNP VCF format correctly.
Cheers, thanks, and so long Lucas
Oh also, are you planning to run any other grenedalf tools, other than the frequency command? Because that one literally only gives you the information that you already have there from the RD and AD fields, so you could as well compute that with your script there directly ;-)
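To illustrate that last point: given a reference count and an alternative count for a sample, the alternative allele frequency is just alt / (ref + alt). A minimal sketch, with a made-up function name, which does not claim to reproduce the exact output of the grenedalf frequency command:

```python
from typing import Optional


def alt_frequency(ref_count: int, alt_count: int) -> Optional[float]:
    """Alternative allele frequency from raw counts.

    Returns None when there is no coverage, to avoid dividing by zero.
    """
    total = ref_count + alt_count
    if total == 0:
        return None
    return alt_count / total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(alt_frequency(10, 5))
```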
Hey @JennyCNS,
any news on this? I am not sure what I can do to help with this - would it help to be more lenient with respect to the requirement of the "AD" FORMAT field being present in the VCF header? Maybe just warn about it, but as long as the actual lines in the file contain an AD annotation, it should still work? What do you think?
Cheers Lucas
Hi @JennyCNS,
just wanted to give a brief heads-up about the latest release grenedalf v0.5.0. It has several improvements that might be relevant for you. Also, I now wrote some additional documentation about VCF input, see here.
Hope that helps. Going to close the issue now, but feel free to re-open or start a new one if you have more questions!
Cheers and so long Lucas
Hello,
I am trying to run grenedalf frequency on my vcf file generated by PoolSNP from an mpileup file using
and got the following error:
Cannot iterate over VCF file /gxfs_home/geomar/smomw573/work/seasonal_adaptation/analysis/PoolSNP/finalfile.vcf using the "AD" FORMAT field to count allelic depths, as that field is not part of the VCF file.
PoolSNP output vcf head
The vcf file was then formatted with the following Python script:
and we corrected the AD field in the vcf header
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
and now grenedalf frequency is running.
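For anyone who wants to script the header correction as well, here is a rough sketch (not the script used above). It replaces an existing AD FORMAT declaration with the line shown, or inserts that line just before the #CHROM row if no AD declaration exists. The function name `fix_ad_header` is made up, and the header is assumed to be available as a list of lines without trailing newlines.

```python
AD_HEADER = (
    '##FORMAT=<ID=AD,Number=R,Type=Integer,'
    'Description="Allelic depths for the ref and alt alleles in the order listed">'
)


def fix_ad_header(header_lines):
    """Return the header lines with exactly one AD FORMAT line (AD_HEADER)."""
    fixed = []
    seen_ad = False
    for line in header_lines:
        if line.startswith("##FORMAT=<ID=AD,"):
            # Replace the first AD declaration and drop any further ones.
            if not seen_ad:
                fixed.append(AD_HEADER)
                seen_ad = True
            continue
        if line.startswith("#CHROM") and not seen_ad:
            # No AD declaration found so far: add it before the column row.
            fixed.append(AD_HEADER)
            seen_ad = True
        fixed.append(line)
    return fixed
```

The data lines still need the AD values themselves rewritten to match `Number=R`, otherwise the header and the records disagree.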
Many thanks,
Jennifer