iobio / vcf.iobio.io

MIT License

Allele Frequency Spectrum always 50% #74

Closed: shwivel closed this issue 9 months ago

shwivel commented 10 months ago

On vcf.iobio.io the allele frequency spectrum chart shows every file's variants at 50%, rather than a distribution of their actual frequencies from a source such as gnomAD. As a result, there is always just one bar at 50%, no matter which vcf file is supplied, unlike the distribution the app itself says to expect (when clicking the information icon).

AlistairNWard commented 10 months ago

The allele frequency spectrum is based on the distribution of alleles in the vcf file you provide it. If the vcf that you provide only has variants for a single sample, then you will get a very simplistic spectrum as the only possible values will be 50% or 100%. In order to see an actual spectrum, you need to provide a multisample vcf. If you use the example url, you will see an AF spectrum that is not a single bar.
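To illustrate why a single-sample file collapses to 50% or 100%, here is a minimal sketch of a within-file allele frequency computed from genotype (GT) strings. It is not the code vcf.iobio.io runs; the function name `site_allele_frequency` and the example genotypes are made up for illustration, and diploid calls are assumed.

```python
import re

def site_allele_frequency(genotypes):
    """Fraction of called alleles at one site that are non-reference.

    `genotypes` is a list of VCF GT strings, one per sample (e.g. "0/1").
    Hypothetical helper for illustration only.
    """
    alt = 0
    called = 0
    for gt in genotypes:
        # GT alleles are separated by "/" (unphased) or "|" (phased)
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue  # missing call: not counted
            called += 1
            if allele != "0":
                alt += 1
    return alt / called if called else None

# Single diploid sample: a variant site is either het (0/1) or hom-alt (1/1)
single_sample_sites = [["0/1"], ["1/1"], ["0/1"]]
print([site_allele_frequency(site) for site in single_sample_sites])
# [0.5, 1.0, 0.5]

# Five samples at one site: 3 alternate alleles out of 10 called alleles
print(site_allele_frequency(["0/0", "0/1", "0/0", "1/1", "0/0"]))
# 0.3
```

With one sample there are only two called alleles per site, so the frequency can only be 1/2 or 2/2. Intermediate values appear once more samples contribute alleles.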

shwivel commented 10 months ago

I was expecting that for a given vcf, representing the variants of one single person, it would depict the distribution of allele frequency of their variants, relative to the general population, using allele frequency databases like gnomAD. For example, if ALL of the person's variants had a 60% allele frequency, then you would see one single column at 60%. Or, suppose a person had 15 variants, 5 of which had an allele frequency of 20%, and 10 of which had an allele frequency of 60%; I would expect two columns, one at 20% that is half the size of another at 60%. Is that not the intention of this chart? I am not sure of the usefulness of the chart otherwise, unless your target audience is more researchers with many samples and not an individual's analysis of their own.
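The chart described in this comment could be sketched roughly as follows, assuming each of the person's variants already carries a population allele frequency (for example from a gnomAD annotation). The lookup itself is not shown, and `af_histogram` and its 5% bin width are hypothetical choices, not part of vcf.iobio.io.

```python
from collections import Counter

def af_histogram(population_afs, bin_pct=5):
    """Count variants per population-AF bin.

    `population_afs` holds one frequency in [0, 1] per variant.
    Bins are keyed by their lower edge in percent (0, 5, 10, ...);
    a frequency of exactly 100% gets its own bin.
    """
    counts = Counter()
    for af in population_afs:
        pct = round(af * 100)  # nearest whole percent, avoids float edge cases
        counts[(pct // bin_pct) * bin_pct] += 1
    return dict(sorted(counts.items()))

# The example from the comment: 5 variants at 20% and 10 variants at 60%
example = [0.20] * 5 + [0.60] * 10
print(af_histogram(example))
# {20: 5, 60: 10}
```

The result has two columns, with the 20% column half the height of the 60% column, as described above.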

AlistairNWard commented 10 months ago

The allele frequency chart in this application is geared towards a vcf file with many samples. It samples sites across the vcf file and gives the AF distribution for the samples in the file at those sampled sites. I agree that this is not a useful chart for single-sample vcf files.

shwivel commented 10 months ago

Gotcha. Thanks for the explanation. I do think it would be cool if there were an additional widget on there depicting it in the way I described. Not sure it'd be particularly actionable in any way, but fun stats nevertheless.

AlistairNWard commented 9 months ago

It isn't a bad idea, and we've long known that the current setup has limited value for vcf files with small numbers of samples. It is something we can try to look into (although time is a commodity we don't have too much of), but we'd have to see how quickly such a calculation would converge. In order to work as a web app, we can only sample data from the vcf file, so we would have to be sure to sample sites that cover variants with a range of allele frequencies, variants that segregate across different ancestries, etc. It probably wouldn't be quite as simple as it might first appear!
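One way to get a feel for the convergence question is to compare the binned spectrum of a random subset of variants with the spectrum of the full set. The sketch below is only an assumption-laden illustration: it uses uniform random sampling rather than the frequency- and ancestry-aware sampling mentioned above, the allele frequencies are synthetic (drawn from a beta distribution chosen arbitrarily), and all function names are hypothetical.

```python
import random
from collections import Counter

def binned_fractions(afs, bin_pct=5):
    """Fraction of variants falling in each AF bin (keyed by lower edge in %)."""
    counts = Counter((round(af * 100) // bin_pct) * bin_pct for af in afs)
    total = len(afs)
    return {b: n / total for b, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two binned distributions (0 = identical)."""
    bins = set(p) | set(q)
    return 0.5 * sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in bins)

def convergence_curve(all_afs, sample_sizes, seed=0):
    """Distance between sampled and full spectrum for each sample size."""
    rng = random.Random(seed)
    full = binned_fractions(all_afs)
    curve = []
    for n in sample_sizes:
        sample = rng.sample(all_afs, n)  # uniform sampling without replacement
        curve.append((n, total_variation(binned_fractions(sample), full)))
    return curve

# Synthetic stand-in for per-variant population AFs (NOT real data)
rng = random.Random(1)
synthetic_afs = [rng.betavariate(0.5, 2.0) for _ in range(5000)]

for n, dist in convergence_curve(synthetic_afs, [50, 500, 5000]):
    print(n, round(dist, 3))
```

The distance drops to zero when the sample covers every variant. How fast it shrinks for smaller samples on real data, and whether uniform sampling is good enough, is exactly the open question raised in this comment.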

shwivel commented 9 months ago

Makes sense. Thanks for your work on these iobio tools.

AlistairNWard commented 9 months ago

Pleasure, and thanks for your comments. It's helpful for us to know what people are trying to do with the tools. This helps us direct where we should do more work!