rcastelo / GenomicScores

Provide support to store and retrieve genomic scores associated with nucleotide positions along a genome

Question: Imputation of scores #7

Open · lima1 opened this issue 5 years ago

lima1 commented 5 years ago

Hi,

I'm looking for a solution to a fairly straightforward problem: I have scores for all heterozygous SNPs in a pool of normals, describing how the allelic fraction (not the population allele frequency) deviates from the expected 0.5. Each available position also has an associated error, based on total coverage and the number of samples carrying the SNP.

I currently have an ad hoc way of imputing scores for variants not in the pool of normals by averaging the scores of the n nearest neighbors, but a weighted running median would be better.
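The two strategies contrasted above can be sketched as follows. This is a minimal Python illustration, not GenomicScores code. It makes several assumptions. Positions are sorted on a single chromosome, and `nearest_neighbors`, `weighted_median` and `impute_knn` are hypothetical helpers. The per-position error is turned into inverse-variance weights (1 / error²), which is also an assumption, because the thread does not say how the error should enter the weighting. Finally, the "running" median is taken over the n nearest scored positions around each query.

```python
import bisect
from typing import List, Optional, Sequence


def nearest_neighbors(positions: Sequence[int], query: int, n: int) -> List[int]:
    """Indices of the n scored positions closest to `query`.

    Assumes `positions` is sorted ascending (one chromosome). Ties are
    broken in favour of the upstream (smaller) position.
    """
    right = bisect.bisect_left(positions, query)
    left = right - 1
    chosen: List[int] = []
    # Walk outward from the insertion point, taking the closer side each time.
    while len(chosen) < n and (left >= 0 or right < len(positions)):
        take_left = right >= len(positions) or (
            left >= 0 and query - positions[left] <= positions[right] - query
        )
        if take_left:
            chosen.append(left)
            left -= 1
        else:
            chosen.append(right)
            right += 1
    return chosen


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Lower weighted median: smallest value whose cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(w for _, w in pairs) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("expected non-empty values and non-negative weights")


def impute_knn(positions, scores, errors, query, n, method="mean") -> Optional[float]:
    """Impute a score at `query` from its n nearest scored neighbours (hypothetical helper)."""
    idx = nearest_neighbors(positions, query, n)
    if not idx:
        return None  # nothing observed to impute from
    vals = [scores[i] for i in idx]
    if method == "mean":
        # The ad hoc approach from the issue: unweighted average of the neighbours.
        return sum(vals) / len(vals)
    if method == "weighted_median":
        # Assumption: inverse-variance weights, so noisier positions count less.
        wts = [1.0 / errors[i] ** 2 for i in idx]
        return weighted_median(vals, wts)
    raise ValueError(f"unknown method: {method!r}")
```

For example, take positions [100, 200, 300, 400], scores [0.1, 0.2, 0.9, 0.4] and errors [0.1, 0.1, 1.0, 0.1], and impute at 250 from n = 3 neighbours. The plain mean gives roughly 0.4. The weighted median returns 0.2, because the noisy 0.9 at position 300 carries little weight.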

Sorry for the basic question, but is this something I can use GenomicScores for, or could I make it work, perhaps by including some fake data points?

Thanks in advance, Markus

rcastelo commented 5 years ago

Hi Markus, if I understand you correctly, we could incorporate the scores you have as an AnnotationHub resource available via 'getGScores()'. This is a manual process that requires parsing the files and making them available in the proper format, but once they are in place, you can query those scores in a uniform way with the functions 'gscores()' and 'score()'. Is this what you were asking for?

Cheers,

robert.

lima1 commented 5 years ago

Hi Robert,

thanks for getting back to me and sorry for my late response.

Now it makes sense; I thought I had missed something in the documentation about generating these data structures. Since these scores depend on many things, they would be unique to each user and their normal samples.

My question was essentially this: I now have a custom GRanges object with scores. Only a (small) fraction of the genome has scores associated with it, but I'd like to impute the scores for all requested ranges. Do you think GenomicScores is the right tool for this? It looks like it isn't (yet?), right?

Markus

rcastelo commented 5 years ago

Hi,

GenomicScores currently has nothing like that, but I guess it would not be that difficult to implement this feature and enable it through additional arguments to 'gscores()' or 'score()', e.g., impute.method=c("none", "min", "max", "mean") and impute.distance=0L. That way, every NA value could be imputed by applying one of these methods to the values observed within a given physical distance, expressed in bp. Is this what you are looking for?
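The arguments proposed above are only a suggestion in this thread, not an existing GenomicScores interface. The following Python sketch mirrors their intended semantics for a single chromosome. `impute_scores`, `impute_method` and `impute_distance` are hypothetical stand-ins for the proposed impute.method and impute.distance, and None plays the role of R's NA. Each query position without an observed score takes the min, max or mean of the scores observed within impute_distance bp on either side.

```python
import bisect
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical counterparts of the proposed impute.method values (other than "none").
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "min": min,
    "max": max,
    "mean": lambda xs: sum(xs) / len(xs),
}


def impute_scores(
    positions: Sequence[int],
    scores: Sequence[float],
    queries: Sequence[int],
    impute_method: str = "none",
    impute_distance: int = 0,
) -> List[Optional[float]]:
    """Look up scores at `queries`, imputing missing ones from nearby observations.

    `positions` must be sorted ascending (one chromosome); None plays the role of NA.
    """
    if impute_method != "none" and impute_method not in AGGREGATORS:
        raise ValueError(f"unknown impute_method: {impute_method!r}")
    if impute_distance < 0:
        raise ValueError("impute_distance must be >= 0 bp")
    observed = dict(zip(positions, scores))
    result: List[Optional[float]] = []
    for q in queries:
        if q in observed:
            result.append(observed[q])  # observed score, no imputation needed
        elif impute_method == "none":
            result.append(None)  # leave as NA
        else:
            # Scores observed within [q - impute_distance, q + impute_distance] bp.
            lo = bisect.bisect_left(positions, q - impute_distance)
            hi = bisect.bisect_right(positions, q + impute_distance)
            window = list(scores[lo:hi])
            result.append(AGGREGATORS[impute_method](window) if window else None)
    return result
```

With the proposed default impute.distance=0L, a position without an observed score stays NA. Imputation therefore only takes effect once the window is widened.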

lima1 commented 4 years ago

Hi Robert,

I'm currently benchmarking the best ways of imputing the scores and will get back to you. But that sounds perfect.

Markus