sherman5 / RVS

Computes estimates of the probability of related individuals sharing a rare variant.

problems with multipleVariantPValue function #3

Closed klmartinez closed 5 years ago

klmartinez commented 5 years ago

I’ve been trying to use RVS, and ultimately the multipleVariantPValue function, to compute a p-value for each variant across two families. However, I’ve run into some issues. I’ve been able to merge the vcf files and construct a .ped file that mirrors the one in the examples provided with RVS.

For example, my data looks like the following:

> sampleX$fam
             pedigree member father mother sex `affected`
Fam_7.11003     Fam_7  11003  15001  15002   2        2
Fam_7.11004     Fam_7  11004  15001  15002   1        2
Fam_7.11005     Fam_7  11005  15005  15004   2        2
Fam_7.11006     Fam_7  11006  15003  11005   2        2
Fam_27.10703   Fam_27  10703  15010  10704   1        2
Fam_27.10705   Fam_27  10705  15008  15009   1        2
Fam_27.10706   Fam_27  10706  15008  15009   1        1
> sampleX$genotypes
A SnpMatrix with  7 rows and  1658 columns
Row names:  Fam_7.11003 ... Fam_27.10706 
Col names:  locus.1 ... locus.1658
> head(sampleX$map)
        snp.name allele.1 allele.2
locus.1  locus.1        A        G
locus.2  locus.2        A        G
locus.3  locus.3        T        C
locus.4  locus.4        A     <NA>
locus.5  locus.5        C    CCCCT
locus.6  locus.6        G        T

I have also successfully used the RVsharing function:

> famsB <- list(sampleB$fam, sampleC$fam)
> sharingProbsB <- RVsharing(famsB)
Probability subjects 11003 11004 11005 11006 among 11003 11004 11005 11006 share a rare variant: 0.02381
Probability subjects 10704 10703 10705 among 10704 10703 10705 share a rare variant: 0.05556
> sharingProbsB
     Fam_7     Fam_27 
0.02380952 0.05555556 
> signif(sharingProbsB, 3)
 Fam_7 Fam_27 
0.0238 0.0556

However, when I try to use the multipleVariantPValue, I keep getting the same error:

> resultB <- multipleVariantPValue(sampleX$genotypes, sampleX$fam, sharingProbsB)
Error in !observedSharing : invalid argument type
> traceback()
5: multipleFamilyPValue(sharingProbs, shareList[[var]])
4: FUN(X[[i]], ...)
3: lapply(X = X, FUN = FUN, ...)
2: sapply(names(shareList), function(var) {
       if (pot_pvals[var] <= ppval_cutoff) 
           multipleFamilyPValue(sharingProbs, shareList[[var]])
       else NA
   }, USE.NAMES = TRUE)
1: multipleVariantPValue(sampleX$genotypes, sampleX$fam, sharingProbsB)

Are there any insights for what I might be doing wrong and how to fix this?

sherman5 commented 5 years ago

What is the result of this line of code?

RVS:::convertMatrix(sampleX$genotypes@.Data, sampleX$fam)

This is what gets called internally and seems to be causing the error. If you can send data that reproduces this error I'll be able to debug it more fully.

klmartinez commented 5 years ago

Oh, I see. Here are a few of the lines:

$locus.760
NULL

$locus.761
Fam_7
FALSE

$locus.762
Fam_7
FALSE

$locus.763
Fam_27
 FALSE

$locus.764
Fam_7
FALSE

$locus.765
Fam_7
FALSE

$locus.766
Fam_7
FALSE

$locus.767
Fam_7
FALSE

$locus.768
Fam_7
FALSE

$locus.769
Fam_7
FALSE

$locus.770
Fam_7
FALSE

$locus.771
 Fam_7 Fam_27
 FALSE  FALSE

$locus.772
 Fam_7 Fam_27
 FALSE  FALSE

$locus.773
Fam_7
FALSE

$locus.774
NULL

$locus.775
Fam_7
FALSE

$locus.776
 Fam_7 Fam_27
 FALSE  FALSE

$locus.777
Fam_7
FALSE

Most of the loci come back FALSE; only a few come back TRUE, and only for Fam_27.

Kiana Lee Martinez PhD Student, Genetics GIDP University of Arizona kianalee@email.arizona.edu kianalee.org

From: Tom Sherman Sent: Wednesday, March 20, 2019 7:27 AM To: sherman5/RVS Cc: klmartinez; Author Subject: Re: [sherman5/RVS] problems with multipleVariantPValue function (#3)


sherman5 commented 5 years ago

$locus.760
NULL

This looks like it could cause an issue. I'll look into how a NULL element could get there in the first place. Do you have the values from the SnpMatrix for locus.760?
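A minimal sketch of how the NULL entries could be located, assuming the converted output is a list with one element per locus, as the output quoted above suggests. The `shareList` object here is a hand-made stand-in that mirrors a few of those entries; it is not real data and not the output of RVS itself.

```r
# Hypothetical stand-in for the list returned by RVS:::convertMatrix();
# names and values mirror a few of the entries quoted in this thread.
shareList <- list(
    locus.760 = NULL,
    locus.761 = c(Fam_7 = FALSE),
    locus.771 = c(Fam_7 = FALSE, Fam_27 = FALSE),
    locus.774 = NULL
)

# Flag loci whose entry is NULL rather than a named logical vector.
isNull <- vapply(shareList, is.null, logical(1))
nullLoci <- names(shareList)[isNull]
print(nullLoci)
```

On real data, the same two lines could be applied to the result of the convertMatrix call above to list every locus that needs a closer look.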

klmartinez commented 5 years ago

I have a value under allele.1 but NA under allele.2

> subset(sampleX$map, snp.name == "locus.760")
           snp.name allele.1 allele.2
locus.760 locus.760        C     <NA>


From: Tom Sherman Sent: Wednesday, March 20, 2019 10:45 AM To: sherman5/RVS Cc: klmartinez; Author Subject: Re: [sherman5/RVS] problems with multipleVariantPValue function (#3)


sherman5 commented 5 years ago

Would it be possible to send the .ped file with just locus.1 through locus.6? It looks like the input is unexpected in RVS for both locus.4 and locus.5, but snpStats is reading it just fine. I think this is an error on our end.
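For reference, a rough sketch of how loci like locus.4 (no second allele recorded) and locus.5 (a multi-base second allele) could be picked out of the map table. The `map` data frame below is a hypothetical stand-in shaped like `sampleX$map`, built from the rows shown earlier in this thread; whether these two patterns are actually what trips up RVS is an assumption, not something confirmed here.

```r
# Hypothetical stand-in shaped like sampleX$map (rows copied from this thread).
map <- data.frame(
    snp.name = c("locus.1", "locus.4", "locus.5", "locus.760"),
    allele.1 = c("A", "A", "C", "C"),
    allele.2 = c("G", NA, "CCCCT", NA),
    stringsAsFactors = FALSE
)
rownames(map) <- map$snp.name

# Loci with no second allele recorded in the map.
missingAllele2 <- map$snp.name[is.na(map$allele.2)]

# Loci whose second allele is longer than one base (e.g. an insertion).
multiBase <- map$snp.name[!is.na(map$allele.2) & nchar(map$allele.2) > 1]

print(missingAllele2)
print(multiBase)
```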

klmartinez commented 5 years ago

Hi Tom,

I can either send you the .ped file with only locus.1 through locus.6 in a few days, after I figure out how to subset the .ped file itself, or I can send you the original .ped file immediately. Do you have a preference?

Thanks so much, Kiana


From: Tom Sherman Sent: Wednesday, March 20, 2019 11:53 AM To: sherman5/RVS Cc: klmartinez; Author Subject: Re: [sherman5/RVS] problems with multipleVariantPValue function (#3)


sherman5 commented 5 years ago

The original .ped file would be great. I only suggested sub-setting it because I wasn't sure if there were any restrictions on sharing the data.

klmartinez commented 5 years ago

Okay, great. I’ve attached the .ped file (sampleX.ped), which is merged from two family .ped files. However, I just want to note that I manually changed the $fam information to accurately reflect our family pedigree structures. I’ve also attached my working R script in case it would be helpful.


From: Tom Sherman Sent: Thursday, March 21, 2019 2:06 PM To: sherman5/RVS Cc: klmartinez; Author Subject: Re: [sherman5/RVS] problems with multipleVariantPValue function (#3)


sherman5 commented 5 years ago

Since you're replying to a GitHub issue by email, I don't think the attachment will work. You can either email me directly at tomsherman159@gmail.com or attach the files at https://github.com/sherman5/RVS/issues/3

sherman5 commented 5 years ago

I found the bug and fixed it in commit de5b308. I'll update the Bioconductor version, but if you want the fix immediately, you can install the package directly from GitHub, i.e. BiocManager::install("sherman5/RVS").

Let me know if you're still experiencing any issues.

klmartinez commented 5 years ago

Thank you very much!


From: Tom Sherman Sent: Monday, March 25, 2019 12:13 PM To: sherman5/RVS Cc: klmartinez; Author Subject: Re: [sherman5/RVS] problems with multipleVariantPValue function (#3)
