Open iciarfernandez opened 1 year ago
Hey @iciarfernandez thanks for the PR!
I made some changes onto main:
European
for predictEthnicity, verified it works, but I left vignette unchanged since you have those changes hereHere is some code to use your model data I uploaded onto bioconductor. Note that you will need to install and use bioconductor devel
library(ExperimentHub)
eh = ExperimentHub()
|======================================================================| 100%
snapshotDate(): 2023-04-04
query(eh, c('eoPredData'))
ExperimentHub with 2 records
# snapshotDate(): 2023-04-04
# $dataprovider: University of British Columbia
# $species: Homo sapiens
# $rdataclass: mixo_splsda, matrix
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["EH8090"]]'
title
EH8090 | eoPredModel
EH8091 | x_test
So those objects will be made available in the environment after loading with the code above.
Other things to do:
After that I will take a more in depth review before merging.
Let me know if you have any questions or want help! Thanks
Hey,
Thanks for the code! So it's working to load exactly what you copy-pasted above, but when I actually try to retrieve the eoPredModel object by running eh[["EH8090"]]
as indicated, it starts downloading it but then just displays the following message:
Warning message: package ‘eoPredData’ is not available for Bioconductor version '3.18'
Not sure if we need to fix something in the hub package before going forward then?
Moreover, I was just wondering how to then load the data from ExperimentHub into the function - is the idea to include
query(eh, c('eoPredModel'))
model=eh[['EH8080']]
at the start of the code (within the function itself) to basically serve the same purpose as data(ethnicityCpGs, envir=environment())
in your predictEthnicity function?
If that's the case I think I can figure out the rest by myself but I'll let you know if I have any more questions.
Thank you, Iciar
Warning message: package ‘eoPredData’ is not available for Bioconductor version '3.18'
Hey try downgrade bioconductor to 3.17:
BiocManager::install(version = "3.17")
Moreover, I was just wondering how to then load the data from ExperimentHub into the function - is the idea to include
query(eh, c('eoPredModel'))
model=eh[['EH8080']]
at the start of the code (within the function itself) to basically serve the same purpose as
data(ethnicityCpGs, envir=environment())
in your predictEthnicity function?
Yes I think so
I relabelled "Caucasian" to "European" so now vignette continuesup to eoPred section:
processing file: planet.Rmd
Quitting from lines 333-366 [unnamed-chunk-12] (planet.Rmd)
Error in predictPreeclampsia()
:
! object 'trainLabels' not found
Backtrace:
I updated the function to include the code where I load the model object (which yes, I think it basically replaces the equivalent of data(ethnicityCpGs, envir=environment())
in the ethnicity function) and to require the mixOmics and ExperimentHub packages, ran devtools::document()
, then devtools::check()
.
I think everything with the function is working now - the error that I am getting now with devtools::check()
is about the vignette; valMeta
in the vignette is supposed to be the pData of the val
object, but we didn't actually upload it to the ExperimentHub so I can't load it.
If you could update the ExperimentHub to 1) include the valMeta
object, which I have added to the dropbox and 2) delete the x_test
object and instead include a new val
object valBMIQ, also now in the dropbox, then I can try running devtools::check()
again and hopefully everything works!
Last but not least, I downgraded to Bioconductor 3.17 but still getting this warning when I try to load the data from ExperimentHub()
Warning message: package ‘eoPredData’ is not available for Bioconductor version '3.17'
Even though it does seem to install - i.e., I am able to run mod = eh[["EH8090"]]
and it works to load the data. Anyway, it doesn't seem to be a problem?
One more thing - it was discussed at lab meeting that instead of Ambiguous
, it should say Other
in the labels that predictEthnicity
outputs, since the tool can only calculate 3 ancestries but there are other ancestries out there (+ ancestry is a continuum) so in reality samples being called ambiguous may just be mixed or from an ancestry other than African/Asian/European. Let me know what you think and if you agree I'm happy to change that myself too!
Hi Iciar! Thanks for comments. Let me know if I'm understanding correctly:
predictPreeclampsia
does not seem to use the pretrained models we uploaded onto experimenthub anymore, but instead trains a model "on-the-fly" using internal data trainBetas
and trainLabel
.
So if this is the new plan, then we don't need the experimentHub package for the function to work.
If this is true, then we need the following r objects to be added to the R package itself as sysdata.rda (internal, not available to user) or as rds objects:
trainBetas
, trainLabels
for predictPreeclampsia, But need to ensure that they are the under the size limit -- the model object was over limit hence the original plan to use the experimenthub pretrained model object.
Let's also just focus on the function for now, and worry about vignette after. I suggest maybe setting the code chunks in vignette to eval = FALSE for now to get the checks to run through
One more thing - it was discussed at lab meeting that instead of
Ambiguous
, it should sayOther
in the labels thatpredictEthnicity
outputs, since the tool can only calculate 3 ancestries but there are other ancestries out there (+ ancestry is a continuum) so in reality samples being called ambiguous may just be mixed or from an ancestry other than African/Asian/European. Let me know what you think and if you agree I'm happy to change that myself too!
I am moving this to a new issue to organize our discussions more
It does! Not sure if it didn't push correctly, but from my end this is the code I am seeing for the start of the predictPreeclampsia
function:
predictPreeclampsia <- function(betas, ...){
eh = ExperimentHub()
mod = eh[['EH8090']]
trainCpGs = colnames(mod$X)
peCpGs = mixOmics::selectVar(mod)$name
pp <- intersect(colnames(betas), peCpGs)
The issue I am having that I described here is that I need a few objects updated in the ExperimentHub package to fix the errors that devtools::check()
is throwing now - let me know if you see what I mean in that comment (I think maybe you missed it?) and otherwise we can discuss further.
Ah I was looking at the wrong branch!
Can you set the vignette to code chunks to false for now so we can focus on the function and related check errors?
In the meantime I'll add those objects to experimenthub and remove x_test, it will require bioconductor to give us access.
In response to everything:
1) I think I implemented all the changes to the code discussed above - now (when ignoring the vignette by setting it to include=FALSE for the time being) devtools::check()
is not throwing any errors but it does have one warning and two notes.
The warning is
Undocumented arguments in documentation object 'predictPreeclampsia'
‘...’
and the notes are
Unexported object imported by a ':::' call: ‘mixOmics:::predict.mixo_spls’
See the note in ?`:::` about the use of this operator.
predictAge: no visible global function definition for ‘data’
predictAge: no visible binding for global variable ‘ageCpGs’
predictEthnicity: no visible global function definition for ‘data’
predictEthnicity: no visible binding for global variable
‘ethnicityCpGs’
Undefined global functions or variables:
ageCpGs data ethnicityCpGs
Consider adding
importFrom("utils", "data")
to your NAMESPACE file.
On the other hand, BiocCheck::BiocCheck()
does throw 2 errors that I am not sure how to fix:
ERROR: At least 80% of man pages documenting exported objects must have runnable examples.
ERROR: Maintainer must add package name to Watched Tags on the support site; Edit your Support Site User Profile to add Watched Tags.
Thank you for submitting the request to ExperimentHub!
I added ExperimentHub and mixOmics as imports because an error that the check step was previously throwing was related to not having them as dependencies - is there a better way to do this?
Let me know what you think!
In response to everything:
- I think I implemented all the changes to the code discussed above - now (when ignoring the vignette by setting it to include=FALSE for the time being)
devtools::check()
is not throwing any errors but it does have one warning and two notes.The warning is
Undocumented arguments in documentation object 'predictPreeclampsia' ‘...’
the parameter ...
needs documentation. Can write something like "feeds into this other function"
and the notes are
Unexported object imported by a ':::' call: ‘mixOmics:::predict.mixo_spls’ See the note in ?`:::` about the use of this operator.
Am not sure if Notes need to be addressed for bioconductor/cran. I recall that they are ok to have but can you check?
On the other hand,
BiocCheck::BiocCheck()
does throw 2 errors that I am not sure how to fix:
ERROR: At least 80% of man pages documenting exported objects must have runnable examples.
Let's see if this is still there after addressing the examples error for predictPreclampsia.
ERROR: Maintainer must add package name to Watched Tags on the support site; Edit your Support Site User Profile to add Watched Tags.
Done. Reminds me, can you add yourself as a contributor or author to DESCRIPTION?
I see DS_STORE file is there
Hi Victor,
Sorry that I dropped the ball on this, I got distracted with finishing projects and thesis writing - I think I made all the changes you requested. Please let me know of anything else I need to get done on my end and hopefully we can get this working soon!
Thanks, Icíar
Checking for support site registration...
* ERROR: Unable to find your email in the Support Site: HTTP 404 Not
Found.
The email I have right now for you is Iciar.Fernandez@bcchr.ca
, but lmk if you want to change it
[x] Please also make sure that you are following / watching this repository: https://github.com/wvictor14/eoPredData
[x] Maybe watch this package tracker thread too, I think we need to add the package to bioconductor before we can use it here
I implemented the requested changes and check() didn't throw any errors or warnings. Re: your last comment, if you could change my email to iciarfernandez@outlook.com instead that would be great - I signed up to the support site with that email. Thank you!
Some merge conflicts, need to pull branch main into branch eoPred and locally fix merge conflicts
updated vignette to reflect wording change from "caucasian" to "european"
added example usage of the predictPreeclampsia.R function to vignette
posted code for the predictPreeclampsia.R function