wvictor14 / planet

For inferring ancestry, gestational age, and cell type proportions from placental DNA methylation array data
4 stars 1 forks source link

vignette update + function addition #19

Open iciarfernandez opened 1 year ago

iciarfernandez commented 1 year ago
wvictor14 commented 1 year ago

Hey @iciarfernandez thanks for the PR!

I made some changes onto main:

wvictor14 commented 1 year ago

Here is some code to use your model data I uploaded onto bioconductor. Note that you will need to install and use bioconductor devel

 library(ExperimentHub)

 eh = ExperimentHub()

  |======================================================================| 100%

snapshotDate(): 2023-04-04

 query(eh, c('eoPredData'))

ExperimentHub with 2 records

# snapshotDate(): 2023-04-04

# $dataprovider: University of British Columbia

# $species: Homo sapiens

# $rdataclass: mixo_splsda, matrix

# additional mcols(): taxonomyid, genome, description,

#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,

#   rdatapath, sourceurl, sourcetype

# retrieve records with, e.g., 'object[["EH8090"]]'

           title

  EH8090 | eoPredModel

  EH8091 | x_test

So those objects will be made available in the environment after loading with the code above.

Other things to do:

After that I will take a more in depth review before merging.

Let me know if you have any questions or want help! Thanks

iciarfernandez commented 1 year ago

Hey,

Thanks for the code! So it's working to load exactly what you copy-pasted above, but when I actually try to retrieve the eoPredModel object by running eh[["EH8090"]] as indicated, it starts downloading it but then just displays the following message:

Warning message: package ‘eoPredData’ is not available for Bioconductor version '3.18'

Not sure if we need to fix something in the hub package before going forward then?

Moreover, I was just wondering how to then load the data from ExperimentHub into the function - is the idea to include

query(eh, c('eoPredModel'))

model=eh[['EH8080']]

at the start of the code (within the function itself) to basically serve the same purpose as data(ethnicityCpGs, envir=environment()) in your predictEthnicity function?

If that's the case I think I can figure out the rest by myself but I'll let you know if I have any more questions.

Thank you, Iciar

wvictor14 commented 1 year ago

Warning message: package ‘eoPredData’ is not available for Bioconductor version '3.18'

Hey try downgrade bioconductor to 3.17:

BiocManager::install(version = "3.17")

Moreover, I was just wondering how to then load the data from ExperimentHub into the function - is the idea to include

query(eh, c('eoPredModel'))

model=eh[['EH8080']]

at the start of the code (within the function itself) to basically serve the same purpose as data(ethnicityCpGs, envir=environment()) in your predictEthnicity function?

Yes I think so

wvictor14 commented 1 year ago

I relabelled "Caucasian" to "European" so now vignette continuesup to eoPred section:

processing file: planet.Rmd

Quitting from lines 333-366 [unnamed-chunk-12] (planet.Rmd) Error in predictPreeclampsia(): ! object 'trainLabels' not found Backtrace:

  1. planet::predictPreeclampsia(t(val))
  2. mixOmics::splsda(trainBetas, trainLabels, ncomp = 1, keepX = 45) Execution halted
iciarfernandez commented 1 year ago

I updated the function to include the code where I load the model object (which yes, I think it basically replaces the equivalent of data(ethnicityCpGs, envir=environment()) in the ethnicity function) and to require the mixOmics and ExperimentHub packages, ran devtools::document(), then devtools::check().

I think everything with the function is working now - the error that I am getting now with devtools::check() is about the vignette; valMeta in the vignette is supposed to be the pData of the val object, but we didn't actually upload it to the ExperimentHub so I can't load it.

If you could update the ExperimentHub to 1) include the valMeta object, which I have added to the dropbox and 2) delete the x_test object and instead include a new val object valBMIQ, also now in the dropbox, then I can try running devtools::check() again and hopefully everything works!

Last but not least, I downgraded to Bioconductor 3.17 but still getting this warning when I try to load the data from ExperimentHub()

Warning message: package ‘eoPredData’ is not available for Bioconductor version '3.17'

Even though it does seem to install - i.e., I am able to run mod = eh[["EH8090"]] and it works to load the data. Anyway, it doesn't seem to be a problem?

iciarfernandez commented 1 year ago

One more thing - it was discussed at lab meeting that instead of Ambiguous, it should say Other in the labels that predictEthnicity outputs, since the tool can only calculate 3 ancestries but there are other ancestries out there (+ ancestry is a continuum) so in reality samples being called ambiguous may just be mixed or from an ancestry other than African/Asian/European. Let me know what you think and if you agree I'm happy to change that myself too!

wvictor14 commented 1 year ago

Hi Iciar! Thanks for comments. Let me know if I'm understanding correctly:

predictPreeclampsia does not seem to use the pretrained models we uploaded onto experimenthub anymore, but instead trains a model "on-the-fly" using internal data trainBetas and trainLabel.

So if this is the new plan, then we don't need the experimentHub package for the function to work.

If this is true, then we need the following r objects to be added to the R package itself as sysdata.rda (internal, not available to user) or as rds objects:

But need to ensure that they are the under the size limit -- the model object was over limit hence the original plan to use the experimenthub pretrained model object.

Let's also just focus on the function for now, and worry about vignette after. I suggest maybe setting the code chunks in vignette to eval = FALSE for now to get the checks to run through

wvictor14 commented 1 year ago

One more thing - it was discussed at lab meeting that instead of Ambiguous, it should say Other in the labels that predictEthnicity outputs, since the tool can only calculate 3 ancestries but there are other ancestries out there (+ ancestry is a continuum) so in reality samples being called ambiguous may just be mixed or from an ancestry other than African/Asian/European. Let me know what you think and if you agree I'm happy to change that myself too!

I am moving this to a new issue to organize our discussions more

iciarfernandez commented 1 year ago

It does! Not sure if it didn't push correctly, but from my end this is the code I am seeing for the start of the predictPreeclampsia function:

predictPreeclampsia <- function(betas, ...){

  eh = ExperimentHub()
  mod = eh[['EH8090']]
  trainCpGs = colnames(mod$X)

  peCpGs = mixOmics::selectVar(mod)$name
  pp <- intersect(colnames(betas), peCpGs)

The issue I am having that I described here is that I need a few objects updated in the ExperimentHub package to fix the errors that devtools::check() is throwing now - let me know if you see what I mean in that comment (I think maybe you missed it?) and otherwise we can discuss further.

wvictor14 commented 1 year ago

Ah I was looking at the wrong branch!

Can you set the vignette to code chunks to false for now so we can focus on the function and related check errors?

In the meantime I'll add those objects to experimenthub and remove x_test, it will require bioconductor to give us access.

iciarfernandez commented 1 year ago

In response to everything:

1) I think I implemented all the changes to the code discussed above - now (when ignoring the vignette by setting it to include=FALSE for the time being) devtools::check() is not throwing any errors but it does have one warning and two notes.

The warning is

Undocumented arguments in documentation object 'predictPreeclampsia'
    ‘...’

and the notes are

Unexported object imported by a ':::' call: ‘mixOmics:::predict.mixo_spls’
    See the note in ?`:::` about the use of this operator.
predictAge: no visible global function definition for ‘data’
  predictAge: no visible binding for global variable ‘ageCpGs’
  predictEthnicity: no visible global function definition for ‘data’
  predictEthnicity: no visible binding for global variable
    ‘ethnicityCpGs’
  Undefined global functions or variables:
    ageCpGs data ethnicityCpGs
  Consider adding
    importFrom("utils", "data")
  to your NAMESPACE file.

On the other hand, BiocCheck::BiocCheck() does throw 2 errors that I am not sure how to fix:

ERROR: At least 80% of man pages documenting exported objects must have runnable examples.

ERROR: Maintainer must add package name to Watched Tags on the support site; Edit your Support Site User Profile to add Watched Tags.

  1. Thank you for submitting the request to ExperimentHub!

  2. I added ExperimentHub and mixOmics as imports because an error that the check step was previously throwing was related to not having them as dependencies - is there a better way to do this?

Let me know what you think!

wvictor14 commented 1 year ago

In response to everything:

  1. I think I implemented all the changes to the code discussed above - now (when ignoring the vignette by setting it to include=FALSE for the time being) devtools::check() is not throwing any errors but it does have one warning and two notes.

The warning is

Undocumented arguments in documentation object 'predictPreeclampsia'
    ‘...’

the parameter ... needs documentation. Can write something like "feeds into this other function"

and the notes are

Unexported object imported by a ':::' call: ‘mixOmics:::predict.mixo_spls’
    See the note in ?`:::` about the use of this operator.

Am not sure if Notes need to be addressed for bioconductor/cran. I recall that they are ok to have but can you check?

On the other hand, BiocCheck::BiocCheck() does throw 2 errors that I am not sure how to fix:

ERROR: At least 80% of man pages documenting exported objects must have runnable examples.

Let's see if this is still there after addressing the examples error for predictPreclampsia.

ERROR: Maintainer must add package name to Watched Tags on the support site; Edit your Support Site User Profile to add Watched Tags.

Done. Reminds me, can you add yourself as a contributor or author to DESCRIPTION?

I see DS_STORE file is there image

iciarfernandez commented 4 months ago

Hi Victor,

Sorry that I dropped the ball on this, I got distracted with finishing projects and thesis writing - I think I made all the changes you requested. Please let me know of anything else I need to get done on my end and hopefully we can get this working soon!

Thanks, Icíar

wvictor14 commented 4 months ago
 Checking for support site registration...
    * ERROR: Unable to find your email in the Support Site: HTTP 404 Not
      Found.

The email I have right now for you is Iciar.Fernandez@bcchr.ca, but lmk if you want to change it

iciarfernandez commented 4 months ago

I implemented the requested changes and check() didn't throw any errors or warnings. Re: your last comment, if you could change my email to iciarfernandez@outlook.com instead that would be great - I signed up to the support site with that email. Thank you!

wvictor14 commented 4 months ago

Some merge conflicts, need to pull branch main into branch eoPred and locally fix merge conflicts