microbiome / mia

Microbiome analysis
https://microbiome.github.io/mia/
Artistic License 2.0
46 stars 25 forks source link

estimateAlpha #460

Closed ChouaibB closed 1 week ago

ChouaibB commented 9 months ago

Initial draft implementation.

antagomir commented 9 months ago

Related to #417

antagomir commented 9 months ago

I am tending to think that we should combine all these estimateXXX functions into a single estimateAlpha function as your PR suggest. The other estimateXXX functions could be turned into internal functions to simplify the interface. That estimateAlpha function can then have rarefaction as optional argument (including the user-specified rarefaction depth; by default the lowest read count among the samples).

Things to consider: 1) previously exported estimateXXX functions would need deprecation message when they are moved into internal functions 2) the diversity index names should have suffixes, like "shannon_diversity", "observed_richness" etc. 3) there is a mechanism available in R that can pick those argument names even when just the (unique) start of the index name is given, like "shannon" would be recognized to mean "shannon_diversity" so the users can use shorter names if they need to but the output need to use the full names (for instance Simpson index is available both for diversity and evenness, therefore it must be indicated which one it is).

If unclear, we can discuss more.

I have some further comments in mind after the next PR draft on this.

antagomir commented 9 months ago

The function has to be changed into estimateAlpha, which then includes rarefaction as an optional argument

antagomir commented 9 months ago

An idea: the method name (like "shannon") could also say "diversity", then the function would return all "diversity" indices and if one asks for "dominance" then those etc (richness / evenness / diversity / dominance / rarity are the options and the indices are listed in those corresponding functions now). If no method is specific, then all indices would be returned by default.

TuomasBorman commented 9 months ago

I think this is not the best way to implement this. Better way is to keep those estimateDiversity.R and other files and just add estimateAlpha.R with estimateAlpha functionality (remove documentation from necessary places)

The problem with moving all the code to one file is that it is a nightmare for someone that tries to check what has been changed. It looks like everything has

antagomir commented 9 months ago

I agree. The old and extensively tested functions can be kept as is. They should be just converted into internal functions that are not exported (as I already explictily suggested in the opening statement of this issue), and then called from a wrapper called estimateAlpha(). This should be also faster to implement than rewriting the entire function.

ChouaibB commented 9 months ago

Alright, I misunderstood the comment I will make the changes.

antagomir commented 9 months ago

Always aim at minimal and straightfwd changes unless otherwise agreed.

If you could update this PR accordingly that would be great!

ChouaibB commented 9 months ago

An idea: the method name (like "shannon") could also say "diversity", then the function would return all "diversity" indices and if one asks for "dominance" then those etc (richness / evenness / diversity / dominance / rarity are the options and the indices are listed in those corresponding functions now). If no method is specific, then all indices would be returned by default.

I didn't understand, could you please explain more.

ChouaibB commented 9 months ago

An idea: the method name (like "shannon") could also say "diversity", then the function would return all "diversity" indices and if one asks for "dominance" then those etc (richness / evenness / diversity / dominance / rarity are the options and the indices are listed in those corresponding functions now). If no method is specific, then all indices would be returned by default.

I didn't understand, could you please explain more.

Do you mean smt like this?

.get_indices <- function(measure) {
    switch(measure,
           "diversity" = c("coverage_diversity", "coverage",
                           "faith_diversity", "faith",
                           "fisher_diversity", "fisher",
                           "gini_simpson_diversity", "gini_simpson",
                           "inverse_simpson_diversity", "inverse_simpson",
                           "log_modulo_skewness_diversity", "log_modulo_skewness",
                           "shannon_diversity", "shannon"),
           "dominance" = c("absolute_dominance", "absolute",
                           "dbp_dominance", "dbp",
                           "core_abundance_dominance", "core_abundance",
                           "gini_dominance", "gini", 
                           "dmn_dominance", "dmn",
                           "relative_dominance", "relative",
                           "simpson_lambda_dominance", "simpson_lambda"),
           "evenness" = c("camargo_evenness", "camargo",
                          "pielou_evenness", "pielou",
                          "simpson_evenness",
                          "evar_evenness", "evar",
                          "bulla_evenness", "bulla"),
           "richness" = c("ace_richness", "ace",
                          "chao1_richness", "chao1",
                          "hill_richness", "hill",
                          "observed_richness", "observed"))
}
antagomir commented 9 months ago

After this PR is accepted, we need to remember update OMA.

ChouaibB commented 8 months ago

Final comments.

Lastly, run BiocCheck::BiocCheck() and fix warnings/notes related to this function (Check the mia.BiocCheck folder and the file inside it)

The earlier devtools::check() there were no error (ofc otherwise won't commit) and no warnings, few notes that regards other functions. But I could test with BiocCheck::BiocCheck() now.

TuomasBorman commented 8 months ago

Final comments.

Lastly, run BiocCheck::BiocCheck() and fix warnings/notes related to this function (Check the mia.BiocCheck folder and the file inside it)

The earlier devtools::check() there were no error (ofc otherwise won't commit) and no warnings, few notes that regards other functions. But I could test with BiocCheck::BiocCheck() now.

Yes, bioccheck cheks the coding style also

ChouaibB commented 8 months ago

I couldn't get the BiocCheck installed but the results of the devtools::check are as follows (complete log attached): image 00check.log It seems to be there are no warning regarding the function but perhaps others.

Anyway, you could please do whatever you want with these commits and the issue. Thanks and good luck! :)

TuomasBorman commented 8 months ago

Ok, there was atleast some indentation problems (Bioc has a rule that indentation must be multiplication of 4 spaces)

But these are easy and fast to fix, thanks!

ChouaibB commented 8 months ago

Thank you! good luck with all! :)

antagomir commented 4 months ago

up - if anything more needed from my side to close this PR let me know

TuomasBorman commented 4 months ago

At least the name should be addAlpha to follow new naming scheme (related to that we can have a discussion about naming, I will send you msg)

antagomir commented 3 months ago

Related to #417

TuomasBorman commented 3 months ago

This looks good, the name should be addAlpha- And also it might be best to move deprecated functions to deprecate.R file

antagomir commented 2 months ago

Ok @ChouaibB let's take this up again and aim to finalize.

antagomir commented 2 months ago

I will add some comments - wait until that before further changes.

TuomasBorman commented 2 months ago

Also important to check that the code follows Biconductor guidelines.

  1. BiocCheck::BiocCheck()
  2. Fix issues related to these lines that are added by this PR (there are not many. I quickly looked that indentation was off in couple of lines)
antagomir commented 2 months ago

Some remaining comments + BiocChecks will be a must.

We should probably aim to merge this very soon, then do some further testing on the fresh new version.

antagomir commented 2 months ago

Can we utilize the rarefyAssay function directly (see #536) - if not done yet?

TuomasBorman commented 1 week ago

I think this is good to go. @ake123 Do you have time to check that there are necessary unit tests (there might be already)? Check Leo's comments about testing whether rarefying was disabled (n.iter = NULL and n.iter = 0)

ake123 commented 1 week ago

Sure I can check that

I think this is good to go. @ake123 Do you have time to check that there are necessary unit tests (there might be already)? Check Leo's comments about testing whether rarefying was disabled (n.iter = NULL and n.iter = 0)

TuomasBorman commented 1 week ago

Fixed some issues and updated documentation. Would be nice to have someone else's thoughts (if everything works) but I think this is good now

TuomasBorman commented 1 week ago

One last thing: I changed n.iter to niter to harmonize the paramater names. nsomething is used to denote number of something. We have already nclasses and ndimred.

If someone is against this change, I'm happy to discuss about this

ake123 commented 1 week ago

It seems the unit test have been updated with niter=NULL, niter=0 as default and with no errors.

TuomasBorman commented 1 week ago

@ake123 Can you check that OMA and mia* packages are updated?

ake123 commented 1 week ago

Yes sure! Should I add section addAlpha in the diversity section?