microbiome / mia

Microbiome analysis

https://microbiome.github.io/mia/

Artistic License 2.0

Rarity indices #59

Closed: antagomir closed this issue 3 years ago

antagomir commented 3 years ago

The microbiome package has a set of functions to quantify "rarity" and to subset the data to rare groups. The higher the rarity, the higher the diversity. These could therefore be added to estimateDiversity, but conceptually they focus on rarity and are typically not included among standard diversity indices; in this sense they may deserve their own function.

The functions in the microbiome package are as follows:

Function to identify rare taxa (complement to core taxa): rare_members.R

Function to take a subset of the phyloseq object that only includes rare taxa (complement to microbiome::core): rare.R

Function to calculate a rarity index (could be estimateRarity in mia?): rarity.R

The helper functions for specific indices:

log_modulo_skewness.R
Could be combined: low_abundance.R & rare_abundance.R

FelixErnst commented 3 years ago

I am not sure about this, since I don't fully grasp the concept. Could you implement it and at the same time add a section to MiaBook?

antagomir commented 3 years ago

The overall motivation is that there has been some interest lately (in microbiome research) in carrying out specific analyses on rare taxa, as these are often overlooked by standard analyses. So, the concept is to focus on a particular subspace in the microbial community.

Yes, we should be able to implement this as time allows.

FelixErnst commented 3 years ago

Is this solved?

antagomir commented 3 years ago

No. But it is not urgent or very critical either. There are some interesting rarity indices (log_modulo_skewness) that could be migrated from the microbiome package and that could complement the alpha diversity indices. But this is not necessary for the Bioc submission. Is there another way to list non-urgent development ideas, rather than through issues?

FelixErnst commented 3 years ago

I added this to the Bioc 3.14 milestone. Maybe this can help plan the implementation.

FelixErnst commented 3 years ago

@antagomir Would this be solved by the linked PR?

antagomir commented 3 years ago

Log-modulo skewness is one kind of diversity measure, with a focus on rarity; the need for the rarity function is solved by adding log_modulo_skewness as an option in estimateDiversity in #102.

The need for rare and rare_members is not solved by that.

TuomasBorman commented 3 years ago

Hi,

About rare and rare_members:

Do we already have microbiome::rare_members? I think getRareTaxa does the job. It returns the taxa whose abundance is under a specific threshold. It is the complement to getPrevalentTaxa/microbiome::core_members.
What we don't have are microbiome::rare and microbiome::core, that is, functions that return a subset. So, I think those could also be created in getPrevalence.R. I think they could be done as follows (subsetRareTaxa, subsetPrevalentTaxa?):

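# Agglomerate to a taxonomic rank (default rank when none is given)
x <- agglomerateByRank(x)
# Names of the rare taxa (complement of getPrevalentTaxa)
a <- getRareTaxa(x)
# Keep only the rows (taxa) that were identified as rare
x[a, ]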
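
To make the idea concrete without depending on the mia API, here is a minimal base R sketch on a plain count matrix. It is not the mia or microbiome implementation: the helper names (prevalence, rare_taxa, subset_to_rare), the toy data, and the default thresholds are all hypothetical, and treating a taxon as rare when its prevalence is at or below the cutoff is an assumption.

```r
# Toy count matrix: taxa in rows, samples in columns (made-up values).
counts <- matrix(
    c(10, 12,  9, 11,
       0,  3,  0,  0,
       5,  0,  7,  6,
       0,  0,  0,  1),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:4))
)

# Hypothetical helper: fraction of samples in which each taxon
# exceeds the detection threshold.
prevalence <- function(mat, detection = 0) {
    rowMeans(mat > detection)
}

# Hypothetical helper: names of taxa whose prevalence is at or below
# the cutoff (assumed here to be the complement of the prevalent taxa).
rare_taxa <- function(mat, detection = 0, prevalence_cutoff = 0.5) {
    rownames(mat)[prevalence(mat, detection) <= prevalence_cutoff]
}

# Hypothetical helper: return the subset of the data holding only the
# rare taxa, keeping the matrix shape even if a single taxon remains.
subset_to_rare <- function(mat, ...) {
    mat[rare_taxa(mat, ...), , drop = FALSE]
}

rare <- rare_taxa(counts)
rare_counts <- subset_to_rare(counts)
```

The split between a function that returns names and a function that returns the subset mirrors the rare_members / rare distinction discussed above; a prevalent-taxa counterpart would only flip the comparison.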

FelixErnst commented 3 years ago

So I looked up how rare is implemented in microbiome.

The name subsetRareTaxa is a bit unclear for my taste. Are taxonomic values subset, or is taxonomic information used for subsetting?

subsetToRareTaxa and subsetToPrevalentTaxa might be a bit clearer. I would also put them in getPrevalence.R.

antagomir commented 3 years ago

I am OK with these.