spholmes / F1000_workflow


problem on order the rank #21

Open chloelulu opened 6 years ago

chloelulu commented 6 years ago

Hi,

I want to apply this workflow to my data analysis, but I cannot understand how the following code transforms the data. Could you explain it in more detail? From the "PCoA on the ranks" part:

abund <- otu_table(pslog)
abund_ranks <- t(apply(abund, 1, rank))

I found that the result does not look like an ordering. The input data are log-transformed raw data, and the apply function transforms them into ranks within each row. However, the transformed data does not start from 1. How could that be?

Then the line

abund_ranks <- abund_ranks - 329

subtracts 329. Why is 329 used here, and how should this number be set properly?

Thanks in advance.

krisrs1128 commented 6 years ago

Hi @chloelulu, this is a somewhat subtle transformation of the data. I think it's worth thinking through each step carefully.

The first step is a rank-transformation on the abundance matrix. Specifically, abund_ranks[i, j] gives the rank of species j in sample i. The most abundant species in the i-th row has rank 389, since there are 389 species. If there were no ties, the least abundant species would have rank 1. However, there are usually many species with 0 counts in each sample -- these ties explain why you don't see any rank equal to 1. For example, the smallest rank in the first sample is 82.5, because there are 164 species with counts of 0 in that sample, and tied values share the average of the ranks they occupy: (1 + 164) / 2 = 82.5.
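A minimal base-R sketch of the ranking step on a made-up matrix, in case it helps to see the ties directly. It assumes samples are on the rows, as in the explanation above; the toy values and names (`s1`, `sp1`, ...) are invented. `rank()` averages tied values by default (`ties.method = "average"`), so k zeros in a row all receive rank (1 + k) / 2.

```r
# Toy abundance matrix: 2 samples (rows) x 6 species (columns).
# Values are invented log-transformed abundances; zeros stand for absent species.
abund <- matrix(c(0, 0,   0, 1.2, 3.4, 2.1,
                  0, 5.0, 0, 0,   0,   4.2),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("s1", "s2"), paste0("sp", 1:6)))

# apply(..., 1, rank) ranks within each row but returns one column per row,
# so t() puts the samples back on the rows.
abund_ranks <- t(apply(abund, 1, rank))
print(abund_ranks)

# s1 has 3 tied zeros -> each gets (1 + 3) / 2 = 2
# s2 has 4 tied zeros -> each gets (1 + 4) / 2 = 2.5
print(apply(abund_ranks, 1, min))
```

The same arithmetic explains the 82.5 in the first sample of the real data: 164 tied zeros share ranks 1 to 164.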

Working with the ranks is an improvement, but there is a problem -- the difference between very abundant species (say, 389 vs. 383) looks the same as the difference between rare species (e.g., 1 vs. 7, if we imagine we didn't have any ties). Really, we'd like our dimensionality reduction to pay more attention to differences between abundant species. To do this, we make the ranks for all the rare species look the same. This is why we do the next two lines:

abund_ranks <- abund_ranks - 329
abund_ranks[abund_ranks < 1] <- 1

The first line shifts all the ranks down, and the second "squashes" all the ranks for all the less-abundant species, so that they are all equal to 1. If we didn't do the subtraction, we could still achieve the squashing effect, but there would be a big gap between the high ranks of abundant species and the low-abundance species that have been fixed to 1. The choice 329 ensures that about 85% of species will be squashed to 1 (because 329 / 389 = 84.6%), so differences in abundance between rare species are ignored (they're all equal to 1), and the ordination pays more attention to ranks of the abundant species. You should change 329 to whatever gives you an appropriate percentage of squashed species. Let us know if you have any questions.
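To experiment with the shift on your own data, here is a minimal base-R sketch of the same two lines wrapped in functions. The helper names `squash_ranks`, `squashed_fraction` and `shift_for_fraction` are hypothetical, made up for illustration and not part of the workflow. With no ties, a shift of `s` maps ranks 1 to s + 1 onto 1; with tied zeros the realized fraction will differ, so it is worth checking `squashed_fraction` on your actual `abund_ranks`.

```r
# Hypothetical helper (not from the workflow): the same two steps as above,
# shift every rank down by `shift`, then floor anything below 1 at 1.
squash_ranks <- function(ranks, shift) {
  shifted <- ranks - shift
  shifted[shifted < 1] <- 1
  shifted
}

# Hypothetical helper: share of entries that ended up squashed to 1.
squashed_fraction <- function(squashed) mean(squashed == 1)

# Hypothetical helper: pick a shift so that roughly `target` of the
# n_species ranks fall at or below it (329 / 389 is about 0.846).
shift_for_fraction <- function(n_species, target) round(target * n_species)

# Toy sample with 10 species and no ties, so the ranks are just 1..10.
ranks <- matrix(1:10, nrow = 1)
sq <- squash_ranks(ranks, shift = 7)
print(sq)                              # ranks 1..8 become 1; ranks 9 and 10 become 2 and 3
print(squashed_fraction(sq))           # 0.8
print(shift_for_fraction(389, 0.846))  # 329
```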

chloelulu commented 6 years ago

Hi, @krisrs1128, Thanks so much for the detailed explanation. Appreciate it!

shreyaskumbhare commented 5 years ago

Hi @krisrs1128, thank you, the description is really useful. I would like to know: was there any reason for choosing the threshold of 85% of species to be squashed? If not, how can one determine an appropriate threshold? Thanks in advance!

spholmes commented 5 years ago

It is not the 85% that is important; it was more the prior belief that about 300 species were present.

(Reply sent by email to Shreyas Kumbhare's comment of Tue, Apr 16, 2019 at 4:17 AM, quoted above.)

-- Susan Holmes
John Henry Samter Fellow in Undergraduate Education
Professor, Statistics
2017-2018 CASBS Fellow
Sequoia Hall, 390 Serra Mall, Stanford, CA 94305
http://www-stat.stanford.edu/~susan/