Hi, I recently went to a talk by Dr. Susan Holmes and she convinced me to transform my data to ranks rather than using pure count data or proportions for downstream analyses (particularly creating networks).

However, to do this I need to bin my samples accordingly and then rank them. I am working in phyloseq and have run into a couple of issues with the rank functions.

Here is what I am currently doing, along with my comments following "#"

abund = otu_table(ps1)

Then, to bin low-frequency OTUs, I run the following:

preprankHOT = transform_sample_counts(abund, function(x) ifelse(x < 1 | x > 10, x, 1))
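To make the effect of that binning function concrete, here is a minimal base-R sketch on a made-up count vector (the name `bin_low` and the values are illustrative only, not from the real data). It applies the same `ifelse` expression outside of phyloseq.

```r
# Same expression as in the transform_sample_counts() call above,
# wrapped in a hypothetical helper so it can be tried on a plain vector.
bin_low <- function(x) ifelse(x < 1 | x > 10, x, 1)

# Made-up counts for one sample.
x <- c(0, 1, 5, 10, 11, 250)

binned <- bin_low(x)
binned
# Zeros and values above 10 pass through unchanged;
# every value from 1 to 10 is replaced by 1.
```

As written, the expression collapses counts from 1 to 10 to the value 1, so any other bin label would need a different last argument to `ifelse`.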

Then I would like to use the rank functions, so I have tried two different methods:

x1 = transform_sample_counts(preprankHOT4, rank, ties.method="min")

This seems to rank things in numerical order, with larger numbers getting higher ranks, but I don't understand why the ranks jump so high. For instance, I have many 0s, which are automatically re-ranked as '1', but then my next values (from 1-10 in the original abundance data) are sometimes ranked as high as 3030! I think perhaps it is counting all of the 0s (re-ranked as 1s) to get to the next rank...?

#But this seems like it would mess up downstream stats... any advice?
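The jump described above is what `ties.method = "min"` does by design: tied values share the lowest rank of their group, and the next distinct value is ranked after all of them. A minimal sketch in base R, using a made-up vector rather than the real table:

```r
# Toy sample: five zero counts followed by three nonzero counts.
x <- c(0, 0, 0, 0, 0, 2, 7, 40)

# "min" ranking: the five tied zeros all get rank 1, and the next
# value gets rank 6 because the ties still occupy positions 1 to 5.
r_min <- rank(x, ties.method = "min")
r_min

# Dense ranks for comparison (no gaps between distinct values).
# This is plain base R, not a phyloseq function.
r_dense <- match(x, sort(unique(x)))
r_dense
```

By the same arithmetic, a rank of 3030 for the first nonzero value would be consistent with 3029 smaller (here, zero) entries in that sample.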

#I get the same results as this if I use: x2 <- otu_table(apply(otu_table(preprankHOT), 2, threshrankfun(1, ties.method="min")), taxa_are_rows(preprankHOT))

The second way I have tried doing this is:

HOT_ranks <- t(apply(preprankHOT, 1, rank, ties.method = "min"))

Although this gives me better numbers (they are not in the thousands), it is even worse, since the ranks are wrong. If I compare against my abundance table (raw count data) or my binned count data (where I binned everything between 1-10 as 2), some of the smaller numbers actually get ranked higher than larger numbers.
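One possible explanation is the margin passed to `apply`. The sketch below assumes taxa are rows and samples are columns (if the table is stored the other way round, the two margins swap roles), and uses a made-up matrix. With `MARGIN = 1`, each OTU is ranked across samples, so the ranks within one sample column are no longer comparable with each other.

```r
# Toy count table, assuming taxa are rows and samples are columns.
counts <- matrix(
  c(0, 5, 900,
    3, 4,   2,
    0, 0,   1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3"))
)

# MARGIN = 1 ranks each row, i.e. each OTU across the samples.
by_row <- t(apply(counts, 1, rank, ties.method = "min"))

# MARGIN = 2 ranks each column, i.e. the OTUs within each sample.
by_col <- apply(counts, 2, rank, ties.method = "min")

# In sample S3 the counts are 900, 2, 1.
by_row[, 3]  # OTU3 (count 1) outranks OTU2 (count 2): ranked per OTU
by_col[, 3]  # ranks follow the counts within the sample
```

Which margin is appropriate depends on whether ranks are meant to be compared within a sample or within a taxon.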

Do you have any advice?

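I am pasting some pictures of my data frames below:

Original data frame: [image: screenshot 2019-03-01 14 19 10] https://user-images.githubusercontent.com/37647236/53669749-02391100-3c2d-11e9-823b-32d2052989fb.png

Binned data frame (regrouping numbers 1-10 as 2): [image: screenshot 2019-03-01 14 20 14] https://user-images.githubusercontent.com/37647236/53669777-2563c080-3c2d-11e9-8857-70fa5332b88e.png

Ranked data frames:

HOT_ranks: [image: screenshot 2019-03-01 14 15 36] https://user-images.githubusercontent.com/37647236/53669605-817a1500-3c2c-11e9-8f5a-f78d51099f0b.png

x1: [image: screenshot 2019-03-01 14 17 40] https://user-images.githubusercontent.com/37647236/53669678-c9993780-3c2c-11e9-8f2f-61fef6d0b9be.png

x2: [image: screenshot 2019-03-01 14 16 41] https://user-images.githubusercontent.com/37647236/53669646-a53d5b00-3c2c-11e9-89a9-06558bf5eca8.png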

Following the Microbiome workflow more closely, step by step, may be a better approach (https://f1000research.com/articles/5-1492/v2). See the section entitled "PCA on ranks".
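As best I recall, that section of the workflow deals with the mass of tied low values by subtracting a threshold from the ranks and setting everything that falls below 1 to 1; the article itself should be treated as the reference for the exact code and cutoff. A rough base-R sketch of the idea on one made-up sample, with a hypothetical threshold `shift`:

```r
# One toy sample with many absent taxa.
x <- c(0, 0, 0, 0, 0, 0, 2, 7, 40, 300)

# Default ranking (ties.method = "average"): the six zeros share rank 3.5.
r <- rank(x)

# Hypothetical cutoff, chosen here as the number of zeros in the sample.
shift <- 6

# Shift the ranks down and floor them at 1, so the tied low values
# no longer push the informative ranks into a high range.
r_shifted <- r - shift
r_shifted[r_shifted < 1] <- 1
r_shifted
```

With this choice the zeros and the smallest nonzero count all end up at 1, and the remaining values are ranked 2, 3, 4.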

My guess is that some of the OTUs/ASVs are present in only one or two samples. You should probably filter those out, since all of their ties fall together and the resulting ranks will be very high.
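A prevalence filter of this kind can be sketched in base R as follows. The matrix, the taxa-as-rows layout, and the cutoff `min_samples` are all assumptions for illustration, not values suggested in the thread.

```r
# Toy count table, assuming taxa are rows and samples are columns.
counts <- matrix(
  c(0, 0, 0, 12,
    3, 0, 8,  5,
    0, 1, 0,  0,
    9, 4, 2,  7),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:4), paste0("S", 1:4))
)

# Prevalence: number of samples in which each OTU has a nonzero count.
prevalence <- rowSums(counts > 0)

# Hypothetical cutoff: keep OTUs seen in at least this many samples.
min_samples <- 3
filtered <- counts[prevalence >= min_samples, , drop = FALSE]
rownames(filtered)
```

On a phyloseq object the same idea could probably be expressed with `filter_taxa(ps1, function(x) sum(x > 0) >= 3, prune = TRUE)`, though that call has not been run against the data in this thread.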

On Fri, Mar 1, 2019 at 2:23 PM juliehopper notifications@github.com wrote:

-- Susan Holmes John Henry Samter Fellow in Undergraduate Education Professor, Statistics 2017-2018 CASBS Fellow, Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

joey711 / phyloseq

threshrankfun and rank functions don't seem to be working appropriately #1083
