Open audrey-bollas opened 21 hours ago
I opened a pull request to fix the if statement, in case that is the problem. I tested it on several examples, but I am by no means an R expert, so I would not consider it bulletproof.
Anyway, after looking at the code, it seems to filter the log-counts matrix before doing any computation. Is that equivalent to filtering the counts matrix before running the propr command? I assumed select was doing what is described in the propr paper:
We begin by constructing the proportionality matrix using all 57,580 transcript counts, yielding an N² matrix 24.7 Gb in size. To minimize the number of lowly expressed transcripts included in the final result, we subset the matrix to include only those transcripts with at least 10 counts in at least 10 samples. By removing the features at this stage, we can exploit a computational trick to calculate proportionality and filter simultaneously, reducing the required RAM to only 5 Gb without altering the resultant matrix. Next, in the absence of a hypothesis testing framework, we arbitrarily select those “highly proportional” transcripts with ρp > 0.95. We refer the reader to the supplementary vignette for a justification of this cutoff (S1 Appendix). When plotting the pairwise log-ratio transformed abundances for these “highly proportional” transcript pairs, a smear of straight diagonal lines confirms that the feature pairs indexed as proportional actually show proportional abundance
I mean the bolded/italicized section in particular. My guess is that removing the genes before calculating proportionality would change the results compared with removing them afterwards. Can you offer any insight into this? And do you have any suggestions for speeding up the computation here, as described in the paper?
Thanks!
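To make the before/after question concrete, here is a minimal sketch in base R. It assumes a centered log-ratio (clr) transform and the formula rho_p = 1 - var(a - b) / (var(a) + var(b)); the data, the `keep` indices, and the helper names `clr` and `rho` are made up for illustration, and none of this is taken from propr's internals. Under those assumptions, the clr reference (each sample's mean log count) depends on which features are present, so subsetting before the transform should not be expected to match subsetting after it.

```r
# Toy illustration only: made-up counts, not propr's internal code.
set.seed(1)
counts <- matrix(rpois(120, lambda = 50) + 1, nrow = 20, ncol = 6)  # 20 samples x 6 features

# Centered log-ratio: subtract each sample's mean log count from its log counts.
clr <- function(x) {
  lx <- log(x)
  lx - rowMeans(lx)
}

# Assumed rho_p formula for one pair of clr-transformed features.
rho <- function(a, b) 1 - var(a - b) / (var(a) + var(b))

keep <- 1:3  # hypothetical set of features that pass the filter

after  <- clr(counts)[, keep]  # transform with all features, then subset
before <- clr(counts[, keep])  # subset first, then transform

rho_after  <- rho(after[, 1], after[, 2])
rho_before <- rho(before[, 1], before[, 2])

# The variance of the log-ratio (numerator) is the same either way,
# but the individual clr variances (denominator) are not.
print(c(rho_after = rho_after, rho_before = rho_before))
```

In this toy setup only the subset-after-transform route reproduces the values from the full matrix, which seems consistent with the paper's "without altering the resultant matrix" wording.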
Hello, thanks for the great tool. I am using it on a set of RNA-Seq data with ~55k genes (features). I would like to use the select argument to dynamically filter the features to reduce time/resources. Here is my command and an example of my data:
I am getting this error:
I think it expects select to have length 1, which is true when it is NA. But the if statement is not vectorized, so it fails when a vector of indices is supplied. The docs specify that select should be "A numeric vector representing the indices of features to be used for computing the Propr matrix. This argument is optional. If provided, it reduces the data size by using only the selected features."
Is there a problem with the if statement or am I doing something wrong? Thanks so much!!
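A minimal sketch of the suspected failure mode, in base R. The functions `check_scalar` and `check_safe` are hypothetical stand-ins, assuming the package tests `select` with something like `!is.na(select)`; the actual propr code and the pull request may differ. R's `if` needs a condition of length 1: a longer condition is an error on R 4.2.0 and later, and only a warning on older versions.

```r
# Hypothetical stand-in for the check described above; propr's actual code may differ.
check_scalar <- function(select = NA) {
  if (!is.na(select)) "use selected features" else "use all features"
}

# Length-safe variant: reduce the element-wise test to a single TRUE/FALSE.
check_safe <- function(select = NA) {
  if (!all(is.na(select))) "use selected features" else "use all features"
}

check_scalar(NA)  # condition has length 1, so this works

# With a vector of indices the condition has length 3:
# an error on R >= 4.2.0, a warning on older versions.
outcome <- tryCatch(
  check_scalar(c(1, 5, 9)),
  error   = function(e) "error",
  warning = function(w) "warning"
)
print(outcome)

check_safe(NA)
check_safe(c(1, 5, 9))
```

Wrapping the test in `all()` is just one way to get a length-1 condition; defaulting `select` to NULL and testing `is.null(select)` would be another.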