Open apcamargo opened 6 years ago
Hi @apcamargo, did you use gene_mode = TRUE
when you ran sleuth_prep
? That's required to get gene-level aggregation of counts and TPMs, and it's required to mimic the previous behavior for sleuth versions <= 0.29.0.
If gene_mode = FALSE
(which is the default, even if you set the aggregation_column
), this method will only report transcript-level values. If gene_mode = TRUE
, you can only get gene-level values with norm_obs
. Let me know if this works or if there is indeed a bug.
It worked. Thanks Warren!
Is there a reason that the new p-value aggregation prevents sleuth_to_matrix
from returning gene-level abundance values? I understand that this method doesn't use gene-level expression for it's statistical analysis, but this behavior of not returning gene-level expression values is not user friendly. When a user is doing gene-level testing, I imagine the he will want gene-level abundances.
On a different note, I think it is easy to output the non-normalized TPM gene-level values in sleuth_to_matrix
. It's just a matter of multiplying the normalized values by the size factors. I imagine that sleuth currently only supports the normalized values because the between sample normalization is done before summing up transcripts abundance to obtain gene-level abundance. Is that right?
I'm not very knowledgeable in R, to say the least, but I can try to implement this if it's ok with you and Harold.
Hi @apcarmargo for your question. I can address some of the questions about p-value aggregation and our rationale going into this. The new way of gene-level testing is built on the idea that the gene significance should take into account transcript dynamics. Since the aggregated gene p-value is only based on the constitutive transcript p-values, we believe looking at overall gene expression would be misleading (there may be significant yet lowly expressed transcripts, or cancellation of effect sizes between transcripts). Instead, looking at the changes in each constitutive transcript’s expression would better explain the aggregated gene p-value.
Also thanks Warren for fielding all the questions!
Best, Lynn
On Jun 26, 2018, at 9:48 AM, apcamargo notifications@github.com wrote:
Hi @apcamargo, I agree that gene-level TPMs are easily returned, and maybe we could implement that.
In the meantime, I would lean heavily on @lynnyi's comments about the utility of looking at gene-level TPMs, whether normalized or not. You may consider running models gene-level aggregation of TPMs to see if there is separately detected change in gene-level abundances, but not sure if it's worth it. I would read the p-value aggregation paper for details, as well as Dr. Pachter's blog post on the topic.
We'll leave the issue open until we decide about implementing the gene-level aggregation in sleuth_to_matrix
Hi @lynnyi
I understand that the p-value aggregation strategy doesn't use gene-level abundances for the statistical analysis, but I think users may be confused that they're doing gene-level analysis and getting transcript-level abundances with sleuth_to_matrix
.
One possible solution would be adding a new parameter to sleuth_to_matrix
in which users would choose whether they want transcript or gene-level abundances, irrespective of the type of gene-level testing.
I've read your paper and fully understand the statistical taking of this strategy, but I observed that my coworkers have gotten very confused that they're not able to get their abundance values. I think this is a matter of user friendliness.
@apcamargo: So if a gene comes up as significant because it switched the isoform while keeping the overall gene level constant, you'd like to get a 'significant' gene level fold-change of 0? Wouldn't this confuse user even more?
IMHO sometimes it is worth actually dealing with the method at hand and trying to understand why it doesn't do things the way one is used to, also when not performing the analyses. I guess few of your collaborators asks you for delta-CT values after analyzing RNA-seq date even though this might be what they used to be used to. :wink:
That aside, offering an option to return gene level estimates probably wouldn't hurt anyone. I just doubt it would help the interpretation of the data.
That's obviously just my 2 cents. :smile:
@mschilli87 I agree with you that gene-level TPM values do not make sense in this kind of analysis, but I'm not certain that making it impossible for the user to get those values is a good idea. Maybe sleuth_to_matrix
should return transcript-level values by default and gene-level on user request, just like you said.
There was an additional user on the forum that wants a feature similar to what has been discussed here (link to the discussion).
The framework to implement this is available with the deprecated gene_summary
and read_per_base_transform
methods, but they would need to be reconfigured to work well for this use case.
I have two suggestions:
gene_mode
option to sleuth_to_matrix
that takes as the default the obj$gene_mode
. This would allow the function to output gene-level values as a default even for the raw values when gene_mode == TRUE
, but could be set to something different by the user.Given the time constraints, it's not clear if this will be implemented for the next release, but it will be prioritized for the subsequent release if it's not ready in time for this next release.
There was an additional user on the forum that wants a feature similar to what has been discussed here (link to the discussion).
The framework to implement this is available with the deprecated
gene_summary
andread_per_base_transform
methods, but they would need to be reconfigured to work well for this use case.I have two suggestions:
- We can implement a version where it is possible to get gene-level counts even from the raw transcript values, but it is best if we still use the "reads_per_base" transform because of the multiple issues with combining counts from transcripts of different lengths. Maybe there is space to allow for customization by the user, but for now, this will be the default behavior.
- For API, we can make a
gene_mode
option tosleuth_to_matrix
that takes as the default theobj$gene_mode
. This would allow the function to output gene-level values as a default even for the raw values whengene_mode == TRUE
, but could be set to something different by the user.Given the time constraints, it's not clear if this will be implemented for the next release, but it will be prioritized for the subsequent release if it's not ready in time for this next release.
Is there any update on if/when this new feature will be released?
I can't get gene-level expression values using
sleuth_to_matrix
, usig bothraw_obs
andnorm_obs
. The function returns a transcript-level matrix.I'm using version 0.30.0.