pachterlab / sleuth

Differential analysis of RNA-Seq
http://pachterlab.github.io/sleuth
GNU General Public License v3.0

Wrong TPM values in 'transcript view' (sleuth_live) #130

Open apcamargo opened 7 years ago

apcamargo commented 7 years ago

I'm analyzing a transcript (TCONS_00028547) that's differentially expressed between two conditions. I've noticed that the TPM values shown in sleuth_live differ from the ones I get from sleuth_to_matrix (for both raw and normalized TPM values).

Values of normalized TPM from sleuth_to_matrix:

|                | X     | C    | E    | A    | Y     | D    | F    | B    | Z     |
|----------------|-------|------|------|------|-------|------|------|------|-------|
| TCONS_00028547 | 24.67 | 8.84 | 9.28 | 8.84 | 23.18 | 9.10 | 8.70 | 8.98 | 26.22 |

Screenshot of sleuth_live: screenshot from 2017-08-15 14-29-03

Is this behaviour normal?

warrenmcg commented 7 years ago

Hi @apcamargo,

The plot in sleuth_live is drawn with the plot_bootstrap method, which in turn gets the summary data from bs_quants using get_bootstrap_summary.

What do you get when you run the following line of code? (assuming your sleuth object is named so)

bs_summaries <- sleuth::get_bootstrap_summary(so, "TCONS_00028547", units = "tpm")
bs_summaries

If this does not match the figure, we have a separate problem. If it does match the figure, we have a weird situation where the raw data indicates one set of TPMs, but the bootstraps indicate a very different set of TPMs.

warrenmcg commented 7 years ago

pinging @apcamargo, just wanted to follow up: did you see whether the output of the code I gave you matches the figure? We are interested in trying to resolve this issue for you.

pimentel commented 7 years ago

ping @apcamargo

apcamargo commented 7 years ago

Sorry for the late follow-up. The values that I get with sleuth::get_bootstrap_summary match the ones in the figure and are different from the ones I get with sleuth_to_matrix.

warrenmcg commented 7 years ago

Hi @apcamargo, thanks for the follow-up.

Do you see the same discrepancy with the estimated counts? That could help narrow down what might be causing this issue.

Would you be willing to share your kallisto/salmon results with us offline so we can dig into the data more deeply, and get to the bottom of this discrepancy? From what you're telling me, at this point this does not appear to me to be a sleuth issue, but some weird edge case where kallisto/salmon is not behaving as expected.

pimentel commented 7 years ago

Indeed. This would be much easier to debug if we have the data and code. If you can, please email us at:

hjp at stanford.edu
warren-mcgee at fsm.northwestern.edu

(replace at with @ and remove spaces)

Thanks!

apcamargo commented 7 years ago

The estimated counts values seem to be correct.

What should I send you? An image of my session or just the sleuth object?

pimentel commented 7 years ago

You can simply send us the sleuth object saved as an RDS file:

saveRDS(so, file = 'sleuth_object.rds')

thanks

rschulzUK commented 6 years ago

I stumbled across this recently too. It looks to me like plot_bootstrap, via get_bootstrap_summary, plots the un-normalised median (mid), upper quartile, lower quartile, min and max of the bootstrap-generated TPM values. When I divide the median values by the scaling factors tpm_sf of the sleuth object, the results are close to the normalised means in obs_norm$tpm. So perhaps plot_bootstrap just needs a which_df argument like sleuth_to_matrix so that one can generate consistent plots? Something like: if which_df == 'obs_norm', then the values returned by get_bootstrap_summary are divided by the respective scaling factors (either tpm_sf or est_counts_sf) before being plotted.
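
The division described above can be sketched in base R. This is only an illustration under stated assumptions, not sleuth code: normalize_bootstrap_summary is a hypothetical helper, the column names (sample, min, lower, mid, upper, max) are assumed to match what get_bootstrap_summary returns, the scaling factors are assumed to be a vector named by sample, and all numbers are made up.

```r
# Hypothetical helper (not part of sleuth): divide the bootstrap summary
# statistics of each sample by that sample's scaling factor.
normalize_bootstrap_summary <- function(bs_summary, sf) {
  # Assumed column names of the summary data frame
  stat_cols <- c("min", "lower", "mid", "upper", "max")
  # Look up the scaling factor for each row by its sample name
  row_sf <- unname(sf[as.character(bs_summary$sample)])
  stopifnot(!anyNA(row_sf))
  for (col in stat_cols) {
    bs_summary[[col]] <- bs_summary[[col]] / row_sf
  }
  bs_summary
}

# Made-up example data standing in for a bootstrap summary
toy_summary <- data.frame(
  sample = c("X", "C"),
  min    = c(18, 4),
  lower  = c(19, 4.5),
  mid    = c(20, 5),
  upper  = c(22, 5.2),
  max    = c(24, 5.5)
)

# Made-up scaling factors, named by sample
toy_sf <- c(X = 2, C = 0.5)

normalized <- normalize_bootstrap_summary(toy_summary, toy_sf)
print(normalized)
```

With a real sleuth object one could try passing the result of sleuth::get_bootstrap_summary(so, "TCONS_00028547", units = "tpm") together with so$tpm_sf, assuming tpm_sf is named by sample. I have not checked that against every sleuth version.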

ys-lim commented 3 years ago

Hello,

I am trying to use plot_bootstrap for a novel transcript. I am getting a similar issue in that the tpm counts are correct, but the est_counts plot seems to be giving me different values than the sleuth matrix.

image

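As you can see, the est_counts values range from 0 to 5 only, but my actual sleuth matrix shows:

|             | control1     | control2  | control3     | treatment1   | treatment2  | treatment3  |
|-------------|--------------|-----------|--------------|--------------|-------------|-------------|
| STRG.1313.8 | 4.027404e+01 | 0.0000000 | 0.000000e+00 | 8.039003e+02 | 479.3533425 | 635.9643100 |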

I've also done a quick check of my bootstraps abundance files, and the estimated counts seem to be reasonable, and so I am unsure where the problem lies. I appreciate any help or comments regarding this!

Many thanks, YS

kaylahardwick commented 1 year ago

Hello,

Has progress been made on this issue? I am seeing similar results where the TPM values returned by sleuth_to_matrix are in line with those returned by get_bootstrap_summary, but the est_counts from the bootstrapping and from sleuth_to_matrix are extremely different. I'm worried this will affect differential expression analysis if the bootstraps on estimated counts are inaccurate. Please let me know if you have any thoughts on this.

Thanks! Kayla