vagarwal87 / saluki_paper

Saluki, a method to predict mRNA half-lives from sequence
Apache License 2.0

saluki_predict_fasta.py half-life output #5

Closed QuantumAlmonds closed 1 year ago

QuantumAlmonds commented 1 year ago

Hi, I see that the output for a single sequence with this workflow is a matrix with 50 half-lives. In the paper you state that half-lives in your training data are negated if they are calculated from degradation rates. Can you help me understand the output of the values in this matrix? I understand why there are 50 of them, but what I do not understand is the following:

Thanks!

vagarwal87 commented 1 year ago

1) Half-lives are only in arbitrary relative units. Some studies report them in minutes, others in hours, and so on, and most do not have reliable absolute half-lives anyway. Overall, I think it is more appropriate to treat these as relative half-lives, which can be rescaled to whichever dataset with an absolute scale you trust most.
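One way such a rescaling could look, as a minimal sketch rather than anything from the Saluki code: assume the predictions are on a log-like relative scale, and that you have absolute half-lives (here in hours) for some of the same transcripts from a dataset you trust. The function names `fit_rescaling` and `to_hours` and all the numbers are hypothetical.

```python
import math
from statistics import mean


def fit_rescaling(pred, ref_hours):
    """Least-squares line mapping relative predictions to log10(hours).

    pred      -- relative predicted values (assumed to be on a log-like scale)
    ref_hours -- absolute half-lives in hours for the same transcripts
    """
    log_ref = [math.log10(h) for h in ref_hours]
    mp, mr = mean(pred), mean(log_ref)
    sxx = sum((p - mp) ** 2 for p in pred)
    sxy = sum((p - mp) * (r - mr) for p, r in zip(pred, log_ref))
    slope = sxy / sxx
    intercept = mr - slope * mp
    return slope, intercept


def to_hours(pred_value, slope, intercept):
    """Convert one relative prediction to hours using the fitted line."""
    return 10 ** (slope * pred_value + intercept)


# Made-up illustrative numbers, not real Saluki output or measurements.
pred = [-1.0, 0.0, 1.0]
ref_hours = [1.0, 10.0, 100.0]
slope, intercept = fit_rescaling(pred, ref_hours)
```

The fit is only as good as the reference dataset, so the resulting hours inherit whatever bias that dataset has.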

2) Yes. This script ingests all the data and documents all of the individual samples that were log10 transformed: https://github.com/vagarwal87/saluki_paper/blob/main/Fig1_S1_S2/master_comparison_human.R

3) Because after taking the log, we sometimes z-score normalize the data, so each value can be thought of as a half-life relative to the mean.
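To illustrate what log transformation followed by z-scoring does to a set of half-lives, here is a small sketch. It is not the preprocessing from the paper: the helper name `zscore_log10`, the use of the population standard deviation, and the input values are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def zscore_log10(half_lives):
    """log10-transform half-lives, then z-score them within the sample."""
    logs = [math.log10(h) for h in half_lives]
    mu = mean(logs)
    sd = pstdev(logs)  # population SD is an assumption; a sample SD also works
    return [(x - mu) / sd for x in logs]


# Made-up half-lives in arbitrary units.
z = zscore_log10([1.0, 10.0, 100.0])
```

After this step the values are centred on zero, so a transcript with a below-average half-life gets a negative value and the original units no longer matter.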

Some folks report degradation rates rather than half-lives, depending on the equation they use (the rate is the inverse of the half-life, up to a constant, prior to log transformation). That is why a small minority of samples were negated after taking the log.
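The reason negation is enough can be written out briefly, assuming simple first-order decay with rate constant $k$ (an assumption for this sketch; individual studies may fit other models):

```latex
N(t) = N_0 e^{-kt}
\quad\Longrightarrow\quad
t_{1/2} = \frac{\ln 2}{k}
\quad\Longrightarrow\quad
\log_{10} t_{1/2} = -\log_{10} k + \log_{10}(\ln 2)
```

So the log of a degradation rate is the negated log of the half-life plus a constant, and that constant drops out whenever the sample is subsequently mean-centred or z-scored.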