t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0

Proposal for a new Alleyoop: convenient summary of `dump` output #153

Open isaacvock opened 2 months ago

isaacvock commented 2 months ago

Hello Tobias,

Summary

I would like to propose a new Alleyoop tool, which I would eventually submit as a pull request if you approve of the addition. The idea is for it to take the output of `alleyoop dump` and produce a summarized table similar to the "cB file" described here.

If you are open to this proposal, I will start crafting a pull request. Not sure how long it will take me, but I wanted to get your feedback before beginning work on it.
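To make the idea concrete, here is a minimal sketch of the kind of summarization I have in mind. It is not a proposed implementation: the input keys (`sample`, `feature`, `tc`, `n_t`) are hypothetical stand-ins for whatever per-read fields the dump output provides, and the output column names (`sample`, `XF`, `TC`, `nT`, `n`) are my assumption of a cB-like layout, where `n` counts reads that share the same sample, feature, number of T-to-C conversions, and number of Ts.

```python
from collections import Counter


def summarize_reads(reads):
    """Collapse per-read records into counts of identical rows.

    `reads` is an iterable of dicts with the hypothetical keys
    "sample", "feature", "tc" (T-to-C conversions) and "n_t" (Ts covered).
    """
    counts = Counter(
        (r["sample"], r["feature"], int(r["tc"]), int(r["n_t"])) for r in reads
    )
    # One output row per unique combination, with its read count in "n".
    return [
        {"sample": s, "XF": xf, "TC": tc, "nT": nt, "n": n}
        for (s, xf, tc, nt), n in sorted(counts.items())
    ]


# Toy input, made up purely for illustration.
reads = [
    {"sample": "WT_1", "feature": "UTR_A", "tc": 0, "n_t": 20},
    {"sample": "WT_1", "feature": "UTR_A", "tc": 0, "n_t": 20},
    {"sample": "WT_1", "feature": "UTR_A", "tc": 2, "n_t": 18},
    {"sample": "WT_1", "feature": "UTR_B", "tc": 1, "n_t": 25},
]
table = summarize_reads(reads)
for row in table:
    print(row)
```

Collapsing identical rows keeps the table small even for deep libraries, since many reads share the same conversion and T counts.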

Motivation

The reason for this proposal is that currently, users of SLAMDUNK are encouraged to treat the number of reads with 1 or more T-to-C conversions as an estimate for the number of new reads. This is suboptimal for the following reasons:

  1. Reads from old RNA can have mutations due to RT/sequencing or alignment errors
  2. Reads from new RNA can have no mutations due to s4U having to compete with regular uridine for incorporation
  3. The s4U incorporation rate, and thus the probability that a given read has >= 1 mutation, is often a function of the biological condition. For example, it's often the case that a particular genetic KO has a different incorporation rate from WT cells, leading to a confounder that can mislead downstream analyses.

While the UTR-specific total conversion rates provided by SLAMDUNK can be used to improve the rigor of new read abundance estimation, it would be great if SLAMDUNK better supported more rigorous downstream mixture modeling, the gold standard for analyzing SLAM-seq and similar datasets. My proposal would do that by making the read-specific mutational data provided by SLAMDUNK more convenient to work with in tools like bakR, which could perform the necessary mixture modeling. This would also better support users looking to develop novel mixture modeling strategies, like the ones described in this preprint.
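As a rough illustration of the three points above, here is a toy two-component binomial mixture fit on simulated reads. This is only a sketch under strong assumptions and is not how bakR or any other tool is implemented: the conversion rates `p_new` and `p_old` are treated as known (in practice they have to be estimated), every read covers the same number of Ts, and all numeric values are made up.

```python
import math
import random


def binom_pmf(k, n, p):
    """Binomial probability of k conversions among n Ts at per-T rate p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def fraction_new(reads, p_new, p_old, n_iter=200):
    """EM estimate of the fraction of reads from new RNA.

    `reads` is a list of (tc, n_t) pairs; p_new and p_old are assumed known.
    """
    lik = [(binom_pmf(tc, nt, p_new), binom_pmf(tc, nt, p_old)) for tc, nt in reads]
    theta = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each read is new.
        resp = [theta * a / (theta * a + (1 - theta) * b) for a, b in lik]
        # M-step: the updated fraction is the mean responsibility.
        theta = sum(resp) / len(resp)
    return theta


# Simulated data; every value here is an illustrative assumption.
rng = random.Random(0)
true_fraction, p_new, p_old, n_t = 0.3, 0.05, 0.001, 25
reads = []
for _ in range(2000):
    p = p_new if rng.random() < true_fraction else p_old
    tc = sum(rng.random() < p for _ in range(n_t))
    reads.append((tc, n_t))

naive = sum(tc >= 1 for tc, _ in reads) / len(reads)
estimate = fraction_new(reads, p_new, p_old)
print(f"naive (>= 1 conversion): {naive:.3f}")
print(f"mixture model estimate:  {estimate:.3f}")
```

The naive count misses new reads that happen to carry no conversions and picks up old reads with error-induced ones, whereas the mixture model accounts for both through the two conversion rates.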

Thanks for your consideration, Isaac

t-neumann commented 2 months ago

Hi Isaac,

Absolutely. For a long time I have considered making SLAM-DUNK connectable with GRAND-SLAM and bakR, but I simply lacked the time and the immediate need to get it up and running. It would for sure be great for adopting SLAM-DUNK in combination with the other methods in the field.

So I would be more than grateful if you could take the lead; I am happy to guide and help with the implementation and any questions along the way.

Best,

Tobi