tpq / propr

[OFFICIAL] [CoDA] An R package to calculate proportionality and other measures for compositional data

Question about implementing ALR with ERCC92 spike ins #34

Open jolespin opened 2 years ago

jolespin commented 2 years ago

I finally got my hands on a dataset with properly designed ERCC92 spike-ins. The question is: how, in theory, should I use these with the ALR?

The additive log-ratio transformation (alr), which allows the user to scale their data by a feature with an a priori known fixed abundance, such as a house-keeping gene or an experimentally fixed variable (e.g., a ThermoFisher ERCC synthetic RNA “spike-in” [15]), may provide a superior alternative. In contrast to clr, proportionality calculated with alr does not change with missing feature data because it effectively back-calculates the absolute feature abundance.

https://www.nature.com/articles/s41598-017-16520-0
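To make the quoted claim concrete, here is a minimal sketch of the alr in plain Python (not propr's own code). The counts and the feature names `geneA`, `geneB`, `geneC`, and `ERCC_spike` are hypothetical. It shows that alr values for the remaining features are unchanged when another feature is dropped, because each value depends only on the feature itself and the reference.

```python
import math

def alr(counts, ref):
    """Additive log-ratio: log of each feature divided by the reference feature."""
    ref_value = counts[ref]
    return {name: math.log(value / ref_value)
            for name, value in counts.items() if name != ref}

# Hypothetical counts for one sample; "ERCC_spike" stands in for a spike-in.
sample = {"geneA": 120, "geneB": 30, "geneC": 600, "ERCC_spike": 60}
full = alr(sample, ref="ERCC_spike")

# Drop geneC (a "missing feature") and transform again.
subset = {name: value for name, value in sample.items() if name != "geneC"}
partial = alr(subset, ref="ERCC_spike")

# geneA and geneB keep exactly the same alr values in both cases.
print(full["geneA"], partial["geneA"])
```

A clr would not behave this way, since its reference (the geometric mean of all features) changes whenever a feature is removed.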

Do I use a single ERCC92 feature as the reference, the sum of the ERCC92 features, or their mean?

If it's either of the latter two options, do I include all of the spike-ins or only a select few?
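For illustration only, the three candidate references could be computed as below. This is a sketch under assumptions, not a recommendation and not propr's API: the counts and names are hypothetical, "mean" is read here as the geometric mean, and strictly speaking only the single-feature version is a textbook alr (the sum amounts to amalgamating the spike-ins into one component first).

```python
import math

def log_ratio(counts, log_ref):
    """Log of each feature relative to a reference given on the log scale."""
    return {name: math.log(value) - log_ref for name, value in counts.items()}

# Hypothetical counts for one sample.
genes = {"geneA": 120, "geneB": 30}
spikes = {"spike1": 20, "spike2": 10, "spike3": 160}

# Option 1: a single spike-in as the reference.
log_single = math.log(spikes["spike1"])
# Option 2: the sum of the chosen spike-ins.
log_sum = math.log(sum(spikes.values()))
# Option 3: the geometric mean of the chosen spike-ins (requires non-zero counts).
log_gmean = sum(math.log(v) for v in spikes.values()) / len(spikes)

by_single = log_ratio(genes, log_single)
by_sum = log_ratio(genes, log_sum)
by_gmean = log_ratio(genes, log_gmean)

# Within one sample, the options differ only by a constant offset.
offset_a = by_single["geneA"] - by_sum["geneA"]
offset_b = by_single["geneB"] - by_sum["geneB"]
print(offset_a, offset_b)
```

Within a single sample the choice only shifts every value by the same constant. That constant can differ from sample to sample, so the choice can still matter for anything computed across samples.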

Should I scale all the datasets so their ERCC92 spike counts are the same before transformation? (This will likely result in the same data, though I'm thinking out loud and haven't tested it.)
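The hunch in that last question can be checked with a small sketch. Assuming the rescaling multiplies every count in a sample by one constant, the constant cancels in each ratio, so the alr values do not change. The counts, the name `spike`, and the `target_spike` value are hypothetical.

```python
import math

def alr(counts, ref):
    """Additive log-ratio: log of each feature divided by the reference feature."""
    ref_value = counts[ref]
    return {name: math.log(value / ref_value)
            for name, value in counts.items() if name != ref}

# Hypothetical counts for one sample.
sample = {"geneA": 120, "geneB": 30, "spike": 60}

# Rescale the whole sample so that its spike count hits a common target.
target_spike = 100
factor = target_spike / sample["spike"]
rescaled = {name: value * factor for name, value in sample.items()}

before = alr(sample, ref="spike")
after = alr(rescaled, ref="spike")

# The largest difference is zero up to floating-point error.
max_diff = max(abs(before[name] - after[name]) for name in before)
print(max_diff)
```

This only covers a per-sample multiplicative rescaling. Any step that changes features unevenly, such as adding a pseudocount before scaling, would not cancel out this way.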