statOmics / satuRn

satuRn is a highly performant and scalable method for performing differential transcript usage analyses.
https://statomics.github.io/satuRn/

[BUG] Usages of genes with only one isoform #17

Closed jgilis closed 2 years ago

jgilis commented 2 years ago

It does not make sense to compute usages for a transcript that is the only isoform of its gene: by definition, its usage is 100% in all groups of cells. However, because we add a pseudocount for stabilization, satuRn can still compute usages for such isoforms, and these can even differ slightly between groups. The estimates are always close to 100%, but given their small standard errors, the differences could even be statistically significant.
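To make the effect concrete, here is a small arithmetic sketch in base R. The counts are made up, and the pseudocount scheme (adding 1 to both the transcript count and the count of all other transcripts of the gene) is an assumption for illustration only; it is not necessarily the exact scheme satuRn uses internally.

```r
# Hypothetical counts for a transcript that is the only isoform of its gene,
# summed within two groups of cells with different sequencing depth.
tx_counts    <- c(groupA = 50, groupB = 500)
other_counts <- c(groupA = 0,  groupB = 0)   # no sibling isoforms exist

# Assumed pseudocount (illustrative; not necessarily satuRn's exact scheme).
pseudo <- 1

# Without stabilization the usage is exactly 1 in every group.
raw_usage <- tx_counts / (tx_counts + other_counts)

# With the pseudocount added to both components, the usage drops below 1
# by an amount that depends on the depth of each group.
stable_usage <- (tx_counts + pseudo) / (tx_counts + other_counts + 2 * pseudo)

print(raw_usage)
print(stable_usage)
```

Under these assumptions the stabilized usages are both just below 100% but no longer equal, so a spurious between-group difference appears that reflects depth rather than biology.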

Solution: return NA for such isoforms.

jgilis commented 2 years ago

satuRn now internally handles isoforms that are the only isoform of a certain gene. However, instead of removing/filtering them as suggested in the original issue, they are retained in the count object and their results are set to NA. In addition, the model type (which can be accessed with rowData(sce)[["fitDTUModels"]][[tx]]) is set to lonelyTranscript, to distinguish these from transcripts for which the GLM fitting procedure failed (which are of type fitError).
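For anyone who wants to check which transcripts are affected in their own data, here is a minimal base R sketch of the idea. It does not call satuRn and is not the package's internal code. The toy table, the column names isoform_id and gene_id, and the results data frame with an estimate column are all assumptions made for illustration.

```r
# Hypothetical transcript-to-gene table (column names are assumptions).
tx_info <- data.frame(
  isoform_id = c("tx1", "tx2", "tx3", "tx4"),
  gene_id    = c("geneA", "geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Count the isoforms per gene, then flag transcripts whose gene has only one.
n_isoforms <- table(tx_info$gene_id)
is_lonely  <- as.vector(n_isoforms[tx_info$gene_id] == 1)

# Hypothetical results table: keep every row, but blank out the estimates
# of single-isoform transcripts rather than dropping them.
results <- data.frame(
  isoform_id = tx_info$isoform_id,
  estimate   = c(0.40, -0.40, 0.01, -0.02)
)
results$estimate[is_lonely] <- NA

print(results)
```

Keeping the rows and setting the values to NA, rather than filtering, means the results stay aligned row by row with the count object.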