How to get C-to-U site confidence score in Bullseye for single cell RNAseq data?

Hi Zongmin,

If I remember correctly, the Nature Methods paper you linked to used two scores, an epsilon score and the SAILOR score. I am not sure which one you are referring to. I have not implemented their epsilon score in Bullseye, and I am unsure how to do so with the current pipeline.

As for the SAILOR score, I believe it represents the confidence that a detected site is real, based on coverage and the number of mutations. These parameters are already taken into account in Bullseye. The score can be useful for ranking sites, but you still need to set an arbitrary cutoff. It also does not account for a comparison between a control sample (APOBEC alone or YTHmut-APOBEC) and the DART sample.

There is a --score option in Find_edit_sites.pl that gives an approximation of this score. When used, it calculates the confidence for each site in both the DART sample and the control sample, then compares the two as log10(confidence in DART / confidence in control). This means that a site with a value above 1 is more than 10x more likely to be real in the DART sample than in the control. For single-cell data there may be large differences in coverage between each cell and the pseudobulk control sample, and I am unsure how this score would turn out.
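
To make the comparison concrete, here is a small Python sketch of the log10 ratio described above. It only illustrates the arithmetic of the comparison. It is not how Find_edit_sites.pl computes the confidence values themselves, and the function name and the example numbers are made up for illustration.

```python
import math


def log_confidence_ratio(conf_dart: float, conf_control: float) -> float:
    """Return log10(conf_dart / conf_control).

    Both inputs are hypothetical per-site confidence values; how they
    are actually derived in Bullseye is not reproduced here.
    """
    if conf_dart <= 0 or conf_control <= 0:
        raise ValueError("confidence values must be positive")
    return math.log10(conf_dart / conf_control)


# Hypothetical site: confidence is ten times higher in DART than in control,
# so the ratio sits right at the "1" threshold.
ratio = log_confidence_ratio(0.9, 0.09)
print(f"log10 ratio: {ratio:.2f}")

# Equal confidence in both samples gives 0; a site that looks more
# convincing in the control than in DART gives a negative value.
print(log_confidence_ratio(0.5, 0.5))
print(log_confidence_ratio(0.05, 0.5) < 0)
```

A cutoff on this value is still arbitrary, and the ratio says nothing about whether the two confidence values are comparable when coverage differs a lot between samples.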

Overall, when we tried it, we found that it did not improve the sites selected, and that the other Bullseye parameters could replicate filtering by score quite well, with the advantage of making it clearer what we are changing. Of course, in both cases we are setting arbitrary cutoffs, so there may be biases.

Perhaps in the future I will look at the epsilon score.

Please let me know if that answers your question.

Best,

mflamand / Bullseye

How to get C-to-U site confidence score in Bullseye for single cell RNAseq data? #14