sanger-pathogens / Bio-Tradis

A set of tools to analyse the output from TraDIS analyses
https://sanger-pathogens.github.io/Bio-Tradis/

Essentiality function not working #129

Open sbastkowski opened 2 years ago

sbastkowski commented 2 years ago

Hi,

I'm having issues with the essentiality function: it doesn't return any essential genes because it can't find an intersection point. I've attached the histogram for the sample; you can see that there are two peaks, but the first one is not recognised. I've had this issue with several samples. Could you suggest a solution? Regards and thanks for your help. Sarah

[Screenshot 2021-12-09 at 12 09 42: histogram of the sample]

lbarquist commented 2 years ago

Hi Sarah,

So, the first thing this script does is look for the first local minimum in the histogram and use it to heuristically separate the 'essential' distribution (typically with a mode at 0) from the non-essential one (with a mode somewhere positive) for curve fitting. In your case, the 'essential' distribution doesn't have a mode at 0 but is shifted right for some reason; basically, you're getting quite a few insertion sites called in what are supposed to be essential genes. The script finds the local minimum at ~0, tries to fit an exponential distribution to basically nothing, fails, and gives you nonsense as a result. My first thought is that it might help to trim the 3' and 5' ends of genes, as gene termini tend to tolerate insertions, though I'd be surprised if this explains the effect.
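The splitting heuristic described above can be illustrated with a minimal Python sketch. It is not the Bio-Tradis code: the bin width, the hypothetical `MIN_POINTS` threshold, the method-of-moments gamma fit and the grid search for the crossing point are all assumptions made for illustration. It shows why a right-shifted 'essential' peak leaves almost nothing below the first local minimum to fit an exponential to.

```python
import math

# Minimal sketch, NOT the Bio-Tradis implementation. Per-gene insertion
# indices are split at the first local minimum of their histogram; an
# exponential ("essential") and a gamma ("non-essential") distribution are
# fitted to each side, and the cutoff is where the weighted densities cross.

MIN_POINTS = 5  # hypothetical: minimum genes needed below the split to fit


def histogram(values, bin_width):
    """Count values into equal-width bins starting at 0."""
    counts = [0] * (int(max(values) // bin_width) + 1)
    for v in values:
        counts[int(v // bin_width)] += 1
    return counts


def first_local_minimum(counts):
    """Index of the first bin below its left neighbour and not above its right."""
    for i in range(1, len(counts) - 1):
        if counts[i] < counts[i - 1] and counts[i] <= counts[i + 1]:
            return i
    return None


def exp_pdf(x, rate):
    return rate * math.exp(-rate * x)


def gamma_pdf(x, shape, scale):
    if x <= 0:
        return 0.0  # simplification: ignore the density at x = 0
    return math.exp((shape - 1) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def essential_cutoff(values, bin_width, step=1e-4):
    """Return (split, cutoff); cutoff is None when no fit/crossing is possible."""
    i = first_local_minimum(histogram(values, bin_width))
    if i is None:
        return None, None
    split = i * bin_width  # left edge of the minimum bin
    low = [v for v in values if v < split]
    high = [v for v in values if v >= split]
    if len(low) < MIN_POINTS or len(high) < 2:
        return split, None  # "fitting to basically nothing"
    mean_low = sum(low) / len(low)
    mean_high = sum(high) / len(high)
    var_high = sum((v - mean_high) ** 2 for v in high) / len(high)
    if mean_low <= 0 or var_high <= 0:
        return split, None
    rate = 1 / mean_low                  # exponential MLE
    shape = mean_high ** 2 / var_high    # gamma, method of moments
    scale = var_high / mean_high
    w_low, w_high = len(low) / len(values), len(high) / len(values)
    for k in range(1, int(max(values) / step) + 1):
        x = k * step
        if w_high * gamma_pdf(x, shape, scale) >= w_low * exp_pdf(x, rate):
            return split, x  # first point where "non-essential" dominates
    return split, None


def values_from_counts(counts, bin_width):
    """Toy data: place each count at its bin centre."""
    return [(i + 0.5) * bin_width for i, c in enumerate(counts) for _ in range(c)]


typical = values_from_counts([50, 20, 5, 2, 8, 20, 30, 20, 8], 0.01)
shifted = values_from_counts([4, 2, 10, 30, 10, 2, 15, 30, 15], 0.01)
print(essential_cutoff(typical, 0.01))  # split near 0.03; a cutoff is found
print(essential_cutoff(shifted, 0.01))  # split near 0.01; cutoff is None
```

In the toy `shifted` data, the small dip between the first two bins is taken as the first local minimum, so only four genes fall below the split, which mirrors the failure described above.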

So, I haven't seen this before across a lot of different datasets with different transposons and organisms, which leads me to think it's probably something in your data that's causing this. The possibilities I can think of are an issue with the read mapping that gives you a lot of false insertion sites (this could, for instance, be caused by soft-clipping), or that you've sequenced material that isn't just insertion sites but includes some genomic DNA contamination. If you want to email me directly with more details of the data you're working on, I'd be happy to discuss it further.
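One rough way to check the soft-clipping possibility is to measure the fraction of mapped reads whose 5' end is soft-clipped. This assumes, as is usual for TraDIS, that the insertion site is taken from the read's 5' end after the transposon tag is removed, so clipped bases there would shift the reported position. The sketch is hypothetical and not part of Bio-Tradis: `five_prime_soft_clip`, `soft_clip_fraction` and the `min_clip=3` threshold are invented for illustration, and only the SAM FLAG and CIGAR fields are used.

```python
import re

# Hypothetical helpers, not part of Bio-Tradis: parse SAM CIGAR strings.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime_soft_clip(cigar, is_reverse):
    """Number of bases soft-clipped at the read's 5' end (0 if none)."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if is_reverse:
        # CIGAR runs along the reference, so a reverse-strand read's 5' end
        # is the last operation.
        ops.reverse()
    for length, op in ops:
        if op == "H":
            continue  # hard clips sit outside any soft clip; skip them
        return length if op == "S" else 0
    return 0


def soft_clip_fraction(records, min_clip=3):
    """Fraction of mapped reads with >= min_clip soft-clipped 5' bases.

    records: iterable of (flag, cigar) pairs from SAM columns 2 and 6.
    """
    mapped = clipped = 0
    for flag, cigar in records:
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read
        mapped += 1
        if five_prime_soft_clip(cigar, bool(flag & 0x10)) >= min_clip:
            clipped += 1
    return clipped / mapped if mapped else 0.0


print(soft_clip_fraction([(0, "5S95M"), (0, "100M"), (16, "95M5S"), (4, "*")]))
```

For a real SAM file, the pairs can be built from columns 2 (FLAG) and 6 (CIGAR) of each non-header line; a high fraction would point towards the mapping explanation rather than a problem with the essentiality fit itself.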

-Lars