sanger-pathogens / Bio-Tradis

A set of tools to analyse the output from TraDIS analyses
https://sanger-pathogens.github.io/Bio-Tradis/

how is the insertion index calculated? #132

Open mpapange opened 1 year ago

lbarquist commented 1 year ago

It's the number of detected insertion sites divided by the gene length.

mpapange commented 1 year ago

The insertion indexes in my tradis_gene_insert table do not match insert count / gene length. Example:

ins_index    gene_length  ins_count  ins count / gene length
0.057711443  1116         58         0.051971326

lbarquist commented 1 year ago

Are you using 5' or 3' end gene trimming? It's only calculated over the untrimmed region. The calculation is on lines 123 - 140 of tradis_gene_insert_sites if you want to look at how it's being done.

mpapange commented 1 year ago

Do you mean on the fastq file? I use Trim Galore to remove reads contaminated with Illumina adapters from the fastq files. This should only affect the 3' end.

lbarquist commented 1 year ago

No, I mean the -trim3 or -trim5 arguments to tradis_gene_insert_sites. I suspect you have -trim3 set to 0.1 as in the tutorial. Assuming no insertion sites fall into the trimmed region, this will lead to the insertion index being calculated as:

58 / (1116 - 111) =~ 0.0577

which is what you have.
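
To make the arithmetic above concrete, here is a minimal Python sketch of the calculation as described in this thread. It is not the code from tradis_gene_insert_sites: the function name `insertion_index` and its parameters are hypothetical. Truncating the trimmed length to an integer is an assumption, chosen because it reproduces the 111 used above. The sketch also assumes the insertion count already excludes any sites in the trimmed region.

```python
def insertion_index(ins_count, gene_length, trim5=0.0, trim3=0.0):
    """Insertion sites divided by the untrimmed gene length.

    Hypothetical helper for illustration only, not part of Bio-Tradis.
    trim5 and trim3 are the fractions of the gene length trimmed from
    the 5' and 3' ends.
    """
    # Assumption: trimmed lengths are truncated to whole bases,
    # which matches the 111 used in the thread (1116 * 0.1 = 111.6).
    trimmed_5 = int(gene_length * trim5)
    trimmed_3 = int(gene_length * trim3)
    effective_length = gene_length - trimmed_5 - trimmed_3
    if effective_length <= 0:
        raise ValueError("trimming removes the whole gene")
    return ins_count / effective_length


# Values from the table earlier in the thread.
untrimmed = insertion_index(58, 1116)            # 58 / 1116
trimmed = insertion_index(58, 1116, trim3=0.1)   # 58 / (1116 - 111)
print(round(untrimmed, 9), round(trimmed, 9))
```

With no trimming, the result matches the manually computed ins count / gene length column. With a 3' trim of 0.1, it matches the reported ins_index.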

mpapange commented 1 year ago

Apologies! Yes, this is what I did. Thank you, this clears things up.