rnajena / bertax

Taxonomic classification of DNA sequences
GNU General Public License v3.0

Output clarification #11

Closed SorenHeidelbach closed 1 year ago

SorenHeidelbach commented 1 year ago

Hello, very nice tool! It ran smoothly.

I only have a clarification question about the output (with default parameters). The bertax.tsv file has the headers [id, superkingdom, (%), ...]. Does the percentage refer to the share of chunks classified as the respective superkingdom, or is it a certainty estimate? I used it to classify contigs, most of which were longer than 1500 nt, so most would be split into multiple chunks.

Best

f-kretschmer commented 1 year ago

Hello,

The percentage is a certainty estimate, taken from the softmax output layer(s). For sequences split into multiple chunks, this estimate is averaged across chunks by default, and the predicted "best" class is derived from the averaged estimate. If you would like predictions and certainty estimates for each chunk individually, look at the option --chunk_predictions.

Best, Fleming
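
To make the difference between the two readings concrete, here is a minimal sketch of averaging per-chunk softmax outputs. It is not BERTax's actual code: the function name `average_chunk_predictions`, the class list, and the probabilities are all made up for illustration, and it assumes a plain unweighted mean over chunks.

```python
# Illustrative sketch only; NOT taken from the BERTax source.
# Assumed superkingdom labels and invented softmax values.
CLASSES = ["Archaea", "Bacteria", "Eukaryota", "Viruses"]


def average_chunk_predictions(chunk_probs):
    """Average per-chunk softmax vectors (assumed unweighted mean).

    Returns (best_class, certainty_percent, mean_probs).
    """
    n_chunks = len(chunk_probs)
    mean_probs = [
        sum(chunk[i] for chunk in chunk_probs) / n_chunks
        for i in range(len(CLASSES))
    ]
    # The "best" class is chosen from the averaged estimate,
    # not by majority vote over the chunks.
    best = max(range(len(CLASSES)), key=lambda i: mean_probs[i])
    return CLASSES[best], 100.0 * mean_probs[best], mean_probs


# Hypothetical softmax outputs for one contig split into three chunks.
chunk_probs = [
    [0.10, 0.70, 0.15, 0.05],
    [0.05, 0.40, 0.50, 0.05],
    [0.05, 0.85, 0.05, 0.05],
]

label, percent, mean_probs = average_chunk_predictions(chunk_probs)
print(f"{label}\t{percent:.0f}%")
```

With these invented numbers, two of the three chunks individually favor Bacteria (about 67% of chunks), while the averaged certainty for Bacteria is about 65%. The reported percentage corresponds to the second kind of number.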

SorenHeidelbach commented 1 year ago

Super, thank you!