xinehc / args_oap

ARGs-OAP: Online Analysis Pipeline for Antibiotic Resistance Genes Detection from Metagenomic Data Using an Integrated Structured ARG Database
MIT License

What does the unnormalized file actually contain? #28

Open alanxelena opened 1 year ago

alanxelena commented 1 year ago

Hi!

First of all, thanks for this tool! So far I've been using the normalized tables, so I never paid much attention to the unnormalized files, but now I need values that are not normalized. The numbers in these files confuse me a little: what do they mean? I guess they're not counts, since I have values like 0.03, but I also assume they aren't percentages, since I also have values like 132.472. Any help would be really appreciated!!

xinehc commented 1 year ago

Hi,

Unnormalized copies are computed for each gene as sum(aligned length / gene length), i.e. each aligned read contributes the fraction of the gene it covers. The results are then aggregated to three different levels, which is why you see three different files in the output folder. Note that for short reads the values should be very similar to sum(|subject end - subject start| / subject length), treating each gene as a subject.
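A minimal Python sketch of the copy computation described above, assuming alignments are available as hypothetical (read_id, gene_id, aligned_length) tuples and gene lengths as a dict in the same units. The function names, record layout and gene-to-group mapping are illustrative only, not ARGs-OAP's actual code or file format. The second function shows the short-read approximation from BLAST-style subject coordinates (sstart, send, slen); abs() makes the order of start and end irrelevant.

```python
from collections import defaultdict


def unnormalized_copies(alignments, gene_lengths):
    """Per-gene copies: sum of aligned_length / gene_length over aligned reads.

    alignments: iterable of (read_id, gene_id, aligned_length) tuples (hypothetical layout).
    gene_lengths: dict gene_id -> gene length, in the same units as aligned_length.
    """
    copies = defaultdict(float)
    for _read_id, gene_id, aligned_length in alignments:
        # Each read adds the fraction of the gene that its alignment covers.
        copies[gene_id] += aligned_length / gene_lengths[gene_id]
    return dict(copies)


def copies_from_subject_coords(hits):
    """Short-read approximation: sum of |send - sstart| / slen per subject (gene).

    hits: iterable of (gene_id, sstart, send, slen), as in BLAST-style tabular output.
    """
    copies = defaultdict(float)
    for gene_id, sstart, send, slen in hits:
        copies[gene_id] += abs(send - sstart) / slen
    return dict(copies)


def aggregate(per_gene, gene_to_group):
    """Sum per-gene values up to a higher level using a hypothetical gene -> group map."""
    totals = defaultdict(float)
    for gene_id, value in per_gene.items():
        totals[gene_to_group[gene_id]] += value
    return dict(totals)


if __name__ == "__main__":
    gene_lengths = {"geneA": 300, "geneB": 1000}
    alignments = [("r1", "geneA", 50), ("r2", "geneA", 50), ("r3", "geneB", 30)]
    copies = unnormalized_copies(alignments, gene_lengths)
    print(copies["geneB"])  # 0.03: one read covering 3% of geneB
    print(aggregate(copies, {"geneA": "group1", "geneB": "group1"}))
```

A single 30-unit alignment on a 1000-unit gene contributes 0.03, which is why small fractional values appear, while a gene covered by many reads can add up to well over 100. The values are therefore neither read counts nor percentages.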

Unnormalized counts are the numbers of reads aligned to each gene (aggregated in the same way as above).
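For comparison, a similar sketch of unnormalized counts, again assuming hypothetical (read_id, gene_id) records and an illustrative gene-to-group mapping rather than ARGs-OAP's real data structures. Here each read adds 1 regardless of how much of the gene it covers, so counts are whole numbers, unlike copies. Collapsing duplicate (read_id, gene_id) pairs so a read counts once per gene is an assumption of this sketch.

```python
from collections import Counter


def unnormalized_counts(alignments):
    """Per-gene counts: number of distinct reads aligned to each gene.

    alignments: iterable of (read_id, gene_id) pairs (hypothetical layout).
    Duplicate (read_id, gene_id) pairs are collapsed so a read counts once per gene.
    """
    return dict(Counter(gene_id for _read_id, gene_id in set(alignments)))


def aggregate(per_gene, gene_to_group):
    """Sum per-gene counts up to a higher level using a hypothetical gene -> group map."""
    totals = {}
    for gene_id, value in per_gene.items():
        group = gene_to_group[gene_id]
        totals[group] = totals.get(group, 0) + value
    return totals


if __name__ == "__main__":
    alignments = [("r1", "geneA"), ("r2", "geneA"), ("r3", "geneB"), ("r1", "geneA")]
    counts = unnormalized_counts(alignments)
    print(counts)
    print(aggregate(counts, {"geneA": "group1", "geneB": "group1"}))
```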

HTH, Xi

alanxelena commented 1 year ago

Thanks! That answers a lot. I have another issue, but I'll post it separately.

Thanks again!