xinehc / args_oap

ARGs-OAP: Online Analysis Pipeline for Antibiotic Resistance Genes Detection from Metagenomic Data Using an Integrated Structured ARG Database
MIT License

Questions about structure files of customized databases #68

Open rileyjiang opened 3 months ago

rileyjiang commented 3 months ago

Hi, I have a question about the structure files of customized databases. According to the README, the first column of the structure file should be the sequence ID. Does it need to be unique? How should we handle a sequence that belongs to several different types? For example, should I construct the structure file like:

level1    level2      level3
seq1      subtype1    type1(Ni)
seq1      subtype1    type2(Co)

or

level1    level2      level3
seq1      subtype1    type1(Ni),type2(Co)

Looking forward to your reply. Thank you!

xinehc commented 3 months ago

level1 needs to be unique, so two seq1 rows are not allowed. Your second construction seems fine.
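Before handing a customized structure file to the pipeline, a quick pre-check for the uniqueness rule above might look like the following. This is a minimal sketch, not part of ARGs-OAP: `load_structure` is a hypothetical helper, and it assumes a tab-separated file with a `level1 level2 level3` header. It keeps a comma-separated `level3` value such as `type1(Ni),type2(Co)` as one field, because the thread does not say whether the pipeline splits it.

```python
import csv
import io


def load_structure(text):
    """Hypothetical pre-check: parse a tab-separated structure table with a
    level1/level2/level3 header and reject duplicate level1 (sequence) IDs."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    structure = {}
    for row in reader:
        seq_id = row["level1"]
        # level1 must be unique, so a second row for the same ID is an error
        if seq_id in structure:
            raise ValueError(f"duplicate level1 ID: {seq_id}")
        # level3 is stored verbatim; multiple types stay comma-separated
        structure[seq_id] = {"level2": row["level2"], "level3": row["level3"]}
    return structure
```

With the second layout from the question, `load_structure("level1\tlevel2\tlevel3\nseq1\tsubtype1\ttype1(Ni),type2(Co)\n")` parses cleanly, while the first layout (two `seq1` rows) raises `ValueError`.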

rileyjiang commented 2 months ago

Thank you very much for your quick reply! Another question: what is the difference between --structure1 (single-component), --structure2 (two-component) and --structure3 (multi-component)? The help page does not explain this.

xinehc commented 2 months ago

--structure2 is for two-component systems, so each component is weighted by 0.5; --structure3 weights each component by 1/3.

rileyjiang commented 2 months ago

What do you mean by two-component systems? For example, I cannot see the difference between 'two-component_structure.txt' and 'multi-component_structure.txt' for the default database.

[Screenshots, 2024-08-13: two-component structure file; multi-component structure file]

Does it refer to a gene that has two types or subtypes? And how does using --structure2/--structure3 affect the result?

xinehc commented 2 months ago

All genes listed in the two-component.txt file will be weighted by a factor of 0.5. The three files (single, two, multi) have an identical structure; the only difference is the weight (1, 1/2, 1/3) applied when calculating the abundance.
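To make the effect of the weights concrete, here is a minimal sketch of how per-gene abundances could be scaled by 1, 1/2 or 1/3 depending on which structure file lists the gene, and then summed per subtype. It is not ARGs-OAP's actual code: `subtype_abundance`, its inputs and the gene names are hypothetical, and summing by subtype is an assumed aggregation step.

```python
from collections import defaultdict

# Weights from the thread for the single-, two- and multi-component files
WEIGHTS = {"single": 1.0, "two": 0.5, "multi": 1.0 / 3.0}


def subtype_abundance(raw, structure):
    """Hypothetical sketch: sum weighted per-gene abundances by subtype.

    raw:       gene ID -> unweighted abundance
    structure: gene ID -> (subtype, kind), where kind names the structure
               file that lists the gene: "single", "two" or "multi"
    """
    totals = defaultdict(float)
    for gene, value in raw.items():
        subtype, kind = structure[gene]
        totals[subtype] += value * WEIGHTS[kind]
    return dict(totals)
```

For example, two components of one two-component system, each with abundance 4.0, contribute 0.5 × 4.0 + 0.5 × 4.0 = 4.0 to their subtype rather than 8.0. One plausible reading, which is an assumption and not something the maintainer stated, is that the weights keep a multi-component system from being counted once per component.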

rileyjiang commented 2 months ago

Thanks, that's clear!