robertwhbaldwin opened this issue 9 months ago
Hi @robertwhbaldwin
Are you using the JSON output?
Check out these publications for a description of the lineage system: https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-020-00817-3 https://www.nature.com/articles/ncomms5812
In a nutshell, samples are classified into lineages 1-9. Within each of these you can assign a higher-resolution level by adding extra digits. For example, a sample can be classified as L4 as the main lineage and, within that lineage, further classified as sublineage L4.3.4.2. Each level of lineage designation has specific SNPs associated with it, and this is how tb-profiler assigns lineage. In the example below, tb-profiler finds SNPs specific to L4, L4.3, L4.3.4 and L4.3.4.2, so it condenses this down to reporting the main lineage as L4 and the sublineage as L4.3.4.2. In the JSON, each entry in the lineage list is one of the levels that was found (named in its lin field), and the condensed result goes into main_lin and sublin.
```json
{
  "lineage": [
    {
      "lin": "lineage4",
      "family": "Euro-American",
      "spoligotype": "LAM;T;S;X;H",
      "rd": "None",
      "frac": 1
    },
    {
      "lin": "lineage4.3",
      "family": "Euro-American (LAM)",
      "spoligotype": "mainly-LAM",
      "rd": "None",
      "frac": 1
    },
    {
      "lin": "lineage4.3.4",
      "family": "Euro-American (LAM)",
      "spoligotype": "LAM",
      "rd": "RD174",
      "frac": 1
    },
    {
      "lin": "lineage4.3.4.2",
      "family": "Euro-American (LAM)",
      "spoligotype": "LAM1;LAM4;LAM11",
      "rd": "RD174",
      "frac": 0.9984802431610942
    }
  ],
  "main_lin": "lineage4",
  "sublin": "lineage4.3.4.2"
}
```
It is possible for the main_lin and sublin fields to be empty while lin entries are still reported, if tb-profiler can't resolve all the lineages it found. For example, if the pipeline finds SNPs for L4.3.4.2 but not for all of the levels above it (L4, L4.3, L4.3.4), it won't be able to resolve the lineages into a main lineage and sublineage.
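To make the condensing step concrete, here is a minimal Python sketch of the rule described above. It is not tb-profiler's actual code: the function names ancestors and resolve_lineage are hypothetical, the sketch only looks at the lin field of each entry, and it assumes a single (unmixed) strain, so anything other than one complete path from the main lineage down to the deepest call is treated as unresolved and returns empty strings.

```python
from __future__ import annotations


def ancestors(lin: str) -> list[str]:
    """Expand a lineage name into the chain of levels it implies,
    e.g. 'lineage4.3.4.2' -> ['lineage4', 'lineage4.3', 'lineage4.3.4', 'lineage4.3.4.2']."""
    parts = lin.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def resolve_lineage(lineage: list[dict]) -> tuple[str, str]:
    """Return (main_lin, sublin), or ('', '') when the called levels do not
    form one complete path. Hypothetical helper; assumes an unmixed sample."""
    found = {entry["lin"] for entry in lineage}
    if not found:
        return "", ""
    # The deepest call is the one with the most dot-separated levels.
    deepest = max(found, key=lambda lin: lin.count("."))
    chain = ancestors(deepest)
    # Every level on the path must have been found, and nothing else
    # (extra calls would point to a mixture, which this sketch doesn't handle).
    if set(chain) != found:
        return "", ""
    return chain[0], deepest


# The example from the JSON above: all four levels were found.
example = [
    {"lin": "lineage4"},
    {"lin": "lineage4.3"},
    {"lin": "lineage4.3.4"},
    {"lin": "lineage4.3.4.2"},
]
print(resolve_lineage(example))  # ('lineage4', 'lineage4.3.4.2')

# Only the deepest level was found, so main_lin and sublin stay empty.
incomplete = [{"lin": "lineage4.3.4.2"}]
print(resolve_lineage(incomplete))  # ('', '')
```

The second case mirrors the situation described above: the lin entry is still there, but without its parent levels there is no complete path to condense.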
Will someone please explain what the "main_lin", "sublin" and "lin" fields mean in the output? Is it possible for the main_lin and sublin fields to be empty while the lin fields are still reported? Thanks - Robert