CDC colleagues reported somewhat unexpected behavior for positive/negative calls in some surveillance results:
Here, you can see that this sample (D-18 in this run) was called positive for surveillance target Borrelia_sp but there are no names in the Borrelia_sp_other column, which would normally list the names of the Borrelia species that contributed to this positive call.
So there is a mismatch between the pos/neg Borrelia_sp call (Positive) and the Borrelia_sp_other names column (nothing listed).
Looking at the all_data tab of the output, you can see these abundances for sample D-18:
There are 2 Borrelia ref seqs that map to the Borrelia_sp and Borrelia_sp_other reporting columns: Bor_burgdorferi_CP017201 and Bor_SCGT_10_AF264895, which have 48 and 30 reads in this dataset.
Here is the part of targets.tsv (v107) that maps those refseqs to those reporting columns (some rows hidden):
The issue is that the sum of those targets (48 + 30 = 78 reads) is > 50, the minimum count for making a positive call, but neither target is individually > 50, so their names do not show up in the Borrelia_sp_other names column.
The way these columns are populated can be seen in these code snippets. First, for count-type columns in the surveillance table:
And for name type columns:
So for name-type columns, each individual target has to be called positive on its own for a species name to appear. For count-type columns, however, the pos/neg call is based on the sum of the counts of all targets that contribute to that column.
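To make the mismatch concrete, here is a minimal Python sketch of the two rules as described above. It is not the pipeline's actual code: the function names, the strict `>` comparison, and the target-to-species pairing (inferred from the order in which the names are listed in this issue) are all assumptions for illustration. Only the read counts and the threshold of 50 come from this report.

```python
MIN_READS = 50  # threshold from this issue; strict ">" is an assumption

# Read counts for sample D-18, as reported in the all_data tab.
reads = {"Bor_burgdorferi_CP017201": 48, "Bor_SCGT_10_AF264895": 30}

# Hypothetical target -> species mapping (pairing inferred, not confirmed).
species_of = {
    "Bor_burgdorferi_CP017201": "Borrelia_burgdorferi",
    "Bor_SCGT_10_AF264895": "Borrelia_carolinensis",
}


def count_column_call(reads, targets, min_reads=MIN_READS):
    """Count-type column: call on the SUM of reads over contributing targets."""
    total = sum(reads.get(t, 0) for t in targets)
    return "Positive" if total > min_reads else "Negative"


def name_column_value(reads, targets, min_reads=MIN_READS):
    """Name-type column: list species whose INDIVIDUAL target passes."""
    return sorted({species_of[t] for t in targets if reads.get(t, 0) > min_reads})


targets = list(reads)
call = count_column_call(reads, targets)   # 78 reads summed
names = name_column_value(reads, targets)  # 48 and 30 each fail alone
print(call, names)
```

With these inputs the count-type rule yields a positive call while the name-type rule yields an empty list, which is the behavior seen for D-18.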
In our discussions, the idea was brought up to use the summed read counts for each species to decide whether to include a name or not in a name-type column.
Some notes about this:
These two targets are assigned to different species (Borrelia_burgdorferi and Borrelia_carolinensis), so even if read counts were summed at the species level, neither species would reach > 50 reads (48 and 30 respectively).
Target Bor_burgdorferi_CP017201 is not mapped to the Borrelia_sp_other names column, so even if they were the same species, this name would not ever be reported in the Borrelia_sp_other column.
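The two notes above can be checked with a small sketch of the proposed species-level rule. This is a hypothetical illustration, not an implementation from the pipeline: the tuple layout, the `in_names_column` flag, the function name, and the target-to-species pairing are assumptions. The second data set (`target_a`, `target_b`, `Species_x`) is invented purely to show a case where the proposal would change the outcome.

```python
from collections import defaultdict

MIN_READS = 50  # threshold from this issue; strict ">" is an assumption


def names_by_species_sum(targets, min_reads=MIN_READS):
    """Sum reads per species over targets mapped to the names column,
    then report species whose summed reads pass the threshold."""
    totals = defaultdict(int)
    for _target, species, in_names_column, reads in targets:
        if in_names_column:  # unmapped targets never contribute a name
            totals[species] += reads
    return sorted(s for s, n in totals.items() if n > min_reads)


# Sample D-18: (target, species, mapped to Borrelia_sp_other?, reads).
d18 = [
    ("Bor_burgdorferi_CP017201", "Borrelia_burgdorferi", False, 48),
    ("Bor_SCGT_10_AF264895", "Borrelia_carolinensis", True, 30),
]

# Invented example: two targets of one species, both mapped to the column.
same_species = [
    ("target_a", "Species_x", True, 48),
    ("target_b", "Species_x", True, 30),
]

d18_names = names_by_species_sum(d18)
same_species_names = names_by_species_sum(same_species)
print(d18_names, same_species_names)
```

Under these assumptions the species-level rule still reports no name for D-18, because the reads belong to different species and one target is not mapped to the names column. It would only help in cases like the invented one, where several mapped targets share a species.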
Need to discuss more with CDC folks.