hivdb / chiro-frontend

Coronavirus frontend, CoV-Rx-DB,

Geomean error, geomean of data from a group of papers. #57

Closed KaimingTao closed 2 years ago

KaimingTao commented 2 years ago
[Image: Screen Shot 2022-02-04 at 17 52 52]

Please see the second row: the geomean is shown as "<85". When I calculate it from the individual data, the geomean is about "<12".

philiptzou commented 2 years ago

@KaimingTao What filter condition did you use? And what do you mean by "using the individual data"?

KaimingTao commented 2 years ago

1) Condition: AZD1222 and Omicron
2) Individual data: the Excel spreadsheet from "Data Availability"

philiptzou commented 2 years ago

The website's geomean is weighted. You need to take the number of samples into account. Excel doesn't have a built-in weighted geomean.
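To illustrate the difference, here is a minimal sketch of a weighted geometric mean next to an unweighted one. It is not taken from the chiro-frontend code; the function names and the `titer` / `numSamples` fields are hypothetical, and the rows are made-up numbers.

```javascript
// Hypothetical helpers for illustration only; not the site's implementation.
// Each row is assumed to hold a titer and the number of samples behind it.
function weightedGeomean(rows) {
  let logSum = 0;
  let totalWeight = 0;
  for (const {titer, numSamples} of rows) {
    // Weight each log-titer by its sample count
    logSum += numSamples * Math.log(titer);
    totalWeight += numSamples;
  }
  return Math.exp(logSum / totalWeight);
}

// Unweighted version: every row counts once, like Excel's GEOMEAN over the rows
function unweightedGeomean(rows) {
  return weightedGeomean(rows.map(({titer}) => ({titer, numSamples: 1})));
}

// Made-up example rows
const exampleRows = [
  {titer: 10, numSamples: 1},
  {titer: 1000, numSamples: 3}
];

console.log(unweightedGeomean(exampleRows)); // approximately 100
console.log(weightedGeomean(exampleRows));   // approximately 316.2
```

The row with three samples pulls the weighted result toward 1000, which is why a plain spreadsheet geomean over aggregated rows can disagree with the website.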

KaimingTao commented 2 years ago

I've downloaded the file and recalculated the geomean value.

URL: https://covdb.stanford.edu/search-drdb/?dosage=2&dosage=3&host=human&vaccine=AZD1222&variant=Omicron%2FBA.1

datafile: datasheet_1644362828027.xlsx

Screenshot:

[Image: Screen Shot 2022-02-09 at 15 08 04]

I tried different ways in Excel, and they all give geomean = 5.7. The figure shows <100.

philiptzou commented 2 years ago

The 5.7 is changed to <100 because of this code: https://github.com/hivdb/chiro-frontend/blob/80011fcf04d6f1a17bf6098fbcb2b3eb28d8a7bb/src/views/search-drdb/tables/column-defs/agg-funcs.js#L204-L209

When there are non-neutralized results, the algorithm currently chooses a "representative" ineffective titer (currently the highest ineffective titer, i.e. the highest lower limit, for NT50; or the lowest upper limit for IC50). If the actual geomean titer is lower than that number, the algorithm replaces the actual geomean titer with the "representative" one and marks the overall result as "ineffective".
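The rule described above can be sketched roughly as follows for the NT50 case. This is a simplified illustration, not the code at the linked lines: the function names and the `titer` / `ineffective` fields are hypothetical, the geomean here is unweighted for brevity, and the input titers are made up.

```javascript
// Hypothetical sketch of the NT50 rule; not the actual agg-funcs.js code.
function plainGeomean(values) {
  const logSum = values.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// results: [{titer, ineffective}], where ineffective means the titer is
// reported as a lower limit (e.g. "<10") rather than a measured value.
function aggregateNT50(results) {
  const overall = plainGeomean(results.map(r => r.titer));
  const limits = results.filter(r => r.ineffective).map(r => r.titer);
  if (limits.length === 0) {
    return {titer: overall, ineffective: false};
  }
  // "Representative" ineffective titer: the highest lower limit
  const representative = Math.max(...limits);
  if (overall < representative) {
    // Geomean falls below the representative limit: report "<representative"
    return {titer: representative, ineffective: true};
  }
  return {titer: overall, ineffective: false};
}

// Made-up inputs: one "<100" result dominates the representative titer
const mostlyIneffective = aggregateNT50([
  {titer: 10, ineffective: true},
  {titer: 10, ineffective: true},
  {titer: 100, ineffective: true},
  {titer: 40, ineffective: false}
]);
console.log(mostlyIneffective); // reported as "<100"

const mostlyEffective = aggregateNT50([
  {titer: 10, ineffective: true},
  {titer: 400, ineffective: false},
  {titer: 400, ineffective: false},
  {titer: 400, ineffective: false}
]);
console.log(mostlyEffective); // geomean stays above 10, reported as effective
```

In the first call a single high lower limit is enough to turn a geomean of about 25 into "<100", which is the effect reported in this issue.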

The original reason for this behavior is to properly determine when to call a geomean titer "ineffective". Obviously we don't want to call the result ineffective when 90 of 100 individual titers are effective, and we also don't want to call it effective when 90 of 100 are ineffective.

However, the case you came up with is very extreme. Two thirds (85) of the 126 results are ineffective, and two of them are at 100, solely because of Liu, L (2021l). All other ineffective titers are less than or equal to 10. For this case, the "representative" ineffective titer is not very representative. To address this issue, I'm going to change the algorithm to use the geomean of the ineffective titers as the "representative" ineffective titer.
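A rough sketch of how the two choices of "representative" ineffective titer compare on a case shaped like this one. The helper names are hypothetical, and the input is a simplification: it assumes 83 ineffective titers at exactly 10 plus two at 100, whereas the real data only guarantees that the other titers are less than or equal to 10. Whether the eventual fix weights by sample count is not stated here, so the geomean below is unweighted.

```javascript
// Hypothetical helpers for illustration; not the actual fix in chiro-frontend.
function geomeanOf(values) {
  const logSum = values.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// Current behavior for NT50: the highest lower limit
function representativeByMax(limits) {
  return Math.max(...limits);
}

// Proposed behavior: geomean of the ineffective titers
function representativeByGeomean(limits) {
  return geomeanOf(limits);
}

// Simplified, made-up input: 83 limits at 10 and two at 100
const illustrativeLimits = Array(83).fill(10).concat([100, 100]);

console.log(representativeByMax(illustrativeLimits));     // 100
console.log(representativeByGeomean(illustrativeLimits)); // a little above 10
```

With the geomean, the two outlying limits at 100 barely move the representative value, so the summary stays close to what most of the ineffective results actually report.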