Closed simonleandergrimm closed 1 year ago
By returning, do you mean not adding them to estimate_*, while still keeping them in the pathogen.py file? I'd rather not do that, as it might be confusing.
Given our current set of pathogens, I can simply cut the duplicate HSV-1 and HSV-2 estimates, as those represent the raw NHANES data, while the CDC estimates are based on that data. I will instead mention the NHANES data in the comments of the CDC-based prevalence estimates. In that case, I'd also remove class Primary(Enum).
Does that sound good?
On Fri, 9 Jun 2023 at 14:18, Jeff Kaufman @.***> wrote:
@.**** requested changes on this pull request.
In pathogen_properties.py https://github.com/naobservatory/p2ra/pull/152#discussion_r1224614799:
@@ -39,6 +39,11 @@ class Active(Enum):
     LATENT = "Latent"
+class Primary(Enum):
What's the argument for adding a primary/secondary distinction instead of only returning our best estimate for every location + time period + taxid?
@simonleandergrimm I think we should be generating a single best-effort estimate, either by cutting low-quality estimates or by combining multiple estimates. Your choice which!
I cut the estimates, ready for re-review!
We sometimes had multiple prevalence or incidence estimates for the same pathogen, time, and place. This can lead to errors in downstream analyses. Instead, we want to either combine or drop estimates, so that there is only one estimate per time, place, and pathogen.
This PR adds a test that checks prevalence and incidence estimates separately for duplicates. It also drops the duplicate estimates for HSV-1 and HSV-2, which we identified through this test.
Fixes #149.
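For readers who want a concrete picture of the duplicate check described above, here is a minimal sketch. It is not the test added in this PR: the `Estimate` dataclass, its fields, and `find_duplicates` are hypothetical stand-ins for the repository's own classes, and the numbers are made-up placeholders, not real prevalence data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    # Hypothetical stand-in for the repository's estimate classes.
    pathogen: str
    kind: str  # "prevalence" or "incidence"
    location: str
    period: str
    value: float  # placeholder numbers only, not real data
    source: str


def find_duplicates(estimates):
    """Return the (kind, pathogen, location, period) keys seen more than once.

    Including `kind` in the key means prevalence and incidence estimates
    are checked separately and never collide with each other.
    """
    counts = Counter(
        (e.kind, e.pathogen, e.location, e.period) for e in estimates
    )
    return sorted(key for key, n in counts.items() if n > 1)


estimates = [
    Estimate("hsv_1", "prevalence", "United States", "2018", 0.1, "CDC"),
    Estimate("hsv_1", "prevalence", "United States", "2018", 0.2, "NHANES"),
    Estimate("hsv_2", "prevalence", "United States", "2018", 0.3, "CDC"),
    Estimate("hsv_2", "incidence", "United States", "2018", 0.4, "CDC"),
]

duplicates = find_duplicates(estimates)

# Dropping the raw-data entry leaves one estimate per key.
deduplicated = [e for e in estimates if e.source != "NHANES"]
remaining = find_duplicates(deduplicated)
```

In a test, one would assert that `find_duplicates` returns an empty list for the full set of estimates, so any newly added duplicate fails the build and has to be either combined or dropped.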