Open alimanfoo opened 2 years ago
Some notes:
Here ds is an input parameter, an xarray.Dataset of variant frequencies:
cohort_vars = [v for v in ds if v.startswith("cohort_")]
df_cohorts = ds[cohort_vars].to_dataframe()
DataFrames for each cohort are concatenated:
dfs = []
for cohort_index, cohort in enumerate(df_cohorts.itertuples()):
    ds_cohort = ds.isel(cohorts=cohort_index)
    df = pd.DataFrame(
        {
            "taxon": cohort.taxon,
            "area": cohort.area,
            "date": cohort.period_start,
            "period": str(
                cohort.period
            ),  # use string representation for hover label
            "sample_size": cohort.size,
            "variant": variant_labels,
            "count": ds_cohort["event_count"].values,
            "nobs": ds_cohort["event_nobs"].values,
            "frequency": ds_cohort["event_frequency"].values,
            "frequency_ci_low": ds_cohort["event_frequency_ci_low"].values,
            "frequency_ci_upp": ds_cohort["event_frequency_ci_upp"].values,
        }
    )
    dfs.append(df)
df_events = pd.concat(dfs, axis=0).reset_index(drop=True)
A query is applied to remove events with no observations:
df_events = df_events.query("nobs > 0")
I suppose we could exclude cohorts that don't match the specified taxon or area parameters from that concatenation, but it looks like that would still require us to compute the frequencies.
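As a rough sketch of what excluding cohorts from the concatenation might look like: the helper below picks the positional indices of cohorts whose taxon and area match, so the loop could skip the rest. The function name select_cohort_indices, the taxa and areas parameters, and the toy cohort values are all assumptions made for illustration; none of them come from the existing code.

```python
import pandas as pd

# Toy stand-in for df_cohorts; the values are made up for illustration.
df_cohorts = pd.DataFrame(
    {
        "taxon": ["gambiae", "coluzzii", "gambiae", "arabiensis"],
        "area": ["BF-09", "BF-09", "ML-2", "ML-2"],
    }
)


def select_cohort_indices(df_cohorts, taxa=None, areas=None):
    """Return positional indices of cohorts matching the requested taxa and areas.

    `taxa` and `areas` are hypothetical parameters; None means "keep all".
    """
    keep = pd.Series(True, index=df_cohorts.index)
    if taxa is not None:
        keep &= df_cohorts["taxon"].isin(taxa)
    if areas is not None:
        keep &= df_cohorts["area"].isin(areas)
    # Positional indices, suitable for something like ds.isel(cohorts=index).
    return [i for i, k in enumerate(keep) if k]


selected = select_cohort_indices(df_cohorts, taxa=["gambiae"], areas=["ML-2"])
```

The concatenation loop could then iterate over the selected indices only. As noted, this only trims the DataFrame that gets built; the frequencies in ds would already have been computed upstream.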
We need to make sure we use the appropriate data in calculations, e.g.:
frq = df_events["frequency"]
frq_ci_low = df_events["frequency_ci_low"]
frq_ci_upp = df_events["frequency_ci_upp"]
df_events["frequency_error"] = frq_ci_upp - frq
df_events["frequency_error_minus"] = frq - frq_ci_low
But maybe the idea is simply to toggle different taxa and areas on the plot itself, somehow, without any refreshing or recomputing? (I don't understand yet.)
Allow the user to control which taxa and areas are shown, without having to recompute the frequencies.
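One possible reading of this, sketched under assumptions: compute df_events once, then subset it by taxon and area just before plotting. The helper name select_events, its taxa and areas parameters, and the toy data are hypothetical; the sketch reuses DataFrame.query, as in the nobs filter above.

```python
import pandas as pd

# Toy stand-in for the already-computed df_events; values are made up.
df_events = pd.DataFrame(
    {
        "taxon": ["gambiae", "gambiae", "coluzzii", "coluzzii"],
        "area": ["BF-09", "ML-2", "BF-09", "ML-2"],
        "variant": ["V1", "V1", "V1", "V1"],
        "frequency": [0.10, 0.25, 0.40, 0.05],
    }
)


def select_events(df_events, taxa=None, areas=None):
    """Subset precomputed events by taxon and area (hypothetical helper).

    None means "keep all". The frequency columns are passed through untouched.
    """
    df = df_events
    if taxa is not None:
        df = df.query("taxon in @taxa")
    if areas is not None:
        df = df.query("area in @areas")
    return df


# Show only one taxon; the frequencies are read from the existing frame.
df_shown = select_events(df_events, taxa=["coluzzii"])
```

Because the subset is taken from the finished DataFrame, changing the selection costs only a row filter. Whether the selection should instead happen interactively in the plot is left open here.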