Add taxon and area parameters to plot_frequencies_time_series()

Some notes:

Here ds is an input parameter, an xarray.Dataset of variant frequencies, and the cohort variables are pulled out into a DataFrame:

cohort_vars = [v for v in ds if v.startswith("cohort_")]
df_cohorts = ds[cohort_vars].to_dataframe()

DataFrames for each cohort are concatenated:

dfs = []
for cohort_index, cohort in enumerate(df_cohorts.itertuples()):
    ds_cohort = ds.isel(cohorts=cohort_index)
    df = pd.DataFrame(
        {
            "taxon": cohort.taxon,
            "area": cohort.area,
            "date": cohort.period_start,
            "period": str(
                cohort.period
            ),  # use string representation for hover label
            "sample_size": cohort.size,
            "variant": variant_labels,
            "count": ds_cohort["event_count"].values,
            "nobs": ds_cohort["event_nobs"].values,
            "frequency": ds_cohort["event_frequency"].values,
            "frequency_ci_low": ds_cohort["event_frequency_ci_low"].values,
            "frequency_ci_upp": ds_cohort["event_frequency_ci_upp"].values,
        }
    )
    dfs.append(df)
df_events = pd.concat(dfs, axis=0).reset_index(drop=True)

A query is applied to remove events with no observations:

df_events = df_events.query("nobs > 0")

I suppose we could exclude cohorts that don't match the specified taxon or area parameters from that concatenation, but it looks like that would still require us to compute the frequencies.
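A minimal sketch of that exclusion idea, assuming df_cohorts has plain "taxon" and "area" columns (as the cohort.taxon and cohort.area accesses above suggest). The helper name select_cohort_indices and the toy values are hypothetical, not part of the library:

```python
import pandas as pd


def select_cohort_indices(df_cohorts, taxon=None, area=None):
    """Return positional indices of cohorts matching taxon and/or area.

    Hypothetical helper: a value of None means "do not filter on this field".
    """
    keep = pd.Series(True, index=df_cohorts.index)
    if taxon is not None:
        keep &= df_cohorts["taxon"] == taxon
    if area is not None:
        keep &= df_cohorts["area"] == area
    return [i for i, flag in enumerate(keep) if flag]


# Toy cohort table standing in for ds[cohort_vars].to_dataframe().
df_cohorts = pd.DataFrame(
    {
        "taxon": ["gambiae", "coluzzii", "gambiae"],
        "area": ["BF-09", "BF-09", "ML-2"],
    }
)

# Positions of the cohorts to keep; in the loop above these could be
# used to skip non-matching cohorts before calling ds.isel(cohorts=...).
selected = select_cohort_indices(df_cohorts, taxon="gambiae")
```

This only narrows which cohorts go into the concatenation; it does not avoid computing the frequencies stored in ds.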

We need to make sure we use the appropriate data in calculations, e.g.:

frq = df_events["frequency"]
frq_ci_low = df_events["frequency_ci_low"]
frq_ci_upp = df_events["frequency_ci_upp"]
df_events["frequency_error"] = frq_ci_upp - frq
df_events["frequency_error_minus"] = frq - frq_ci_low
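One way to keep the calculations tied to the right rows is to filter df_events first and compute the error columns on the filtered copy. This is only a sketch on toy data; filter_events and its taxon and area arguments are hypothetical, and the column names are taken from the snippets above:

```python
import pandas as pd

# Toy events table with the same column names as df_events above.
df_events = pd.DataFrame(
    {
        "taxon": ["gambiae", "gambiae", "coluzzii"],
        "area": ["BF-09", "ML-2", "BF-09"],
        "nobs": [40, 0, 20],
        "frequency": [0.5, 0.0, 0.25],
        "frequency_ci_low": [0.25, 0.0, 0.125],
        "frequency_ci_upp": [0.75, 0.0, 0.5],
    }
)


def filter_events(df_events, taxon=None, area=None):
    """Drop events with no observations, then optionally filter by taxon/area."""
    mask = df_events["nobs"] > 0
    if taxon is not None:
        mask &= df_events["taxon"] == taxon
    if area is not None:
        mask &= df_events["area"] == area
    # Copy so that adding columns below does not touch the original frame.
    return df_events.loc[mask].copy()


df_plot = filter_events(df_events, taxon="gambiae")

# Same error-bar arithmetic as above, applied to the filtered rows only.
df_plot["frequency_error"] = df_plot["frequency_ci_upp"] - df_plot["frequency"]
df_plot["frequency_error_minus"] = df_plot["frequency"] - df_plot["frequency_ci_low"]
```

Taking a copy of the filtered frame before adding columns keeps the unfiltered df_events untouched.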

But maybe the idea is simply to toggle different taxa and areas on the plot itself, somehow, without any refreshing or recomputing? (I don't understand this yet.)
