API query usability questions and issues

kairstenfay commented 5 years ago

Hey all,

Thanks for the work you've done so far in creating/hosting the modelling results API + the swagger interface for us to use.

Now that we're finally coming around to using it, I have a few questions/ideas I thought you may be able to help with.

1. CORS issue (mentioned in #86)
2. Language of pathogens. These do not currently align with our target/organism definitions. Where can I see a list of all possible pathogens? Examples:
   - I can match your Flu_A_H1 with our target name pretty easily, but other names are harder to interpret/match with your dataset. For example, I don't see any organism-level pathogen for Flu B/Vic or Flu B/Yam, just Flu_B_pan.
   - Does the "all" pathogen refer to any pathogen? Or does it refer to all samples including negative, untested, and unknown?
3. modeled_intensity_mean is not a result in every model I found. Ideally we would be able to have mean, mode, sd, and 95% CI for every query.
4. Can we get results for age outside of inla_observed models?
5. Can we get results for sex?
6. Can we get results for vaccination status?

For reference, I am currently formatting my queries as:

    // `pathogen` is a variable supplied by our local GET endpoint
    const query = {
      model_type: "inla_latent",
      observed: ["encountered_week", "residence_neighborhood_district_name"],
      pathogen: [pathogen],
      spatial_domain: "seattle_geojson_neighborhood_district_name"
    };

where pathogen is an interchangeable variable I set up on our own, local GET endpoint.
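To make the substitution concrete, here is a minimal sketch of how that query could be built per pathogen. The function name `buildQuery` is hypothetical and only for illustration; the field names and values are the ones from the query above, and nothing here assumes how the API endpoint itself is called.

```javascript
// Hypothetical helper (not part of the API): returns the query body shown
// above with the pathogen filled in.
function buildQuery(pathogen) {
  return {
    model_type: "inla_latent",
    observed: ["encountered_week", "residence_neighborhood_district_name"],
    // The API takes a list of pathogens; we pass exactly one per query.
    pathogen: [pathogen],
    spatial_domain: "seattle_geojson_neighborhood_district_name",
  };
}

// Serialize for use as a request body; Flu_A_H1 is one of the names
// mentioned earlier in this issue.
const body = JSON.stringify(buildQuery("Flu_A_H1"));
```

Keeping the pathogen as the only parameter means the rest of the query stays identical across requests, which should make it easier to swap in organism-level names later if the pathogen vocabulary changes.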

kairstenfay commented 5 years ago

Connect IDM model output to viz prototype

famulare commented 5 years ago

Thanks for useful questions. Some I can answer now, some need more attention.

2) The targets are coming directly from the database view shipping.presence_absence_result_v1, which reflects the TaqMan array. That view only has Flu_B_pan, for example. If/when you derive more specific strains from sequencing, I'll need a database view with that info to merge with the encounter database. Also, "all" refers to all samples, including negatives.

3) I can provide a standardized list. In prototyping, the mode is better behaved mathematically with smaller samples, so please build around the mode for the time being.

4-6) Yes, eventually, depending on how many layers deep we want to go. Right now, I'm just building some subsets because of memory limitations. Addressing this fits the "before end of September" timeline for us, but I'll be able to think about it more next week in prep for the Ad Board meeting, so I will get back to you with more useful comments then.
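On point 3, a client that builds around the mode but tolerates models that only return a mean could look something like the sketch below. The field name `modeled_intensity_mode` is an assumption based on the naming of `modeled_intensity_mean`; only the latter appears in this thread, and `pickIntensity` is a hypothetical helper.

```javascript
// Hypothetical helper: prefer the mode, fall back to the mean, and return
// null when a result row carries neither statistic.
// ASSUMPTION: `modeled_intensity_mode` is a guessed field name.
function pickIntensity(row) {
  if (typeof row.modeled_intensity_mode === "number") {
    return { statistic: "mode", value: row.modeled_intensity_mode };
  }
  if (typeof row.modeled_intensity_mean === "number") {
    return { statistic: "mean", value: row.modeled_intensity_mean };
  }
  return null;
}
```

Returning the statistic name alongside the value lets the visualization label which summary it is showing, so a switch to a standardized list of outputs would not silently change what is plotted.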

kairstenfay commented 5 years ago

Thanks for the quick response, Mike. I did not realize that the targets are coming directly from shipping.presence_absence_result_v1. That is useful to know. I look forward to future discussions to elucidate how the moving pieces work together between ID3C (the Bedford lab) & IDM.

I don't think there's any rush in expanding the API. Brainstorming API design some time between the Ad Board meeting and the end of September should work for our timeline. I'll reach out if anything changes.

tsibley commented 5 years ago

Re: target vs. organism: we only recently mapped targets to organisms in ID3C. This is important because targets change and vary but detect the same pathogen. It also helps us roll up targets by a less-specific organism classification. So, at some point we should be shipping organism lineage in the modeling observation views instead of targets, and then the models can use them. A list of those lineages is here: https://backoffice.seattleflu.org/metabase/question/292

seattleflu / incidence-mapper

API query usability questions and issues #87