Closed Mr0grog closed 3 years ago
Updates:
id
to an int to make it more clearly meaningless. This bumps up the priority on #48.Other minor change: for clarity, rename the timestamps on availability
.
updated_at
→ valid_at
checked_at
→ requested_at
changed_at
for the last time the data in the row was different.High priority for the availability schema changes (useful for #52), everything else less so.
@Mr0grog which of these ideas, if any, still remain to be implemented?
Just the issues around is_public
and around address fields as an object, but I’m not sure either of those are really worth messing with at this point.
There are also two changes from this old doc: https://docs.google.com/document/d/1r9OHWal5oAUIsWDHiQAMQkOcGZUFEZCsTo7gxH34HlU/edit
provider_locations.eligibility
provider_locations.timezone
and ideally fill it in if not otherwise set from zip code (I've got a data file that maps these that should probably land in Postgres, but could also be in the source code).…but I think it’s better to make separate issues to track these; this issue is a confusing mess.
(See #154 & #155 for those last ones.)
There are a variety of things in how we’ve modeled the data that I’m not sure I’m happy with and would appreciate some extra thought or at least gut-checking from other people on:
Locations have an
id
that is externally determined (with the intent that it ought to be meaningful if possible). I’m not sure this was a good idea. It gave me a short-term way to cheat and skip #48, but it’s one reason #39 involves a whole lot more code than it probably should (I think I can improve it later today when I have time, but that doesn’t change this as a more fundamental issue).The availability table is a little complex: it’s unique by
(location_id, source)
, so the idea is basically that we store the latest of each source for each location. The goal here was twofold:But I think (2) is more complex than is really warranted or useful for most people (and I wound up making the API just return the latest for each, so the multiple sources are kind of invisible). This also leads to queries that feel unnecessarily complex to me: https://github.com/usdigitalresponse/appointment-availability-infra/blob/c0f5c99b9dec78b6e409d43e963c08e9a6044b31/server/src/db.ts#L148-L156
CovidWA handled this by having a normal table and a “beta” table, where the beta one was for things being tested/staged that shouldn’t yet be public. Would that be a better approach here?
External IDs as a JSONB object (e.g.
{ "vtrcks": "CV1001240", "cvs": 1034 }
). I think this still makes sense (although it needs an index for #48). Other approaches that are possible here:{ "vtrcks:CV1001240", "cvs:1034" }
) — this is probably slightly simpler/speedier in Postgres, but not sure.A way more complex structure with URIs. SMART Scheduling Links uses:
And Vaccinate the States has this for “links,” which is a similar-but-not-same use case:
Address components are meant to be an object in the output (they aren’t and that’s a bug), but they are independent columns in the
locations
table. Should they be a JSONB column?is_public
is a boolean on multiple tables that defaults totrue
. Wondering if this should be the opposite (is_private DEFAULT false
)