jeffdc / gallformers

The gallformers site
https://www.gallformers.org
Apache License 2.0

Beta version of phenology features #356

Open · Megachile opened 1 year ago

Megachile commented 1 year ago

Here is the proposal:

API

A new API endpoint will be built.

https://www.gallformers.org/gall/{id}/pheno where {id} is some potential Gall ID.

Three operations will be supported:

GET: Fetches the pheno data for the given gall.

POST: Adds or updates the pheno data for the given gall.

DELETE: Deletes the pheno data for the given gall.

Under no circumstance will any of the API calls return 418.
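As a sketch of how a client might call the endpoint (the helper names and error handling here are illustrative, not part of the spec):

```typescript
// Hypothetical client helpers for the proposed endpoint; names are illustrative.
const phenoUrl = (id: number): string =>
    `https://www.gallformers.org/gall/${id}/pheno`;

// Fetch the pheno data for a gall. Assumes the endpoint returns the JSON
// structure described below, with a non-2xx status when no data exists.
async function getPheno(id: number): Promise<unknown> {
    const res = await fetch(phenoUrl(id));
    if (!res.ok) throw new Error(`GET pheno failed: ${res.status}`);
    return res.json();
}
```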

The JSON structure:

{
  "id": 0,
  "developing_start_slope": -0.015,
  "developing_start_yint": 562,
  "developing_end_slope": -0.015,
  "developing_end_yint": 900,
  "collection_viable_start_slope": -0.015,
  "collection_viable_start_yint": 950,
  "collection_viable_end_slope": -0.015,
  "collection_viable_end_yint": 950,
  "emergence_start_slope": -0.015,
  "emergence_start_yint": 1100,
  "emergence_end_slope": -0.015,
  "emergence_end_yint": 1222,
  "min_latitude": 0.0,
  "max_latitude": 0.0
}

All fields are JSON numbers and must be valid numbers in proper JSON format. id, min_latitude, and max_latitude must never be null and must always be passed in. All of the other values are optional, though unexpected things may happen if a value is passed for one member of a slope/yint pair but not the other.
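A TypeScript shape for this payload could look like the following (the type name is illustrative; optionality follows the rules above):

```typescript
// Illustrative type for the pheno payload. id, min_latitude, and
// max_latitude are required; every slope/yint value is optional.
type PhenoData = {
    id: number;
    developing_start_slope?: number;
    developing_start_yint?: number;
    developing_end_slope?: number;
    developing_end_yint?: number;
    collection_viable_start_slope?: number;
    collection_viable_start_yint?: number;
    collection_viable_end_slope?: number;
    collection_viable_end_yint?: number;
    emergence_start_slope?: number;
    emergence_start_yint?: number;
    emergence_end_slope?: number;
    emergence_end_yint?: number;
    min_latitude: number;
    max_latitude: number;
};
```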

User Facing UI

The initial POC implementation will be a new page found at https://www.gallformers.org/pheno, linked from the front page and possibly from the nav bar (TBD).

The UI will show two inputs: the user's latitude and a host plant (restricted to oaks for the initial trial, per the original proposal below).

Once both of the above are filled with valid data, a search will be run and a list of matching galls will be returned in a table (columns TBD).

The table will be sortable on any of its columns. All rows will be displayed with no pagination.

Backend

The new data will be stored in a new pheno table.

CREATE TABLE gallpheno (
    id INTEGER NOT NULL,
    rearing_viable_start_slope REAL NOT NULL,
    rearing_viable_start_yint REAL NOT NULL,
    rearing_viable_end_slope REAL NOT NULL,
    rearing_viable_end_yint REAL NOT NULL,
    collection_viable_start_slope REAL NOT NULL,
    collection_viable_start_yint REAL NOT NULL,
    collection_viable_end_slope REAL NOT NULL,
    collection_viable_end_yint REAL NOT NULL,
    emergence_viable_start_slope REAL NOT NULL,
    emergence_viable_start_yint REAL NOT NULL,
    emergence_viable_end_slope REAL NOT NULL,
    emergence_viable_end_yint REAL NOT NULL,
    min_latitude REAL NOT NULL,
    max_latitude REAL NOT NULL,
    FOREIGN KEY (id) REFERENCES species (id) ON DELETE CASCADE
);

The algorithm to convert a latitude plus phenology event to a date is:

const date = new Date(new Date().getFullYear(), 0, (latitude - yint) / slope);
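Wrapped in a function with a basic guard (the zero-slope check is an illustrative addition, not part of the spec):

```typescript
// Convert a latitude plus a (slope, yint) pair to a Date in the current
// year, per the formula above. The zero-slope guard is an illustrative
// addition to avoid a division by zero.
function phenoDate(latitude: number, slope: number, yint: number): Date {
    if (slope === 0) throw new Error('slope must be non-zero');
    const doy = (latitude - yint) / slope;
    // The JS Date constructor rolls a day-of-year value past month
    // boundaries, so day 32 of January becomes February 1.
    return new Date(new Date().getFullYear(), 0, doy);
}
```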

Questions

Original Text

Not sure exactly what the best way to go about this would be, so feel free to make countersuggestions (or just tell me we should wait for a spring launch). But since I do already have some data ready and can quickly get a fair amount more, I thought it might be cool to find a way to make it available for Gall Week and try to get some feedback.

Regardless of how we present the info to the user, this will necessarily involve creating some kind of import UI for me to add my data to GF.

Current thinking is that this should be in the form of slope/y-intercept parameters for doy-latitude lines, two sets each (start and end) for each variable of interest: rearing viability (we have very little data for this at this point but it's also the most useful where it is available; not a priority but flagging it here in case it's easier to add now), maturation, and developing.

It may also make sense for me to provide something like min-max latitudes in which these lines can be considered reasonable inferences, so we aren't implying people should look for galls way outside their ranges? (possibly longitudes as well if we can find some way to implement that with the state/province ranges already in the db?)

In terms of presenting this to the user, I think putting graphs on gall pages buries the relevant info a bit (though obviously a long-term goal). What I'm envisioning is either a stripped-down alternative to the ID page or an addition to it. The user enters their latitude and a host plant (trialing this exclusively on oaks for now) and we'll calculate the dates that correspond with that latitude for a list of galls meeting those criteria.

So eg I would enter Q virginiana, latitude = 30.2, and the site would return something along the lines of (fake dates for illustration):

[screenshot: example results table with fake dates]

jeffdc commented 1 year ago

First some thoughts on how we approach the source of truth for the data.

To help the discussion, a few definitions:

An import UI is a lot of work and not something that IMO we should invest in. Instead we need to do one of the following (in order of level of effort):

  1. Pull the Pheno DB onto the Prod Server and just use it as-is. The main site would just run queries against it directly. A mechanism to upload an updated version of the Pheno DB would be provided. The Pheno DB would be read-only on the site.

  2. Build a SQL script that will grab all of the data from the Pheno DB and load it into the Prod DB. This is a one-time event. Henceforth, Pheno work would happen against a copy of the Prod DB. Two more SQL scripts would then be built for export/import; these would pull the data from the copy of the Prod DB and load it into the Prod DB.

  3. Build out a Pheno workflow on the main site

Option 1 is a bit simpler and allows for easier (and safer) expansion of the Pheno DB.

Option 2 means that the Pheno DB will stay more in sync with the Prod DB (it can still drift, but species info in particular would not be duplicated). However, changes to the structure of the Pheno DB will require Production DB updates, and we would need to make sure that the Pheno work never changes any data in tables that are not Pheno specific (this is easy to enforce via the export/import scripts).

Option 3 is a tremendous amount of work but is the only way to prevent data drift.

jeffdc commented 1 year ago

DELETED and moved to the body of the issue.

Megachile commented 1 year ago

According to my plan, the data I'm submitting to the API should be a slope-intercept pair, not a raw date. We'll be calculating dates in the site itself using a given latitude as input (or just plotting the lines). So the input data should be more like:

{
  "id": 0,
  "rearing_viable_start_slope": -0.015,
  "rearing_viable_start_yint": 562,
  "rearing_viable_end_slope": -0.015,
  "rearing_viable_end_yint": 900,
  (etc)
  "min_latitude": 0.0,
  "max_latitude": 0.0
}

The pheno page can constrain latitudes between 20 and 60 if we don't care to extend to include Mexico/Central America right now. If we do (future-proofing?) then it would extend from 0-60.
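That constraint could be checked with something like the following (the bounds follow the 20-60 option discussed above; the constant and function names are illustrative):

```typescript
// Illustrative bounds check for the pheno page's latitude input,
// using the 20-60 range discussed above (extend to 0-60 to cover
// Mexico/Central America).
const MIN_LAT = 20;
const MAX_LAT = 60;

const isUsableLatitude = (lat: number): boolean =>
    Number.isFinite(lat) && lat >= MIN_LAT && lat <= MAX_LAT;
```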

Not sure what you mean about how to access the pheno page, other than putting a link to it on the main page?

jeffdc commented 1 year ago

Hmmm, this sounds a lot more complicated than I had anticipated. I do not currently understand how to go from those graph lines to lat/long.

Megachile commented 1 year ago

Eg:

lat = user entered latitude value

rearing_viable_start_doy = (lat - rearing_viable_start_yint) / rearing_viable_start_slope

Then use a function that converts doy to date with origin 01-01 of the current year, and that's what goes in the corresponding part of the table.
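The doy-to-date step can lean on the JS Date constructor's day overflow behavior (a minimal sketch; the function name is illustrative):

```typescript
// Convert a day-of-year to a Date, with day 1 = January 1 of the given
// year. The Date constructor rolls days past month boundaries, so no
// per-month arithmetic is needed.
function doyToDate(doy: number, year: number = new Date().getFullYear()): Date {
    return new Date(year, 0, Math.round(doy));
}
```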

jeffdc commented 1 year ago

Ok that sounds easy enough.

Megachile commented 1 year ago

Yeah that is the easy part. It also lets us calculate the date ranges given a latitude and (eventually) create a plot from the same info with very little trouble. It's a bit brittle in other ways but that's a tradeoff that makes sense to me at the moment.

jeffdc commented 1 year ago

@Megachile I edited the main body of the ticket to include what I am going to build. Can you review it again please? I updated based on our comments here.