jeffdc / gallformers

The gallformers site
https://www.gallformers.org
Apache License 2.0

Beta version of phenology features #356

Open · Megachile opened 1 year ago

Megachile commented 1 year ago

Here is the proposal:

API

A new API endpoint will be built.

https://www.gallformers.org/gall/{id}/pheno where {id} is some potential Gall ID.

Three operations will be supported:

GET: Fetches the pheno data for the given gall.

POST: Adds or updates the pheno data for the given gall.

DELETE: Deletes the pheno data for the given gall.

Under no circumstance will any of the API calls return 418.
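As a sketch of how a client might call the endpoint (the helper names and error handling here are illustrative, not part of the spec):

```typescript
// Hypothetical client helpers for the proposed endpoint; names are illustrative.
const phenoUrl = (id: number): string =>
    `https://www.gallformers.org/gall/${id}/pheno`;

// Fetch the pheno data for a gall. Assumes the endpoint returns the JSON
// structure described below, with a non-2xx status when no data exists.
async function getPheno(id: number): Promise<unknown> {
    const res = await fetch(phenoUrl(id));
    if (!res.ok) throw new Error(`GET pheno failed: ${res.status}`);
    return res.json();
}
```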

The JSON structure:

{
  "id": 0,
  "developing_start_slope": -0.015,
  "developing_start_yint": 562,
  "developing_end_slope": -0.015,
  "developing_end_yint": 900,
  "collection_viable_start_slope": -0.015,
  "collection_viable_start_yint": 950,
  "collection_viable_end_slope": -0.015,
  "collection_viable_end_yint": 950,
  "emergence_start_slope": -0.015,
  "emergence_start_yint": 1100,
  "emergence_end_slope": -0.015,
  "emergence_end_yint": 1222,
  "min_latitude": 0.0,
  "max_latitude": 0.0
}

All fields are JSON numbers and must be valid numbers in proper JSON format. id, min_latitude, and max_latitude must never be null and must always be passed in. All of the other values are optional, though unexpected things may happen if a value is passed for one member of a slope/yint pair but not the other.
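A TypeScript shape for this payload could look like the following (the type name is illustrative; optionality follows the rules above):

```typescript
// Illustrative type for the pheno payload. id, min_latitude, and
// max_latitude are required; every slope/yint value is optional.
type PhenoData = {
    id: number;
    developing_start_slope?: number;
    developing_start_yint?: number;
    developing_end_slope?: number;
    developing_end_yint?: number;
    collection_viable_start_slope?: number;
    collection_viable_start_yint?: number;
    collection_viable_end_slope?: number;
    collection_viable_end_yint?: number;
    emergence_start_slope?: number;
    emergence_start_yint?: number;
    emergence_end_slope?: number;
    emergence_end_yint?: number;
    min_latitude: number;
    max_latitude: number;
};
```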

User Facing UI

The initial POC implementation will be a new page found at https://www.gallformers.org/pheno, linked from the front page and possibly from the nav bar (TBD).

The UI will show two inputs: the user's latitude and a host plant (restricted to oaks for the initial trial, per the original proposal below).

Once both of the above are filled with valid data, a search will be run and a list of matching galls will be returned in a table (columns TBD).

The table will be sortable on any of its columns. All rows will be displayed with no pagination.

Backend

The new data will be stored in a new pheno table.

CREATE TABLE gallpheno (
    id INTEGER NOT NULL,
    rearing_viable_start_slope REAL NOT NULL,
    rearing_viable_start_yint REAL NOT NULL,
    rearing_viable_end_slope REAL NOT NULL,
    rearing_viable_end_yint REAL NOT NULL,
    collection_viable_start_slope REAL NOT NULL,
    collection_viable_start_yint REAL NOT NULL,
    collection_viable_end_slope REAL NOT NULL,
    collection_viable_end_yint REAL NOT NULL,
    emergence_viable_start_slope REAL NOT NULL,
    emergence_viable_start_yint REAL NOT NULL,
    emergence_viable_end_slope REAL NOT NULL,
    emergence_viable_end_yint REAL NOT NULL,
    min_latitude REAL NOT NULL,
    max_latitude REAL NOT NULL,
    FOREIGN KEY (id) REFERENCES species (id) ON DELETE CASCADE
);

The algorithm to convert a latitude plus phenology event to a date is:

const date = new Date(new Date().getFullYear(), 0, (latitude - yint) / slope);
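Wrapped in a function with a basic guard (the zero-slope check is an illustrative addition, not part of the spec):

```typescript
// Convert a latitude plus a (slope, yint) pair to a Date in the current
// year, per the formula above. The zero-slope guard is an illustrative
// addition to avoid a division by zero.
function phenoDate(latitude: number, slope: number, yint: number): Date {
    if (slope === 0) throw new Error('slope must be non-zero');
    const doy = (latitude - yint) / slope;
    // The JS Date constructor rolls a day-of-year value past month
    // boundaries, so day 32 of January becomes February 1.
    return new Date(new Date().getFullYear(), 0, doy);
}
```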

Questions

Original Text

Not sure exactly what the best way to go about this would be, so feel free to make countersuggestions (or just tell me we should wait for a spring launch). But since I do already have some data ready and can quickly get a fair amount more, I thought it might be cool to find a way to make it available for Gall Week and try to get some feedback.

Regardless of how we present the info to the user, this will necessarily involve creating some kind of import UI for me to add my data to GF.

Current thinking is that this should be in the form of slope/y-intercept parameters for doy-latitude lines, two sets each (start and end) for each variable of interest: rearing viability (we have very little data for this at this point but it's also the most useful where it is available; not a priority but flagging it here in case it's easier to add now), maturation, and developing.

It may also make sense for me to provide something like min-max latitudes in which these lines can be considered reasonable inferences, so we aren't implying people should look for galls way outside their ranges? (possibly longitudes as well if we can find some way to implement that with the state/province ranges already in the db?)

In terms of presenting this to the user, I think putting graphs on gall pages buries the relevant info a bit (though obviously a long-term goal). What I'm envisioning is either a stripped-down alternative to the ID page or an addition to it. The user enters their latitude and a host plant (trialing this exclusively on oaks for now) and we'll calculate the dates that correspond with that latitude for a list of galls meeting those criteria.

So eg I would enter Q virginiana, latitude = 30.2, and the site would return something along the lines of (fake dates for illustration):

[screenshot: example results table with fake dates]

jeffdc commented 1 year ago

First some thoughts on how we approach the source of truth for the data.

To help the discussion, a few definitions:

An import UI is a lot of work and not something that IMO we should invest in. Instead we need to do one of the following (in order of level of effort):

  1. Pull the Pheno DB onto the Prod Server and just use it as-is. The main site would just run queries against it directly. A mechanism to upload an updated version of the Pheno DB would be provided. The Pheno DB would be read-only on the site.

  2. Build a SQL script that will grab all of the data from the Pheno DB and load it into the Prod DB. This is a one-time event. Henceforth, Pheno work would happen against a copy of the Prod DB. Two more SQL scripts would then be built for export/import; these would pull the data from the copy of the Prod DB and load it into the Prod DB.

  3. Build out a Pheno workflow on the main site

Option 1 is a bit simpler and allows for easier (and safer) expansion of the Pheno DB.

Option 2 means that the Pheno DB will stay more in sync with the Prod DB (it can still drift, but species info in particular would not be duplicated). However, changes to the structure of the Pheno DB will require Production DB updates, and we would need to make sure that the Pheno work never changes any data in tables that are not Pheno specific (this is easy to enforce via the export/import scripts).

Option 3 is a tremendous amount of work but is the only way to prevent data drift.

jeffdc commented 1 year ago

DELETED and moved to the body of the issue.

Megachile commented 1 year ago

According to my plan, the data I'm submitting to the API should be a slope-intercept pair, not a raw date. We'll be calculating dates in the site itself using a given latitude as input (or just plotting the lines). So the input data should be more like:

{
  "id": 0,
  "rearing_viable_start_slope": -0.015,
  "rearing_viable_start_yint": 562,
  "rearing_viable_end_slope": -0.015,
  "rearing_viable_end_yint": 900,
  (etc)
  "min_latitude": 0.0,
  "max_latitude": 0.0
}

The pheno page can constrain latitudes between 20 and 60 if we don't care to extend to include Mexico/Central America right now. If we do (future-proofing?) then it would extend from 0-60.
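That constraint could be checked with something like the following (the bounds follow the 20-60 option discussed above; the constant and function names are illustrative):

```typescript
// Illustrative bounds check for the pheno page's latitude input,
// using the 20-60 range discussed above (extend to 0-60 to cover
// Mexico/Central America).
const MIN_LAT = 20;
const MAX_LAT = 60;

const isUsableLatitude = (lat: number): boolean =>
    Number.isFinite(lat) && lat >= MIN_LAT && lat <= MAX_LAT;
```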

Not sure what you mean about how to access the pheno page, other than putting a link to it on the main page?

jeffdc commented 1 year ago

Hmmm, this sounds a lot more complicated than I had anticipated. I do not currently understand how to go from those graph lines to lat/long.

Megachile commented 1 year ago

Eg:

lat = user entered latitude value

rearing_viable_start_doy = (lat - rearing_viable_start_yint) / rearing_viable_start_slope

Then use a function that converts doy to date with origin 01-01 of the current year, and that's what goes in the corresponding part of the table.
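The doy-to-date step can lean on the JS Date constructor's day overflow behavior (a minimal sketch; the function name is illustrative):

```typescript
// Convert a day-of-year to a Date, with day 1 = January 1 of the given
// year. The Date constructor rolls days past month boundaries, so no
// per-month arithmetic is needed.
function doyToDate(doy: number, year: number = new Date().getFullYear()): Date {
    return new Date(year, 0, Math.round(doy));
}
```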

jeffdc commented 1 year ago

Ok that sounds easy enough.

Megachile commented 1 year ago

Yeah that is the easy part. It also lets us calculate the date ranges given a latitude and (eventually) create a plot from the same info with very little trouble. It's a bit brittle in other ways but that's a tradeoff that makes sense to me at the moment.

jeffdc commented 1 year ago

@Megachile I edited the main body of the ticket to include what I am going to build. Can you review it again please? I updated based on our comments here.