Convert predictor calculation pseudo code to R

philipbaileynar commented 3 years ago

@David-Fowler the following pseudo code is the latest workflow for calculating predictors. Can you please convert it to R code? All the API endpoints mentioned already exist and should be working.

I just talked to @alherca73 and we discussed how to link this control script to his existing predictor calculation functions stored in this repo. We don't know how the "call_function" would work in R. Can R take the string name of a function from the database and execute it? Or do we need switch logic?
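On the "call_function" question: base R can look up a function from its string name, so switch logic is not strictly required. Below is a minimal sketch, assuming the database stores the bare function name as a string. The two `calc_*` functions and the list-shaped `site_point` / `site_catch` inputs are hypothetical stand-ins, not the real functions in this repo.

```r
# Hypothetical predictor functions standing in for the real ones in this repo.
calc_elevation <- function(site_point, site_catch) {
  site_point$elev
}
calc_catchment_area <- function(site_point, site_catch) {
  site_catch$area_km2
}

# Look up a function by its string name and call it with a list of arguments.
call_function <- function(function_name, args) {
  # get() raises an error if no function with that name can be found
  fn <- get(function_name, mode = "function")
  do.call(fn, args)
}

# Made-up inputs, for illustration only
site_point <- list(elev = 1520)
site_catch <- list(area_km2 = 42.5)

elev <- call_function("calc_elevation", list(site_point, site_catch))
area <- call_function("calc_catchment_area", list(site_point, site_catch))
```

Because `get()` will resolve any function name visible in the session, it may be safer to keep the allowed predictor functions in a named list and index that list by name instead.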

Note the TODO below where we are still discussing how to quality control predictor values. Other than that, this logic should be good to go!

# Loop over all samples that you want to process
for sample in project:

    # Retrieve the sample information (including sample date and site ID etc)
    sampleInfo = NAMCr::query('sampleInfo', sampleId = sample.sampleId)
    sample_date = sampleInfo.sampleDate

    # Retrieve the list of relevant predictors for this sample (both temporal and non-temporal)
    predictors = NAMCr::query('samplePredictorValues', sampleId = sample.sampleId)

    # Loop over each relevant predictor
    for predictor in predictors:

        # Only calculate predictors that are "missing" or "expired"
        if predictor.status != 'Current':

            # Retrieve the site location and catchment associated with this sample
            site = NAMCr::query('siteInfo', siteId = sample.siteId)
            site_point = site.location
            site_catch = site.catchment

            # Call the necessary R routine to calculate the predictor
            predictor_value = call_function(predictor.calculationScript, [site_point, site_catch])

            # TODO: predictor value quality control

            # Call the appropriate API endpoint to store the predictor value in the database
            # depending on if the predictor is temporal or not
            if predictor.isTemporal:
                NAMCr::query('setSamplePredictorValue', [
                    sampleId = sample.sampleId,
                    predictorId = predictor.predictorId,
                    predictorValue = predictor_value])
            else:
                NAMCr::query('setSitePredictorValue', [
                    siteId = sample.siteId,
                    predictorId = predictor.predictorId,
                    predictorValue = predictor_value])
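One way the inner loop of the pseudo code above might look in R is sketched below. This is an illustration under assumptions, not the final script: `query` is passed in as an argument (in real use it would presumably be `NAMCr::query`) so that the logic can be run against a mock; the record shapes (lists with the field names used in the pseudo code) and the named list `calculators` are hypothetical; and the argument names sent to the two "set" endpoints simply follow the pseudo code and may differ in the real API. The `sampleInfo` lookup is left out because the loop does not use the sample date yet.

```r
# Process one sample: calculate and store every predictor that is not 'Current'.
# `query` is any function with the calling convention query(endpoint, ...).
process_sample <- function(sample, query, calculators) {
  predictors <- query("samplePredictorValues", sampleId = sample$sampleId)
  site <- NULL
  stored <- 0

  for (predictor in predictors) {
    # Only calculate predictors that are "missing" or "expired"
    if (identical(predictor$status, "Current")) next

    # Fetch the site once, and only when something needs calculating
    if (is.null(site)) site <- query("siteInfo", siteId = sample$siteId)

    # Find the calculation function by name in the (hypothetical) registry
    fn <- calculators[[predictor$calculationScript]]
    if (is.null(fn)) stop("No calculator for ", predictor$calculationScript)
    value <- fn(site$location, site$catchment)

    # TODO: predictor value quality control

    # Temporal values belong to the sample, non-temporal values to the site
    if (isTRUE(predictor$isTemporal)) {
      query("setSamplePredictorValue", sampleId = sample$sampleId,
            predictorId = predictor$predictorId, predictorValue = value)
    } else {
      query("setSitePredictorValue", siteId = sample$siteId,
            predictorId = predictor$predictorId, predictorValue = value)
    }
    stored <- stored + 1
  }
  stored
}

# Mock API with made-up records, used only to exercise the loop offline.
calls <- character(0)
mock_query <- function(endpoint, ...) {
  calls <<- c(calls, endpoint)
  switch(endpoint,
    samplePredictorValues = list(
      list(predictorId = 1, status = "Current", isTemporal = FALSE,
           calculationScript = "calc_area"),
      list(predictorId = 2, status = "Expired", isTemporal = TRUE,
           calculationScript = "calc_area"),
      list(predictorId = 3, status = "Missing", isTemporal = FALSE,
           calculationScript = "calc_area")),
    siteInfo = list(location = c(-111.8, 41.7),
                    catchment = list(area_km2 = 12)),
    invisible(NULL))
}

n_stored <- process_sample(
  list(sampleId = 1, siteId = 10),
  mock_query,
  list(calc_area = function(site_point, site_catch) site_catch$area_km2)
)
```

Unlike the pseudo code, this sketch requests `siteInfo` at most once per sample rather than once per predictor, since the site does not change inside the loop.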

alherca73 commented 3 years ago

Thanks @philipbaileynar, that was VERY informative. @David-Fowler I recorded the meeting and it is available on Box //Hernandez/ZoomRecordings//. More than likely you do not need more background, but the rest of us do :)

alherca73 commented 3 years ago

@philipbaileynar, there is no valid "samplePredictors" endpoint. I tried the "predictors" endpoint, but it does not take sampleId as an argument. I also tried the "samplePredictorValues" endpoint with the "sampleId" argument; the call succeeds, but it returns empty lists. Any thoughts?

philipbaileynar commented 3 years ago

The endpoint is samplePredictorValues

alherca73 commented 3 years ago

@philipbaileynar, I tried that endpoint but it returns empty lists. Maybe the AIM2020 watersheds that have been added to the database do NOT have predictors added yet?

philipbaileynar commented 3 years ago

That is correct. Very few samples and sites have predictors in this database.

I suggest you try adding some for 1 or 2 samples and then try to retrieve the values you just inserted.

alherca73 commented 3 years ago

@philipbaileynar, how can I find out what arguments each endpoint requires?

alherca73 commented 3 years ago

Never mind, I got it. Can I use "setSamplePredictorValue" to add predictors?

philipbaileynar commented 3 years ago

Use setSamplePredictorValue to store temporal predictor values and associate them with a sample.

Use setSitePredictorValue to store non-temporal predictor values and associate them with a site.

philipbaileynar commented 3 years ago

@alherca73 I am extremely happy that you are experimenting with the new database! But I want to warn you that the database is currently ephemeral. I will be wiping it out periodically and reinstalling it.

So please continue developing scripts that send and retrieve data from the database. But please don't put any data in the database that you can't recreate!

I will send out a notification when I am rebuilding the database.

alherca73 commented 3 years ago

@philipbaileynar, thanks for the clarification. However, I am struggling to enter new data into the database. If I run:

NAMCr::cli() and select option 18 (setSitePredictorValue), I am asked for "siteID", which is easy, but I am also asked for "predictorId" and for "value". I assume that "predictorId" is a number? I have tried both numbers and strings when asked for "value", but I keep getting "Invalid input".

Is there anywhere where documentation can be found?

David-Fowler commented 3 years ago

I will have the docs accessible via RStudio tomorrow morning.

philipbaileynar commented 3 years ago

@alherca73 when you run the query to get all the predictor values for a sample:

NAMCr::query('samplePredictorValues', sampleId = sample.sampleId)

The data returned from this query should include the predictorId integer identifier. This is the value that you then provide back when you call setSitePredictorValue.

Alternatively you can call the predictors query to retrieve a list of all predictors in the system.

Does that help?

alherca73 commented 3 years ago

@philipbaileynar, thanks. I think that helps. I can see that the "calculationScript" in the "predictors" table is logical...

Would you please share the siteId information for those sites that already have predictors? You mentioned that there are just a few. I keep getting empty lists for the sites that I'm running - they are part of the AIM2020 set.

Thanks for your patience.

alherca73 commented 3 years ago

@David-Fowler Thanks, that will be very helpful!

philipbaileynar commented 3 years ago

Yes, the query that returns these data is very strict. It also filters out predictor values that are associated with inactive models (perhaps it shouldn't...).

sampleId 164756 should return some site predictor values.

alherca73 commented 3 years ago

@philipbaileynar, thanks it did.

The field/attribute "calculationScript" is defined as logical... what type of information goes into this column?

philipbaileynar commented 3 years ago

You can ignore calculationScript for now. It is currently "read only" through the API. Eventually we "could" put the name of the R function for each predictor calculation method in this column. Then the R script could deduce which function to call for missing/expired predictor values.

alherca73 commented 3 years ago

@philipbaileynar, Somehow I got confused... and thought that this was going to be the workflow... to call the predictor functions that would be stored in that "calculationScript" column. I guess that is not set in stone yet? We can always discuss this more next week when T and J are back.

namc-utah / NAMCGeopredictors

Convert predictor calculation pseudo code to R #2