Add support for multiple covariates

devilgate commented 5 years ago

Paul suggested either six or ten, as opposed to just the one we have at present (as well as age and sex). Though as soon as we go over one, why have a limit?

This will require at least the following:

Changes to the front end to allow selection of multiple covariates
Changes to the middleware to pass the covariates to the study at runtime
Changes to the R code to include the covariates in the calculations
Possibly changes to the database to hold them, though I think Peter said that was already there.

peterhambly commented 5 years ago

To reduce risk, I propose that we implement this in two stages:

One primary covariate, multiple additional covariates. No support for multiple covariates in the calculation or results (this reduces resource risk). Multiple covariates available in the extract.
Full multiple covariate support

devilgate commented 5 years ago

That makes sense.

devilgate commented 5 years ago

Support multiple covariates

peterhambly commented 5 years ago

Tasks, stage one (one primary covariate, multiple additional covariates):

Create table/view pair t_rif40_inv_additional_covariates/rif40_inv_additional_covariates;
Add support for additional covariates to the investigations screen and to the JSON study definition in the front end;
Middleware support for additional covariates as a) objects, b) JSON study definition, c) database and d) extract reports;
Add support for additional covariates in Postgres and SQL Server study extraction SQL;
Confirm additional covariates appear in the extract and in the data viewer extract table;

Tasks, stage two (multiple covariates):

Add support for multiple covariates to the investigations screen and to the JSON study definition in the front end;
Middleware support for multiple covariates as a) objects, b) JSON study definition, c) database and d) extract reports;
Add support for multiple covariates in Postgres and SQL Server study extraction SQL;
Add support for multiple covariates in the R code;
Confirm multiple covariates appear in the extract and in the data viewer extract table;

peterhambly commented 5 years ago

I have now done the database changes (in alter_12.sql). Instead of creating a new table, I added covariate_type to rif40_inv_covariates/t_rif40_inv_covariates. This has values of 'N' (for normal covariates - the default) or 'A' (for additional). This will hopefully remove the need for changes to the extract code. Tested as backward compatible on both PostgreSQL and SQL Server, i.e. you can still run a study with covariates.
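
A minimal sketch of how the middleware might mirror the new column, assuming a Java enum is wanted for it. The name CovariateRole, its methods, and the fallback to 'N' for a missing value are hypothetical and not taken from the RIF code base; only the 'N' and 'A' codes come from the description above.

```java
// Hypothetical enum: the name and methods are illustrative, not from the RIF code base.
enum CovariateRole {
    NORMAL("N"),      // 'N': normal covariate, the column default
    ADDITIONAL("A");  // 'A': additional covariate

    private final String dbCode;

    CovariateRole(String dbCode) {
        this.dbCode = dbCode;
    }

    // The single-character value stored in covariate_type.
    String dbCode() {
        return dbCode;
    }

    // Map a covariate_type value to a role.
    // Assumption: a missing or blank value is treated as the default 'N'.
    static CovariateRole fromDbCode(String code) {
        if (code == null || code.trim().isEmpty()) {
            return NORMAL;
        }
        for (CovariateRole role : values()) {
            if (role.dbCode.equalsIgnoreCase(code.trim())) {
                return role;
            }
        }
        throw new IllegalArgumentException("Unknown covariate_type: " + code);
    }
}
```

Rejecting unknown codes rather than defaulting them keeps a bad row from being silently treated as a normal covariate.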

devilgate commented 5 years ago

Nice work, Peter. Brandon has changed the R code so that it extracts multiple covariates, and I've changed the Java to pass multiple names to the R code if they're there, so it's all coming together.
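
As an illustration only, passing several names could look something like the sketch below. It assumes the names travel as one comma-separated string; the class CovariateArguments, the method toRArgument, and the separator are all hypothetical, and the real Java-to-R interface may differ.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical helper: not part of the RIF middleware.
final class CovariateArguments {

    private CovariateArguments() {
    }

    // Join covariate names into a single comma-separated argument.
    // Null and blank names are skipped; an empty list gives an empty string.
    static String toRArgument(List<String> covariateNames) {
        return covariateNames.stream()
                .filter(Objects::nonNull)
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.joining(","));
    }
}
```

With a single covariate the result is just that name, so a study that uses only one covariate would produce the same argument as before under this assumed format.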

devilgate commented 5 years ago

Just looking at the code again, and AbstractCovariate has a covariateType property. It's of type CovariateType, which is an enum with the values CONTINUOUS_VARIABLE, BINARY_INTEGER_SCORE, and NTILE_INTEGER_SCORE.

That's going to clash with the covariate type you've just added to the database, so we might need to rename one of them. "Covariate kind"? "Covariate status"?
What does "ntile" mean in that third value?

But it seems like that property has no corresponding value in the database. It's worked out at runtime, in CovariateManager's getCovariates method. It just depends on the maximum and minimum values.

Do we even need it? There doesn't seem to be much in the way of functionality that depends on it.

peterhambly commented 5 years ago

This comes from the covariate definitions in rif40_covariates, as opposed to covariate_type in rif40_inv_covariates, which is what you are working with: TYPE of covariate (1=integer score/2=continuous variable). Min < max; max/min precision is appropriate to type. Continuous variables are not currently supported. Integer scores can be a binary variable 0/1 or an NTILE, e.g. 1..5 for a quintile.

So it can be removed. rif40_inv_covariates.covariate_type of 'N' must be an integer score until we support quantiles in the extract.
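
For reference, the distinction described above can be sketched as follows. This is not the logic in CovariateManager's getCovariates method, which is not shown in this thread; it is a reading of the rif40_covariates description (1=integer score, 2=continuous variable, binary 0/1 versus NTILE). The enum values are the ones quoted earlier; the helper class CovariateTypes and its classify method are hypothetical.

```java
// Enum values as quoted earlier in the thread.
enum CovariateType {
    CONTINUOUS_VARIABLE,
    BINARY_INTEGER_SCORE,
    NTILE_INTEGER_SCORE
}

// Hypothetical helper: illustrates the description above, not the existing code.
final class CovariateTypes {

    private CovariateTypes() {
    }

    // dbType is the rif40_covariates TYPE: 1 = integer score, 2 = continuous variable.
    static CovariateType classify(int dbType, double min, double max) {
        if (!(min < max)) {
            throw new IllegalArgumentException("min must be less than max");
        }
        if (dbType == 2) {
            return CovariateType.CONTINUOUS_VARIABLE;
        }
        if (dbType != 1) {
            throw new IllegalArgumentException("Unknown covariate TYPE: " + dbType);
        }
        // An integer score running 0..1 is a binary variable; any other range is an NTILE.
        if (min == 0.0 && max == 1.0) {
            return CovariateType.BINARY_INTEGER_SCORE;
        }
        return CovariateType.NTILE_INTEGER_SCORE;
    }
}
```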

peterhambly commented 5 years ago

Submit, save and multiple/additional covariate selection working OK; data being transferred to middleware, which is only processing the first covariate:

    "investigations": {"investigation": [{
      "years_per_interval": 1,
      "additionals": [{"additional_covariate": {
        "covariate_type": "CONTINUOUS_VARIABLE",
        "minimum_value": "358.0",
        "name": "NEAR_DIST",
        "description": "near distance covariate",
        "maximum_value": "78787.0"
      }}],

      ...

      "covariates": [
        {"adjustable_covariate": {
          "covariate_type": "INTEGER_SCORE",
          "minimum_value": "0.0",
          "name": "AREATRI1KM",
          "description": "area tri 1 km covariate",
          "maximum_value": "1.0"
        }},
        {"adjustable_covariate": {
          "covariate_type": "INTEGER_SCORE",
          "minimum_value": "1.0",
          "name": "SES",
          "description": "socio-economic status",
          "maximum_value": "5.0"
        }}
      ],

Table data:

1> select * from rif40.rif40_inv_covariates where study_id = 199;
2> go
username                                                                                   study_id    inv_id      covariate_name                 covariate_type min         max         geography                                          study_geolevel_name
------------------------------------------------------------------------------------------ ----------- ----------- ------------------------------ -------------- ----------- ----------- -------------------------------------------------- ------------------------------
peter                                                                                              199         183 SES                            N                    1.000       5.000 SAHSULAND                                          SAHSU_GRD_LEVEL4

(1 rows affected)
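
If every selection in the JSON above were saved, and assuming additional covariates are stored in rif40_inv_covariates with covariate_type 'A' as described for alter_12.sql, one would expect three rows here rather than one. The sketch below only illustrates that expectation; the class InvCovariateRows and its method are hypothetical and do not correspond to middleware code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the (covariate_name, covariate_type) pairs one would
// expect to be saved for an investigation.
final class InvCovariateRows {

    private InvCovariateRows() {
    }

    // Entries from "covariates" are marked 'N'; entries from "additionals" are marked 'A'.
    static List<String> expectedRows(List<String> covariates, List<String> additionals) {
        List<String> rows = new ArrayList<>();
        for (String name : covariates) {
            rows.add(name + " N");
        }
        for (String name : additionals) {
            rows.add(name + " A");
        }
        return rows;
    }
}
```

For the study above, expectedRows(Arrays.asList("AREATRI1KM", "SES"), Arrays.asList("NEAR_DIST")) gives AREATRI1KM N, SES N and NEAR_DIST A, whereas the table currently holds only SES.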

peterhambly commented 5 years ago

This is now OK for merging. I have fixed:

The Study summary (info button) now supports multiple covariates;

There are no additional covariates in the generated JSON. We can probably live with that:

"additionals": [{"additional_covariate": {
    "covariate_type": "CONTINUOUS_VARIABLE",
    "minimum_value": "358.0",
    "name": "NEAR_DIST",
    "description": "near distance covariate",
    "maximum_value": "78787.0"
  }}],

The extract map has a scaling problem; there is an error on the console: 11:30:19.763 [http-nio-8080-exec-3] WARN org.geotools.map.FeatureLayer org.geotools.map: Bounds crs not defined; assuming bounds from schema are correct for CollectionFeatureSource:org.geotools.feature.DefaultFeatureCollection@8663bc7
Both ports now work with the alter script set up correctly;
The totals and the covariate loss report are OK for both ports;
I added intersect_count, distance_from_nearest_source, nearest_rifshapepolyid, exposure_value to rif40_study_areas view;

smallAreaHealthStatisticsUnit / rapidInquiryFacility

Add support for multiple covariates #124