smallAreaHealthStatisticsUnit / rapidInquiryFacility

The Rapid Inquiry Facility (RIF) helps epidemiologists and public health researchers in environmental health activities.
GNU Lesser General Public License v3.0

Add support for multiple covariates #124

Closed: devilgate closed this 5 years ago

devilgate commented 5 years ago

Paul suggested supporting either six or ten covariates, as opposed to just the one we have at present (in addition to age and sex). But as soon as we go beyond one, why have a limit at all?

This will require at least the following:

peterhambly commented 5 years ago

To reduce risk, I propose we implement this in two stages:

devilgate commented 5 years ago

That makes sense.

devilgate commented 5 years ago

Support multiple covariates

peterhambly commented 5 years ago

Tasks, stage one (one primary covariate, multiple additional covariates):

Tasks, stage two (multiple covariates):

peterhambly commented 5 years ago

I have now done the database changes (in alter_12.sql). Instead of creating a new table, I added a covariate_type column to rif40_inv_covariates/t_rif40_inv_covariates. It takes the values 'N' (for normal covariates; the default) or 'A' (for additional covariates). This will hopefully remove the need for changes to the extract code. Tested as backward compatible on both PostgreSQL and SQL Server, i.e. you can still run a study with covariates.
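A minimal sketch of how the Java side might read the new column, assuming a hypothetical enum called `InvCovariateKind` (a name chosen here so it stays distinct from the existing `CovariateType`). It maps 'N' and 'A' onto constants; treating a null or empty value as the default 'N' is an assumption, not something the database change specifies.

```java
public class InvCovariateKindDemo {

    /** Hypothetical enum for rif40_inv_covariates.covariate_type. */
    enum InvCovariateKind {
        NORMAL('N'),     // normal covariate (the column default)
        ADDITIONAL('A'); // additional covariate

        private final char code;

        InvCovariateKind(char code) {
            this.code = code;
        }

        char code() {
            return code;
        }

        /**
         * Maps a single-character column value to the enum.
         * ASSUMPTION: null or empty is treated as the default 'N'.
         */
        static InvCovariateKind fromCode(String value) {
            if (value == null || value.isEmpty()) {
                return NORMAL;
            }
            for (InvCovariateKind k : values()) {
                if (value.length() == 1 && value.charAt(0) == k.code) {
                    return k;
                }
            }
            throw new IllegalArgumentException("Unknown covariate_type: " + value);
        }
    }

    public static void main(String[] args) {
        System.out.println(InvCovariateKind.fromCode("N"));  // NORMAL
        System.out.println(InvCovariateKind.fromCode("A"));  // ADDITIONAL
        System.out.println(InvCovariateKind.fromCode(null)); // NORMAL
    }
}
```

Keeping the code character inside the enum means the mapping lives in one place, so adding a further kind later only needs a new constant.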

devilgate commented 5 years ago

Nice work, Peter. Brandon has changed the R code so that it extracts multiple covariates, and I've changed the Java to pass multiple names to the R if they're there, so it's all coming together.

devilgate commented 5 years ago

Just looking at the code again, and AbstractCovariate has a covariateType property. It's of type CovariateType, which is an enum with the values CONTINUOUS_VARIABLE, BINARY_INTEGER_SCORE, and NTILE_INTEGER_SCORE.

  1. That's going to clash with the covariate type you've just added to the database, so we might need to rename one of them. "Covariate kind"? "Covariate status"?
  2. What does "ntile" mean in that third value?

But that property seems to have no corresponding column in the database. It's worked out at runtime, in CovariateManager's getCovariates method, and depends only on the maximum and minimum values.

Do we even need it? There doesn't seem to be much in the way of functionality that depends on it.
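To make the runtime derivation concrete, here is a sketch that infers a `CovariateType` from min and max alone. ASSUMPTION: the rules (0..1 is binary, any other whole-number range is an NTILE, anything else is continuous) are inferred from this thread and are not copied from `CovariateManager.getCovariates`, whose actual logic may differ.

```java
public class CovariateTypeDemo {

    enum CovariateType {
        CONTINUOUS_VARIABLE,
        BINARY_INTEGER_SCORE,
        NTILE_INTEGER_SCORE
    }

    /** True when d is finite and has no fractional part. */
    static boolean isWholeNumber(double d) {
        return !Double.isInfinite(d) && d == Math.rint(d);
    }

    /** ASSUMED rules, inferred from the discussion rather than the real code. */
    static CovariateType deriveType(double min, double max) {
        if (min == 0.0 && max == 1.0) {
            return CovariateType.BINARY_INTEGER_SCORE;
        }
        if (isWholeNumber(min) && isWholeNumber(max)) {
            return CovariateType.NTILE_INTEGER_SCORE;
        }
        return CovariateType.CONTINUOUS_VARIABLE;
    }

    public static void main(String[] args) {
        System.out.println(deriveType(0.0, 1.0));  // BINARY_INTEGER_SCORE (e.g. AREATRI1KM)
        System.out.println(deriveType(1.0, 5.0));  // NTILE_INTEGER_SCORE (e.g. SES quintile)
        System.out.println(deriveType(0.5, 2.75)); // CONTINUOUS_VARIABLE (made-up range)
    }
}
```

With only min and max to go on, a whole-number range is ambiguous: these rules would label NEAR_DIST (358.0 to 78787.0) an NTILE score, although the study JSON later in this thread marks it CONTINUOUS_VARIABLE. That ambiguity is worth bearing in mind when deciding whether the property is needed at all.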

peterhambly commented 5 years ago

This comes from the covariate definitions in rif40_covariates, as opposed to covariate_type in rif40_inv_covariates, which is what you are working with. In rif40_covariates, TYPE is the type of covariate (1=integer score/2=continuous variable); min must be < max, and the max/min precision is appropriate to the type. Continuous variables are not currently supported. Integer scores can be a binary variable (0/1) or an NTILE, e.g. 1..5 for a quintile.

So it can be removed. rif40_inv_covariates.covariate_type of 'N' must be an integer score until we support quantiles in the extract.
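The rules above can be sketched as a validation check on a covariate definition before it is used as a normal ('N') covariate. The TYPE codes (1 = integer score, 2 = continuous variable) and the min < max rule come from the description above; the method name, messages, and the `EXAMPLE_CONT` covariate are hypothetical, and SES is treated as an integer score as in the study JSON.

```java
import java.util.Optional;

public class CovariateDefinitionCheck {

    // rif40_covariates TYPE codes, as described in this thread
    static final int INTEGER_SCORE = 1;
    static final int CONTINUOUS_VARIABLE = 2;

    /**
     * Hypothetical check: returns empty when the definition can be used
     * as an 'N' covariate, otherwise the reason it cannot.
     */
    static Optional<String> checkNormalCovariate(String name, int type, double min, double max) {
        if (!(min < max)) {
            return Optional.of(name + ": min must be less than max");
        }
        if (type == CONTINUOUS_VARIABLE) {
            return Optional.of(name + ": continuous variables are not currently supported");
        }
        if (type != INTEGER_SCORE) {
            return Optional.of(name + ": unknown TYPE " + type);
        }
        // An integer score is either binary (0/1) or an NTILE such as 1..5
        if (min != Math.rint(min) || max != Math.rint(max)) {
            return Optional.of(name + ": integer score needs whole-number min and max");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(checkNormalCovariate("SES", INTEGER_SCORE, 1.0, 5.0));
        // Optional.empty
        System.out.println(checkNormalCovariate("EXAMPLE_CONT", CONTINUOUS_VARIABLE, 0.0, 10.5));
        // Optional[EXAMPLE_CONT: continuous variables are not currently supported]
    }
}
```

Returning an `Optional` reason rather than a boolean lets the caller report why a covariate was rejected.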

peterhambly commented 5 years ago

Submit, save, and multiple/additional covariate selection are working OK. The data is being transferred to the middleware, which is only processing the first covariate:

    "investigations": {"investigation": [{
      "years_per_interval": 1,
      "additionals": [{"additional_covariate": {
        "covariate_type": "CONTINUOUS_VARIABLE",
        "minimum_value": "358.0",
        "name": "NEAR_DIST",
        "description": "near distance covariate",
        "maximum_value": "78787.0"
      }}],

      ...

      "covariates": [
        {"adjustable_covariate": {
          "covariate_type": "INTEGER_SCORE",
          "minimum_value": "0.0",
          "name": "AREATRI1KM",
          "description": "area tri 1 km covariate",
          "maximum_value": "1.0"
        }},
        {"adjustable_covariate": {
          "covariate_type": "INTEGER_SCORE",
          "minimum_value": "1.0",
          "name": "SES",
          "description": "socio-economic status",
          "maximum_value": "5.0"
        }}
      ],
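An illustrative sketch of the failure mode described above versus the intended behaviour: reading only element 0 of the covariates array instead of passing every name on. The `AdjustableCovariate` record and both methods are hypothetical stand-ins, not the actual middleware classes; the values are copied from the JSON above, and records need Java 16 or later.

```java
import java.util.List;
import java.util.stream.Collectors;

public class AllCovariatesDemo {

    /** Hypothetical holder for one "adjustable_covariate" entry. */
    record AdjustableCovariate(String name, String covariateType,
                               double minimumValue, double maximumValue) {}

    /** The reported problem: only the first covariate is used. */
    static List<String> firstOnly(List<AdjustableCovariate> covariates) {
        return covariates.isEmpty() ? List.of() : List.of(covariates.get(0).name());
    }

    /** Intended behaviour: every covariate name is passed on, in order. */
    static List<String> allNames(List<AdjustableCovariate> covariates) {
        return covariates.stream()
                .map(AdjustableCovariate::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Values copied from the study JSON in this issue
        List<AdjustableCovariate> covariates = List.of(
                new AdjustableCovariate("AREATRI1KM", "INTEGER_SCORE", 0.0, 1.0),
                new AdjustableCovariate("SES", "INTEGER_SCORE", 1.0, 5.0));

        System.out.println(firstOnly(covariates)); // [AREATRI1KM]
        System.out.println(allNames(covariates));  // [AREATRI1KM, SES]
    }
}
```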

Table data:

    1> select * from rif40.rif40_inv_covariates where study_id = 199;
    2> go
    username study_id inv_id covariate_name covariate_type min   max   geography study_geolevel_name
    -------- -------- ------ -------------- -------------- ----- ----- --------- -------------------
    peter         199    183 SES            N              1.000 5.000 SAHSULAND SAHSU_GRD_LEVEL4

    (1 rows affected)

peterhambly commented 5 years ago

This is now OK for merging. I have fixed: