mdsol / rwslib

Provide a (programmer) friendly client library to Rave Web Services (RWS).
MIT License

Production Clinical View #110

Closed Brian-Asahi closed 3 years ago

Brian-Asahi commented 3 years ago

Is it somehow possible to pull 'Production Clinical Views' when using biostats_gateway FormDataRequest?

FormDataRequest(project_name, environment_name, dataset_type, form_oid)

environment_name is limited to 'regular' and 'raw', so from my understanding this pulls 'Clinical Views' and 'Clinical Views Raw'. I would like to pull 'Production Clinical Views'. Is this possible?

iansparks commented 3 years ago

Hi @Brian-Asahi. The docs say that dataset_type can be regular or raw, not environment:

https://rwslib.readthedocs.io/en/latest/biostats_gateway.html#formdatarequest-project-name-environment-name-dataset-type-form-oid

Note that **dataset_type** can be ‘regular’ or ‘raw’. When called with a dataset type of “csv” a .csv is appended to the end of the form oid in the calling URL. When left off, XML will be returned.

so to pull from the production environment (assuming it's called "PROD"):

>>> from rwslib import RWSConnection
>>> from rwslib.rws_requests.biostats_gateway import FormDataRequest
>>> r = RWSConnection('https://innovate.mdsol.com', 'username', 'password')  #Authorization required
>>> vital_csv_data = r.send_request(FormDataRequest('SIMPLESTUDY', 'PROD', 'REGULAR', 'VITAL', dataset_format="csv"))
>>> #                                                              ^^^^^^ environment name
>>> print(vital_csv_data)

That said, it's been a while since I have used this. When you talk about "production clinical views", there is an option in the core configuration (as I recall) to create separate Clinical Views for Production. If I recall correctly this just means that the definition of the clinical view (what columns it contains) is created only from Versions in production. Since there is a 144 column limit on Clinical Views this can be important (e.g. if you have a TEST environment for the same study and you've been adding a lot of columns that aren't in production you don't want your production data truncated just because it shares a clinical view definition with TEST). But having a separate "production clinical view" should not change how you access the CV data - you just give it the PROD environment name rather than the TEST or DEV or AUX and it works out which CV it should pull from.

Brian-Asahi commented 3 years ago

Hi @iansparks,

Thank you so much for the explanation, and for this incredibly useful library!

Apologies for conflating environment_name and dataset_type. I am indeed able to pull data from the 'PROD' environment and the 'raw/regular' dataset types, no problem. However, the issue appears to be separate from these selections, and probably has more to do with what you are discussing in terms of core configurations on the database programming side.

In interacting with our database programmers, they have provided me the following background from Medidata:

Rave generates four types of clinical views:

Raw Views: Raw Views present the user-entered form field values, regardless of whether the values conform to the fields' data types. Raw Views present values as character data. For example: if you submit “120” for a numeric systolic blood-pressure field for one Subject, and submit “ABC” for the same field for a second Subject, the Raw View would show both values, even though “ABC” is not a valid numeric value.

Regular Views: Regular Views present the user-entered form field values as converted to the fields’ proper data types (conformant values). Regular Views also present the user-entered raw values of the Raw Views. For example: if you submit “120” for a numeric systolic blood-pressure field for one Subject, and submit “ABC” for the same field for a second Subject, the Regular View would show both values in its “raw” character column. The Regular View would also show the conformant value 120 in a separate numeric column for the field. For the Subject that had the nonconformant value of “ABC”, the numeric column would have a null value.

Production Raw Views [Rave 5.6.3 and up]: The Production Raw Views only display data from Rave EDC production environments and include fields from CRF versions that are pushed to the production EDC environment.

Production Regular Views [Rave 5.6.3 and up]: The Production Regular Views only display data from Rave EDC production environments and include fields from CRF versions that are pushed to the production EDC environment. These views are optional and can be configured to be generated or suppressed.

Q: Why do some of the fields that exist in a form not appear in Clinical Views based reports and outputs such as SAS On Demand report, Data Listing report, Standard Outputs, RWS outputs, or BOXI reports?

A: Clinical Views use the last published CRF version as a template to create the structure of a view in a tabular format. If the last published CRF version does not contain the field, it is expected that the view does not contain the field as well.

Regular Views: If the fields are missing in the last published CRF version of the project, you need to publish (not push) the new CRF version (where the fields are present) from an existing CRF version. This should address the issue in regular views only.

Production Views: Production views by their design contain the form structure from CRF versions that have been pushed to the production environment. The last published and pushed CRF version is used for production views. To address the issue in production views, you should publish and push the new CRF version (where the fields are present) to a dummy site in production.

In my specific case, it appears that a field that is available in production has been dropped in a development version of a form, which has been published in the Dev environment. For that reason, the field is available to me only in Production Views and not in Regular Views. I have confirmed this by pulling the form via data listing using the two different View types from the front end.

I couldn't find anything in the docs regarding these options for the data request. If you have any ideas it would be incredibly helpful, but it sounds like it may be a database configuration issue?

Thanks again!

iansparks commented 3 years ago

Well, don't quote me @Brian-Asahi but what I believe happens is that if separate production views are set up for the Clinical Views (a global setting) then when you ask for data from "PROD" it comes from the production clinical views. Data from any other environment, AUX, TEST, DEV etc comes from the other versions. Here's that setting in an example URL. You can see in this instance it IS checked (build separate production views):

(An earlier version of this comment said it was not checked. I can clearly see that it is, facepalm.)

[image: screenshot of the setting, with "build separate production views" checked]

I believe once you have separate production views, that's it, you can only get the production version of the views for PROD and you can't ask for PROD data in the context of the other views, it's no longer even (I think) available in those views.

If you want to access that field (which presumably doesn't have any data in any other TEST, AUX, DEV environment) in the views other than PROD you will need someone to publish a version containing that field to one of these environments - it should then at least turn up in the columns (might have to force regeneration of clinical views to make this happen).

But I am a bit confused as to why you want access to a column that only exists in non-PROD views? It's not being collected by the study so what is the value? (That's your business but just trying to understand the full scenario)

Brian-Asahi commented 3 years ago

Hi @iansparks,

Thanks for the quick response!

I think all of this context is coloring in my lack of understanding on the Database configuration side, so thank you.

Sorry for the confusion, but I think you are thinking I want the inverse of what I actually want:

The field that I am interested in pulling exists only in the PROD environment. It was deleted in some other environment (say DEV) by the database programmer. Because that form version with the field removed was published in DEV, it is missing from the Regular View. (According to my understanding, Regular Clinical Views only contain fields that exist in the latest published version of the form for the project, for all environments). Therefore the only way to get the field would be to pull the Production Clinical View.

What I know for sure is that when pulling from the 'PROD' environment with either dataset_type using FormDataRequest from biostats_gateway, I am getting the equivalent of the Regular Clinical View and not the Production Clinical View. I confirmed this by comparing the datasets with data listings pulled from each clinical view on the front end, through the data listings tool.

If you can't imagine why this would be the case from the point of view of this library, I suspect it must be on the database configuration side? It sounds like, ultimately, the FormDataRequest should only be pulling the Production Clinical View when the environment specified is 'PROD'.

iansparks commented 3 years ago

Hi @Brian-Asahi, it could be the programming of the RWS endpoint itself. The learn.mdsol.com website doesn't make it any clearer than the rwslib docs and doesn't mention PROD vs Regular clinical views. It does have a tantalizing hint which may or may not help us:

They say these URLs are deprecated.

GET https://{host}/RaveWebServices/datasets/{clinical view name}.csv
GET https://{host}/RaveWebServices/datasets/prod.{clinical view name}.csv

Deprecated but not removed?

We need to know the clinical view names. Let's find out:

from rwslib import RWSConnection
from rwslib.rws_requests.biostats_gateway import ProjectMetaDataRequest
rws = RWSConnection('innovate', 'MYNAME','MYPASSWORD')

# Get the view names and types. Are they different for the prod. ones?
study_csv_meta = rws.send_request(ProjectMetaDataRequest('MYSTUDY'))
print(study_csv_meta)

This should give you a listing of clinical view names and their structure like the one below, with one row per field in each view. We should see some views prefixed with "prod."; these are the prod-only views. If there aren't any, then maybe separate prod views are not set up?

projectname,viewname,ordinal,varname,vartype,varlength,varformat,varlabel
"MYSTUDY","prod.V_MYSTUDY_Lab","1","userid","num","8","10.","Internal id for the user"
"MYSTUDY","prod.V_MYSTUDY_Lab","2","projectid","num","8","10.","projectid"
"MYSTUDY","prod.V_MYSTUDY_Lab","3","project","char","255","$255.","project"
"MYSTUDY","prod.V_MYSTUDY_Lab","4","studyid","num","8","10.","Internal id for the study"
"MYSTUDY","prod.V_MYSTUDY_Lab","5","environmentName","char","20","$20.","Environment"
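A quick way to check that listing without reading it by eye is to group the view names by prefix. This is only a minimal sketch using the standard library, assuming the metadata comes back as CSV text with the header shown above. `split_views` is a hypothetical helper name (not part of rwslib), and the second sample row is made up for illustration.

```python
import csv
import io


def split_views(metadata_csv):
    """Return (prod_views, other_views) as sorted lists of distinct view names."""
    prod, other = set(), set()
    for row in csv.DictReader(io.StringIO(metadata_csv)):
        name = row.get("viewname")
        if not name:
            # Skip rows without a view name, e.g. a trailing marker line.
            continue
        (prod if name.startswith("prod.") else other).add(name)
    return sorted(prod), sorted(other)


# Sample rows: the first follows the listing above, the second is invented.
sample = '''projectname,viewname,ordinal,varname,vartype,varlength,varformat,varlabel
"MYSTUDY","prod.V_MYSTUDY_Lab","1","userid","num","8","10.","Internal id for the user"
"MYSTUDY","V_MYSTUDY_Lab","1","userid","num","8","10.","Internal id for the user"
'''
prod_views, other_views = split_views(sample)
print(prod_views)
print(other_views)
```

If `prod_views` comes back empty for the real metadata, that would suggest separate production views are not being built for the study.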

So we know the view names. Not sure if we can get the data from them. We need a request formatted in that deprecated format:

GET https://{host}/RaveWebServices/datasets/prod.{clinical view name}.csv

from rwslib.rws_requests import RWSAuthorizedGetRequest

class ProdRequest(RWSAuthorizedGetRequest):
    """Deprecated https://{host}/RaveWebServices/datasets/prod.{clinical view name}.csv"""
    def __init__(self, study_name, view_name, prod=False, raw=False):
        self.study_name = study_name
        self.view_name = view_name
        self.prod = prod
        self.raw = raw

    def url_path(self):
        parts = []
        if self.prod:
            parts.append("prod.")
        parts.append(f"V_{self.study_name}_{self.view_name}")
        if self.raw:
            parts.append("_RAW")
        parts.append(".csv")

        return self.make_url("datasets", "".join(parts))

# VIEW_NAME without V_ or _RAW or prod.
test = rws.send_request(ProdRequest("MYSTUDY", "VITALS", prod=True, raw=False))
print(test)
print(rws.last_result.url) # <--- print the url it built, in case you want to paste into a browser to test

I got output like:

userid,projectid,project,studyid,environmentName,subjectId,StudySiteId,Subject,siteid,Site,.....
"39","3","MYSTUDY","6","Prod","492","12","1","9","Matthew Brams MD","000","World","2815",....
"39","3","MYSTUDY","6","Prod","495","12","TG_UAT_20181210","9","Matthew Brams MD","000","World","2844"...
EOF
https://innovate.mdsol.com/RaveWebServices/datasets/prod.V_MYSTUDY_VITALS.csv
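If you want to work with that output as rows rather than raw text, note the trailing EOF line in the output above. Here is a minimal sketch of one way to handle it, assuming the payload is plain CSV text ending in that marker. `parse_rws_csv` is a hypothetical helper, not an rwslib function, and the sample uses only a few of the columns shown.

```python
import csv
import io


def parse_rws_csv(text):
    """Parse CSV text into a list of dicts, dropping a trailing EOF marker line."""
    lines = text.splitlines()
    # Drop trailing blank lines, then the EOF marker if one is present.
    while lines and not lines[-1].strip():
        lines.pop()
    if lines and lines[-1].strip() == "EOF":
        lines.pop()
    return list(csv.DictReader(io.StringIO("\n".join(lines))))


# Shortened sample in the shape of the output above.
sample_output = 'userid,project,environmentName\n"39","MYSTUDY","Prod"\nEOF\n'
rows = parse_rws_csv(sample_output)
print(len(rows), rows[0]["environmentName"])
```

Something like `parse_rws_csv(test)` could then be used on the string returned by the request above.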

So it looks to me like access to these prod. views is deprecated but not removed. Maybe it helps you, though you might find that they changed the implementation and you're still getting the same data. I wasn't able to set things up so that I was sure I had different fields/columns in Dev and Prod environments, and I only had access to Prod data, not test environments, so I couldn't see a mix of data.

Maybe the listing of the views and their columns helps you to prove that the prod views do or do not exist and what their columns are?

Brian-Asahi commented 3 years ago

Hi @iansparks,

Aha, wonderful! Indeed the deprecated "prod." views were still available for my study, and the custom request for the deprecated format was successful. I was able to pull what appears to be known as the Production Clinical View. I confirmed that the previously missing fields are now included in the new dataset pulled using the above request format.
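For anyone wanting to repeat that kind of check, comparing the header rows of the two CSV pulls is enough to see which fields differ. This is a rough sketch only, assuming both datasets are available as CSV text. The helper names and the field names (SYSBP, DIABP) are invented for illustration.

```python
import csv
import io


def header_columns(csv_text):
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def columns_only_in(first_csv, second_csv):
    """Columns in first_csv's header that are absent from second_csv's, in order."""
    present = set(header_columns(second_csv))
    return [col for col in header_columns(first_csv) if col not in present]


# Invented example: DIABP exists only in the production view.
prod_csv = 'Subject,SYSBP,DIABP\n"1","120","80"\nEOF\n'
regular_csv = 'Subject,SYSBP\n"1","120"\nEOF\n'
only_in_prod = columns_only_in(prod_csv, regular_csv)
print(only_in_prod)
```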

Thanks again for your quick and knowledgeable assistance with this issue! This library has greatly increased the efficiency and effectiveness of a lot of our processes, I can't thank you enough!