reiniervlinschoten / castoredc_api

Python Wrapper for Castor EDC API
MIT License

Support for grid field type when creating CastorDataPoint #93

Open ajfricke opened 10 months ago

ajfricke commented 10 months ago

Problem

Hi, I'm looking to use the castoredc_api package to pull my study's data from Castor, but I realized the package doesn't handle the "grid" field type (a screenshot of an example grid, with real values blocked out, was attached to the original issue).

Because of this, CastorDataPoint.value is set to "Error". This happens in the __interpret function of castor_data_point.py:

def __interpret(self, study: "CastorStudy"):
    """Transform the raw value into analysable data."""
    if self.instance_of.field_type in ["checkbox", "dropdown", "radio"]:
        interpreted_value = self.__interpret_optiongroup(study)
    elif self.instance_of.field_type in [
        "numeric",
        "slider",
        "randomization",
    ]:
        interpreted_value = self.__interpret_numeric()
    elif self.instance_of.field_type in ["year"]:
        interpreted_value = self.__interpret_year()
    elif self.instance_of.field_type in [
        "string",
        "textarea",
        "upload",
        "calculation",
    ]:
        interpreted_value = self.raw_value
    elif self.instance_of.field_type in ["datetime"]:
        interpreted_value = self.__interpret_datetime(
            study.configuration["datetime"]
        )
    elif self.instance_of.field_type in ["date"]:
        interpreted_value = self.__interpret_date(study.configuration["date"])
    elif self.instance_of.field_type in ["time"]:
        interpreted_value = self.__interpret_time(study.configuration["time"])
    elif self.instance_of.field_type in ["numberdate"]:
        interpreted_value = self.__interpret_numberdate(study.configuration["date"])
    else: # i.e. "grid" field_type
        interpreted_value = "Error"
    return interpreted_value

Preliminary Investigation

I wanted to see whether I could implement handling for the grid field type myself. However, I could not find a good way to extract the row and column headers of the grid, which are needed to interpret its data. This information does not seem to be included when exporting the study data through the export endpoint (i.e., I can't find it in the data returned by self.client.export_study_data(archived=archived) in the __link_data function of castor_study.py).

I did find a way to get this information with the single_study_field_record function of castoredc_api_client (which uses the data-point/study endpoint), as shown below:

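import json
# field_summary_template is returned as a JSON-encoded string, so decode it into a dict
response = study.client.single_study_field_record(record_id, grid_field_id)
grid_template = json.loads(response["_embedded"]["field"]["field_summary_template"])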

which returns something like:

{'type': 'row',
 'rowsNumber': 5,
 'columnsNumber': 5,
 'rowNames': ['Category ',
  'Verbatim term for the disease suffered',
  'Controlled',
  'Onset Year ',
  'Ongoing'],
 'columnNames': ['Condition 1',
  'Condition 2',
  'Condition 3',
  'Condition 4',
  'Condition 5'],
 'fieldTypes': ['dropdown', 'string', 'dropdown', 'date', 'dropdown'],
 'optionLists': ['dummy_option_list_1',
  '',
  'dummy_option_list_2',
  '',
  'dummy_option_list_3']}

The rowNames and columnNames could then be extracted from this to reconstruct the grid. However, this approach doesn't seem ideal, since it requires one API call per grid field per record, which could easily exceed the API rate limit.
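As an illustration of how the decoded template could be turned into per-cell metadata, here is a minimal sketch. It assumes that, for a template with 'type': 'row' like the one above, fieldTypes and optionLists hold one entry per row (both lists have five entries, matching rowsNumber), so every cell in a row shares that row's field type and option list. GridCell and grid_cells_from_template are hypothetical names, not part of castoredc_api.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GridCell:
    """Metadata for one cell of a grid field (hypothetical helper type)."""
    row: int
    column: int
    label: str
    field_type: str
    option_list: Optional[str]


def grid_cells_from_template(template: dict) -> list[GridCell]:
    """Build per-cell metadata from a decoded field_summary_template.

    Assumption: in a 'row' template, fieldTypes and optionLists are per row.
    Only the 'row' layout seen in the example is handled.
    """
    if template.get("type") != "row":
        raise ValueError(f"unsupported grid template type: {template.get('type')!r}")
    cells = []
    for r in range(template["rowsNumber"]):
        row_name = template["rowNames"][r].strip()  # some names carry trailing spaces
        for c in range(template["columnsNumber"]):
            cells.append(
                GridCell(
                    row=r,
                    column=c,
                    label=f"{row_name} | {template['columnNames'][c].strip()}",
                    field_type=template["fieldTypes"][r],
                    # an empty string means the row has no option list
                    option_list=template["optionLists"][r] or None,
                )
            )
    return cells
```

With a trimmed two-by-two version of the template above, `grid_cells_from_template(...)` yields four cells ordered row by row, e.g. a second cell labelled `"Category | Condition 2"`.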

Questions

I am wondering whether there's a way to retrieve the row and column names of grid fields from the data returned by the Castor export endpoint. If so, this package could be adjusted to handle grid fields. Alternatively, is there a different way altogether to interpret grid fields? Any help or guidance would be greatly appreciated. Thank you!
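If the export endpoint turns out not to carry the grid headers, one way to keep the per-record lookup within rate limits is to fetch each template only once per grid field. This sketch assumes the template depends only on the field and not on the record (the row/column names describe the field's layout), so any record can be used for the lookup. GridTemplateCache is a hypothetical helper; `fetch` stands in for a wrapper around `study.client.single_study_field_record`.

```python
import json
from typing import Callable


class GridTemplateCache:
    """Fetch each grid field's summary template at most once (hypothetical helper).

    Assumes the template is identical for every record, so it is cached by
    field ID only.
    """

    def __init__(self, fetch: Callable[[str, str], dict]):
        # fetch(record_id, field_id) must return the raw API response dict
        self._fetch = fetch
        self._templates: dict[str, dict] = {}
        self.api_calls = 0

    def get(self, record_id: str, field_id: str) -> dict:
        if field_id not in self._templates:
            response = self._fetch(record_id, field_id)
            self.api_calls += 1
            raw = response["_embedded"]["field"]["field_summary_template"]
            self._templates[field_id] = json.loads(raw)  # template arrives as a JSON string
        return self._templates[field_id]
```

Looking up the same grid field for many records then costs one API call per grid field instead of one per grid field per record.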

reiniervlinschoten commented 10 months ago

Thanks for the detailed bug report. Unfortunately I do not have time to tackle this at the moment, and it might be some time before I have. If you manage to figure it out, feel free to open a PR.

More info on the API can be found here: https://data.castoredc.com/api

By the way, when study data is exported through the API, the study structure is built first. That could be the place to add the grid information (see, for example, the CastorField object).
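To make the suggestion concrete, a minimal sketch of attaching grid metadata while the study structure is built might look like this. FieldStub is a stand-in for CastorField (the real class has many more attributes), grid_template is a hypothetical new attribute, and fetch_template is a hypothetical callable returning the decoded template for a field ID, wherever that data ends up being sourced from.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class FieldStub:
    """Stand-in for CastorField; only the attributes this sketch needs."""
    field_id: str
    field_type: str
    grid_template: Optional[dict] = None  # hypothetical new attribute for grids


def attach_grid_templates(
    fields: Iterable[FieldStub], fetch_template: Callable[[str], dict]
) -> int:
    """Attach a decoded template to every grid field; return how many were fetched.

    Runs once during structure building, so each grid field is looked up once
    rather than once per record.
    """
    fetched = 0
    for f in fields:
        if f.field_type == "grid" and f.grid_template is None:
            f.grid_template = fetch_template(f.field_id)
            fetched += 1
    return fetched
```

Because the structure is built before any data is linked, the interpretation step could later read `grid_template` from the field instead of calling the API again.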

(Email reply to Alex Fricke's message of Mon 15 Jan 2024, 21:01.)


ajfricke commented 10 months ago

Thank you @reiniervlinschoten! I was able to find the row and column names I needed and was able to modify the study structure to retrieve them as you suggested.

However, I ran into another problem that I was hoping to get your advice on. I initially thought a grid field could be a CastorDataPoint instance, but I had trouble implementing an __interpret_grid function in castor_data_point.py: rather than being an "instance of a field with a value" (as the CastorDataPoint docstring puts it), a grid is an instance of a field with multiple values, one per cell of the grid/table.

Each cell of a grid has a field type (like "radio") and an option group ID (to map its raw value to its true value), so your __interpret functions in castor_data_point.py seem best suited to interpreting the cells. However, the cells are not instances of a field (the grid is), so I don't think it makes sense to create a CastorDataPoint instance for each cell. So I'm not sure how to proceed in making the module handle grids.

Any help or guidance would be greatly appreciated. Thank you!
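The per-cell interpretation described above could be sketched as a small dispatch on the cell's field type, in the spirit of `__interpret` but applied to one cell at a time. This assumes raw cell values arrive as strings and that an option group can be represented as a mapping from stored option value to label; interpret_cell is a hypothetical helper and only covers a few field types.

```python
from typing import Optional


def interpret_cell(raw: str, field_type: str, option_group: Optional[dict[str, str]]):
    """Interpret one grid cell's raw value (hypothetical helper).

    Assumptions: raw values are strings, empty cells map to None, and
    option_group maps stored option values to their labels.
    """
    if raw == "":
        return None
    if field_type in ("dropdown", "radio"):
        # fall back to the raw value if it is missing from the option group
        return option_group.get(raw, raw) if option_group else raw
    if field_type == "numeric":
        return float(raw)
    if field_type in ("string", "textarea"):
        return raw
    return "Error"  # same fallback the package uses for unhandled types
```

For example, `interpret_cell("1", "dropdown", {"1": "Yes", "0": "No"})` gives `"Yes"`, while an unhandled type such as `"date"` falls through to `"Error"`.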

reiniervlinschoten commented 10 months ago

Maybe it is best to create a new castor_object called CastorGrid that can be part of a castor_form_instance, just like CastorDataPoint. The CastorGrid could then hold the CastorDataPoints, and every function called on a CastorGrid would be called on each CastorDataPoint inside the grid?
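A minimal sketch of that composite idea: a grid object holds one data-point-like object per cell and forwards calls to each of them. CellPoint is a stand-in for CastorDataPoint with placeholder interpretation logic, and this CastorGrid is only an illustration of the proposed class, not an existing part of castoredc_api.

```python
from dataclasses import dataclass, field


@dataclass
class CellPoint:
    """Stand-in for CastorDataPoint: one grid cell with its own raw value."""
    row_name: str
    column_name: str
    field_type: str
    raw_value: str
    value: object = None

    def interpret(self) -> None:
        # Placeholder logic; the real class dispatches on field_type in __interpret
        self.value = float(self.raw_value) if self.field_type == "numeric" else self.raw_value


@dataclass
class CastorGrid:
    """Sketch of the proposed CastorGrid: holds cells and forwards calls to them."""
    field_id: str
    cells: list[CellPoint] = field(default_factory=list)

    def interpret(self) -> None:
        # A call on the grid is forwarded to every cell it holds
        for cell in self.cells:
            cell.interpret()

    def as_table(self) -> dict[str, dict[str, object]]:
        """Return interpreted values as {row_name: {column_name: value}}."""
        table: dict[str, dict[str, object]] = {}
        for cell in self.cells:
            table.setdefault(cell.row_name, {})[cell.column_name] = cell.value
        return table
```

Keeping each cell as its own data-point-like object lets the existing per-type interpretation be reused unchanged, while the grid only adds layout (row and column names) on top.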

(Email reply to Alex Fricke's message of Fri 19 Jan 2024, 16:52.)
