This is a Python package for interacting with the API of Castor Electronic Data Capture (EDC). The package contains functions to interact with all the endpoints defined on https://data.castoredc.com/api#/. Within the package are functions for easy export and import of your data through the API.
Supported export formats are
Import currently only supports .xlsx files with some configuration.
See below for more information.
pip install castoredc-api
conda install -c conda-forge castoredc_api
conda install -c reiniervl castoredc_api
For all implemented functions, see: https://data.castoredc.com/api#/
from castoredc_api import CastorClient
# Create a client with your credentials
c = CastorClient('MYCLIENTID',
'MYCLIENTSECRET',
'data.castoredc.com')
# Link the client to your study in the Castor EDC database
c.link_study('MYSTUDYID')
# Then you can interact with the API
# Get all records
c.all_records()
# Create a new survey package
c.create_survey_package_instance(survey_package_id="FAKESURVEY-PACKAGE-ID",
record_id="TEST-RECORD",
email_address="obviously@fakeemail.com",
auto_send=True)
For exporting data: The endpoint that extracts data for the study can't be used if the authenticated user has a role within the study.
See: https://data.castoredc.com/api#/export/get_study__study_id__export_data
from castoredc_api import CastorStudy
# Instantiate Study
study = CastorStudy('MYCLIENTID',
'MYCLIENTSECRET',
'MYSTUDYID',
'data.castoredc.com')
# Export your study to pandas dataframes or CSV files
study.export_to_dataframe()
study.export_to_csv()
# Data and structure mapping are automatically done on export, but you can also map these without exporting your data
# Map your study data locally (also maps structure)
study.map_data()
# Map only your study structure locally
study.map_structure()
# After mapping data and/or structure, you can start working with your study
# Get all reports
study.get_all_report_forms()
# Get all data points of a single record
study.get_single_record('000011').get_all_data_points()
Date fields are returned as strings (dd-mm-yyyy).
Datetime fields are returned as strings (dd-mm-yyyy hh-mm).
Numeric fields are all returned as floats.
This can be changed by supplying the argument format_options when initialising the CastorStudy.
Allowed options are date, datetime, datetime_seconds and time.
See https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior for formatting options.
from castoredc_api import CastorStudy
# Instantiate Study with different formatting settings
study = CastorStudy('MYCLIENTID',
'MYCLIENTSECRET',
'MYSTUDYID',
'data.castoredc.com',
format_options={
"date": "%B %e %Y",
"datetime": "%B %e %Y %I:%M %p",
"datetime_seconds": "%B %e %Y %I:%M:%S %p",
"time": "%I:%M %p",
})
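The format_options values are ordinary strftime patterns. As a rough illustration of what such patterns produce, the standalone sketch below formats one timestamp using only the standard library. It does not call castoredc_api, and the sample timestamp is made up. It uses %d instead of %e because %e may not be available on every platform.

```python
from datetime import datetime

# Patterns in the style of the format_options example above (with %d instead of %e).
format_options = {
    "date": "%B %d %Y",
    "datetime": "%B %d %Y %I:%M %p",
    "datetime_seconds": "%B %d %Y %I:%M:%S %p",
    "time": "%I:%M %p",
}

# A made-up timestamp, used only to show how each pattern renders.
moment = datetime(2021, 3, 16, 7, 30, 5)

rendered = {name: moment.strftime(pattern) for name, pattern in format_options.items()}
for name, text in rendered.items():
    print(f"{name}: {text}")
```

Month names and AM/PM markers depend on the active locale, so the printed text can differ between systems.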
Missing data is mostly handled through pandas (NaN).
User-defined missing data is handled through its definitions in Castor.
For numeric and text-like variables, these values are -95, -96, -97, -98 and -99.
For datetime data, user-defined missing values are encoded as dates in the years 2995, 2996, 2997, 2998, and 2999.
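When post-processing exported values, it can help to flag these user-defined missing codes explicitly. The helpers below are a minimal sketch and are not part of castoredc_api: the function names are hypothetical, and the date pattern assumes the default dd-mm-yyyy export format.

```python
from datetime import datetime

# Codes and years taken from the description above.
NUMERIC_MISSING_CODES = {-95, -96, -97, -98, -99}
MISSING_YEARS = {2995, 2996, 2997, 2998, 2999}


def is_user_missing_number(value):
    """Return True when a numeric value equals one of the user-defined missing codes."""
    return value in NUMERIC_MISSING_CODES


def is_user_missing_date(text, pattern="%d-%m-%Y"):
    """Return True when a date string falls in one of the user-defined missing years."""
    return datetime.strptime(text, pattern).year in MISSING_YEARS


# Numeric fields are exported as floats, so -97.0 matches the code -97.
print(is_user_missing_number(-97.0))
print(is_user_missing_date("31-12-2999"))
```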
Data is validated against the Castor database, meaning that:
The synchronous upload option uploads each row one by one.
When an Error is encountered or the upload finishes successfully, the program outputs the upload log to the output folder and stops.
The asynchronous upload option uploads each row one by one.
This is about 15-30 times faster than synchronous upload.
The program does not stop if uploading a row encounters an error.
When the upload finishes, the program outputs the upload log to the output folder and stops.
Error messages are stored in the output folder for debugging.
from castoredc_api import CastorStudy
from castoredc_api import import_data
# Create a Study with your credentials
study = CastorStudy('MYCLIENTID',
'MYCLIENTSECRET',
'MYSTUDYID',
'data.castoredc.com')
# Import labelled study data
imported_data = import_data(data_source_path="PATH/TO/YOUR/LABELLED/STUDY/DATA",
column_link_path="PATH/TO/YOUR/LINK/FILE",
study=study,
label_data=True,
target="Study")
# Import labelled study data (asynchronous)
imported_data = import_data(data_source_path="PATH/TO/YOUR/LABELLED/STUDY/DATA",
column_link_path="PATH/TO/YOUR/LINK/FILE",
study=study,
label_data=True,
target="Study",
use_async=True)
# Import non-labelled report data
imported_data = import_data(data_source_path="PATH/TO/YOUR/REPORT/DATA",
column_link_path="PATH/TO/YOUR/LINK/FILE",
study=study,
label_data=False,
target="Report",
target_name="Medication")
# Import labelled survey data
imported_data = import_data(data_source_path="PATH/TO/YOUR/LABELLED/SURVEY/DATA",
column_link_path="PATH/TO/YOUR/LINK/FILE",
study=study,
label_data=True,
target="Survey",
target_name="My first survey package",
email="python_wrapper@you-spam.com")
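The practical difference between the two upload options described above is how they react to a failing row. The sketch below imitates that behaviour with a hypothetical upload_row function and made-up rows. It is not the package's implementation, only an illustration of stop-on-first-error versus continue-and-log.

```python
import asyncio

# Made-up rows; the second one is deliberately invalid.
ROWS = [{"record_id": "110001"}, {"record_id": None}, {"record_id": "110003"}]


def upload_row(row):
    """Hypothetical stand-in for uploading one row; fails when record_id is missing."""
    if row["record_id"] is None:
        raise ValueError("missing record_id")
    return f"uploaded {row['record_id']}"


def upload_sync(rows):
    """Upload rows one by one and stop at the first error."""
    log = []
    for row in rows:
        try:
            log.append(upload_row(row))
        except ValueError as error:
            log.append(f"error: {error}")
            break
    return log


async def upload_async(rows):
    """Schedule all rows concurrently and keep going when one of them fails."""

    async def worker(row):
        await asyncio.sleep(0)  # placeholder for waiting on the network
        return upload_row(row)

    results = await asyncio.gather(*(worker(row) for row in rows), return_exceptions=True)
    return [f"error: {r}" if isinstance(r, Exception) else r for r in results]


sync_log = upload_sync(ROWS)
async_log = asyncio.run(upload_async(ROWS))
print(sync_log)
print(async_log)
```

In this sketch the synchronous log ends at the failing row, while the asynchronous log contains an entry for every row, including the error.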
See below and example_files/ for examples.
The mg/4 weeks and mg/8 weeks under units will be imported to the med_other_unit fields as they do not match any option of the optiongroup, see link files.
patient | medication | startdate | stopdate | dose | units |
---|---|---|---|---|---|
110001 | Azathioprine | 05-12-2019 | 05-12-2020 | 0.05 | g/day |
110002 | Vedolizumab | 17-08-2018 | 17-09-2020 | 300 | mg/4 weeks |
110003 | Ustekinumab | 19-12-2017 | 03-06-2019 | 90 | mg/8 weeks |
110004 | Thioguanine | 25-04-2020 | 27-05-2021 | 15 | mg/day |
110005 | Tofacitinib | 01-03-2020 | 31-12-2999 | 10 | mg/day |
The non-integer values under units will be imported to the med_other_unit fields as they do not match any optionvalue of the optiongroup, see link files.
patient | medication | startdate | stopdate | dose | units |
---|---|---|---|---|---|
110001 | Azathioprine | 05-12-2019 | 05-12-2020 | 0.05 | 3 |
110002 | Vedolizumab | 17-08-2018 | 17-09-2020 | 300 | mg/4 weeks |
110003 | Ustekinumab | 19-12-2017 | 03-06-2019 | 90 | mg/8 weeks |
110004 | Thioguanine | 25-04-2020 | 27-05-2021 | 15 | 2 |
110005 | Tofacitinib | 01-03-2020 | 31-12-2999 | 10 | 2 |
Link files should follow the format shown below. The mapping is variable name in the Excel file -> variable name in Castor. If a variable in the other column is listed twice, with two different variables in the castor column, it means that it has a dependency in Castor.
This is a way to import data that has an "other" category, for example a radio question that reads:
In which case selecting other opens a new text box to enter this information. The second variable in the link_file should be this new text box.
This is treated in the following manner:
other | castor |
---|---|
patient | record_id |
medication | med_name |
startdate | med_start |
stopdate | med_stop |
dose | med_dose |
units | med_units |
units | med_other_unit |
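The double mapping of units in the link file above can be pictured as a simple routing rule. The sketch below is an assumption-laden illustration, not the package's code: the set of optiongroup options and the name of the "Other" option are hypothetical, and route_units is an invented helper.

```python
# Assumed options of the med_units optiongroup; the real options are defined in Castor.
MED_UNITS_OPTIONS = {"g/day", "mg/day"}


def route_units(value):
    """Send a units value to med_units when it matches an option, else to the text field."""
    if value in MED_UNITS_OPTIONS:
        return {"med_units": value}
    # Assumption: the radio field gets an "Other" option and the raw value
    # goes to the dependent text box (the second variable in the link file).
    return {"med_units": "Other", "med_other_unit": value}


print(route_units("mg/day"))
print(route_units("mg/4 weeks"))
```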
Translation files link the optiongroup value or label from the external database to the optiongroups from Castor. Values are translated for all variables specified in the first column of the file.
Two situations can occur when a value is encountered for which no translation is given:
variable | other | castor |
---|---|---|
family disease history | none | None |
family disease history | don't know | Unknown |
family disease history | deaf | Deafness |
family disease history | cardiomyopathy | (Cardio)myopathy |
family disease history | encephalopathy | Encephalopathy |
family disease history | diabetes | Diabetes Mellitus |
family disease history | cardiovascular disease | Hypertension/Cardiovascular disease |
family disease history | thromboembolism | Thrombosis |
family disease history | tumor | Malignancy |
from castoredc_api import CastorStudy
from castoredc_api import import_data
# Create a Study with your credentials
study = CastorStudy('MYCLIENTID',
'MYCLIENTSECRET',
'MYSTUDYID',
'data.castoredc.com')
# Import study data with a translation file
imported_data = import_data(data_source_path="PATH/TO/YOUR/LABELLED/STUDY/DATA",
column_link_path="PATH/TO/YOUR/LINK/FILE",
study=study,
label_data=True,
target="Study",
translation_path="PATH/TO/YOUR/TRANSLATION/FILE")
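Conceptually, a translation file is a lookup from (variable, external value) to the Castor option. The sketch below shows that lookup with a few rows from the example table. The translate helper is hypothetical and does not model what the package does with untranslated values.

```python
# A few rows from the example translation file: (variable, other) -> castor.
TRANSLATIONS = {
    ("family disease history", "none"): "None",
    ("family disease history", "don't know"): "Unknown",
    ("family disease history", "deaf"): "Deafness",
    ("family disease history", "tumor"): "Malignancy",
}


def translate(variable, value):
    """Return the Castor option for an external value, or None when no translation is given.

    None only marks "no translation given" in this sketch; it is not the
    package's behaviour for that case.
    """
    return TRANSLATIONS.get((variable, value))


print(translate("family disease history", "deaf"))
print(translate("family disease history", "tumor"))
```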
Merge files link multiple columns from the external database to a single checkbox field in Castor.
For each column from the external database specified under other_variable, the value under other_value is mapped to the castor_value for the castor_variable.
If specifying a merge file, note that castor_variable is the new other variable for your link file (see below).
All other_values not defined raise an Error.
Only many-to-one matching is supported.
patient | date baseline blood sample | baseline hemoglobin | factor V Leiden | datetime onset stroke | time onset trombectomy | year of birth | patient sex | patient race | famhist_none | famhist_deaf | famhist_cardiomyopathy | famhist_encephalopathy | famhist_diabmell | famhist_cardiovasc | famhist_malignancy | famhist_unknown |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
110001 | 16-03-2021 | 8.3 | 55;16-03-2021 | 16-03-2021;07:30 | 09:25 | 1999 | Female | Asian | No | No | Yes | Yes | Yes | No | No | No |
110002 | 17-03-2021 | 7.2 | 33;17-03-2021 | 17-03-2021;15:30 | 06:33 | 1956 | Female | African/black | No | Yes | Yes | No | No | No | No | No |
110003 | 16-03-2022 | 9.1 | -45;18-03-2022 | 18-03-2022;02:00 | 12:24 | 1945 | Male | Chinese | Yes | No | No | No | No | No | No | No |
110004 | 17-03-2022 | 3.2 | 28;19-03-2022 | 17-03-2022;21:43 | 23:23 | 1933 | Male | Caucasian/white | No | No | No | No | No | Yes | Yes | No |
110005 | 16-03-2023 | 10.3 | 5;20-03-2023 | 16-03-2023;07:22 | 08:14 | 1921 | Female | Hispanic | No | No | No | No | No | No | No | Yes |
other_variable | other_value | castor_variable | castor_value |
---|---|---|---|
famhist_none | Yes | his_family | None |
famhist_deaf | Yes | his_family | Deafness |
famhist_cardiomyopathy | Yes | his_family | (Cardio)myopathy |
famhist_encephalopathy | Yes | his_family | Encephalopathy |
famhist_diabmell | Yes | his_family | Diabetes Mellitus |
famhist_cardiovasc | Yes | his_family | Hypertension/Cardiovascular disease |
famhist_malignancy | Yes | his_family | Malignancy |
famhist_unknown | Yes | his_family | Unknown |
famhist_none | No | his_family | |
famhist_deaf | No | his_family | |
famhist_cardiomyopathy | No | his_family | |
famhist_encephalopathy | No | his_family | |
famhist_diabmell | No | his_family | |
famhist_cardiovasc | No | his_family | |
famhist_malignancy | No | his_family | |
famhist_unknown | No | his_family | |
other | castor |
---|---|
patient | record_id |
date baseline blood sample | base_bl_date |
baseline hemoglobin | base_hb |
factor V Leiden | fac_V_leiden |
datetime onset stroke | onset_stroke |
time onset trombectomy | onset_trombectomy |
year of birth | pat_birth_year |
patient sex | pat_sex |
patient race | pat_race |
his_family | his_family |
from castoredc_api import CastorStudy
from castoredc_api import import_data
# Create a Study with your credentials
study = CastorStudy('MYCLIENTID',
'MYCLIENTSECRET',
'MYSTUDYID',
'data.castoredc.com')
# Import study data with a merge file
imported_data = import_data(data_source_path="PATH/TO/YOUR/LABELLED/STUDY/DATA",
column_link_path="PATH/TO/YOUR/LINK/FILE",
study=study,
label_data=True,
target="Study",
merge_path="PATH/TO/YOUR/MERGE/FILE")
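The many-to-one mapping of a merge file can be sketched as collecting the castor_value of every column whose value has a non-empty mapping. The code below is an illustration under assumptions, not the package's implementation: merge_row is a hypothetical helper, and returning the selected options as a list is a choice made for this sketch.

```python
# "Yes" rows from the example merge file: other_variable -> castor_value.
YES_VALUES = {
    "famhist_none": "None",
    "famhist_deaf": "Deafness",
    "famhist_cardiomyopathy": "(Cardio)myopathy",
    "famhist_encephalopathy": "Encephalopathy",
    "famhist_diabmell": "Diabetes Mellitus",
    "famhist_cardiovasc": "Hypertension/Cardiovascular disease",
    "famhist_malignancy": "Malignancy",
    "famhist_unknown": "Unknown",
}

# Full rule table: (other_variable, other_value) -> castor_value; "No" maps to empty.
MERGE_RULES = {(column, "Yes"): label for column, label in YES_VALUES.items()}
MERGE_RULES.update({(column, "No"): "" for column in YES_VALUES})


def merge_row(row):
    """Collapse the famhist_* columns of one row into the his_family checkbox field."""
    selected = []
    for column, value in row.items():
        if column not in YES_VALUES:
            continue  # not part of the merge; handled by the link file instead
        if (column, value) not in MERGE_RULES:
            # Undefined other_values are an error, as stated above.
            raise ValueError(f"no merge rule for {column}={value!r}")
        castor_value = MERGE_RULES[(column, value)]
        if castor_value:
            selected.append(castor_value)
    return {"his_family": selected}


# Family history columns of patient 110001 from the example data.
patient_110001 = {
    "patient": "110001",
    "famhist_none": "No",
    "famhist_deaf": "No",
    "famhist_cardiomyopathy": "Yes",
    "famhist_encephalopathy": "Yes",
    "famhist_diabmell": "Yes",
    "famhist_cardiovasc": "No",
    "famhist_malignancy": "No",
    "famhist_unknown": "No",
}
merged = merge_row(patient_110001)
print(merged)
```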
Standard date formatting settings are the following. Date(time) and time fields should follow these formats in the Excel sheet to be uploaded.
These can be changed by supplying the argument format_options when calling import_data.
Allowed options are date, datetime, and time. Decimal separator cannot be changed.
See https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior for formatting options.
from castoredc_api import CastorStudy
from castoredc_api import import_data
# Create a Study with your credentials
study = CastorStudy('MYCLIENTID',
'MYCLIENTSECRET',
'MYSTUDYID',
'data.castoredc.com')
# Import labelled study data with changed formats
imported_data = import_data(data_source_path="PATH/TO/YOUR/LABELLED/STUDY/DATA",
column_link_path="PATH/TO/YOUR/LINK/FILE",
study=study,
label_data=True,
target="Study",
format_options={
"date": "%B %d %Y",
"datetime": "%B %d %Y %I:%M %p",
"time": "%I:%M %p",
})
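Before uploading, it can be useful to check that the cells in the Excel sheet really match the patterns passed in format_options. The standalone sketch below parses made-up cell values with the patterns from the example above, using only the standard library. It does not call castoredc_api.

```python
from datetime import datetime

# The patterns from the import example above.
format_options = {
    "date": "%B %d %Y",
    "datetime": "%B %d %Y %I:%M %p",
    "time": "%I:%M %p",
}

# Made-up cell values written in those formats.
cells = {
    "date": "March 16 2021",
    "datetime": "March 16 2021 07:30 AM",
    "time": "09:25 AM",
}

# strptime raises ValueError when a cell does not match its pattern.
parsed = {name: datetime.strptime(text, format_options[name]) for name, text in cells.items()}

print(parsed["date"].date())
print(parsed["datetime"])
print(parsed["time"].time())
```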
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
Want to contribute to the testing suite? Or test possible changes you want to contribute? Tests can be run via two methods: on GitHub and on your local machine.
On GitHub: when you create a pull request for this project, Pytest automatically runs for the testing suite (see pytest.yml). If you have added a whole new testing module, don't forget to add this to the pytest.yml file. Within the repository, applicable access rights have been set for the client. Use the following fixtures for the respective modules you want to test:
Locally: You can only run tests locally when you have read and write access to the correct Castor Studies. Please send a message to the repository owner to ask for the correct access, with information on why. Access and correct study IDs will then be given to run the tests.
We use SemVer for versioning. For the versions available, see the tags on this repository.
See also the list of contributors who participated in this project.
This project is licensed under the MIT License. See the LICENSE.md file for details.