Closed JANHMS closed 1 year ago
Interesting -- to be honest, I'm also not 100% sure what the `overwrite`
parameter really does. I think if you set it to `overwrite`
and then write back only a subset of data for a record, it will delete the existing data in that record.
But as you pointed out, even in "normal" mode it doesn't seem to respect existing data in records.
Can I ask a little bit more about your use case?
If there is any existing data in a record, do you want to avoid writing to it altogether?
Or do you want to go field by field, record by record, checking for the existence of data, and writing only to those fields for those records that don't have existing data?
@pwildenhain yes, it is as you said. I want to go record by record and only write data to the fields where no data has been added yet. I want to write data to REDCap, but some data may already have been entered manually, and in that case I do not want to overwrite it.
Got it.
To me this seems outside the scope of what PyCap is for, which is a simple wrapper around the REDCap API.
However, I do think this is a really interesting use case, and an interesting problem.
I think you'll need to go a custom route with this, and below is some (not fully tested ⚠️ ) code on how to solve this problem
import os

import pandas as pd
from redcap import Project

my_project = Project("https://redcap.somewhere.edu/api/", os.getenv("MY_PROJECT_TOKEN"))
current_project_data = my_project.export_records(format_type="df")

# You might get this from somewhere else, but it's the data you're looking
# to import into your project. The index needs to be your project's record ID
# field so it lines up with the exported data
new_project_data = pd.read_csv("new_data_for_my_project.csv", index_col="record_id")

to_import = pd.DataFrame()
for idx, row in new_project_data.iterrows():
    try:
        current_record_data = current_project_data.loc[[idx]].copy()
    except KeyError:
        # This is a completely new record, no need to check column by column
        to_import = pd.concat([to_import, new_project_data.loc[[idx]]])
        continue
    row_import = pd.DataFrame()
    for col in new_project_data.columns:
        current_value = current_record_data[col].values[0]
        new_data = new_project_data.loc[[idx]][[col]]
        # Empty REDCap fields export as NaN, which is truthy, so use pd.notna
        # Note: this won't work well with checkbox fields
        if pd.notna(current_value) and current_value != "":
            # Don't overwrite any pre-existing value, keep the current one
            import_col = current_record_data[[col]].copy()
        else:
            import_col = new_data
        row_import = pd.concat([row_import, import_col], axis="columns")
    to_import = pd.concat([to_import, row_import])

num_updated = my_project.import_records(to_import, import_format="df")
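One `pandas` pitfall worth flagging in the "does this field already have data?" check: a bare truthiness test on an exported value can misfire, because `pandas` represents empty REDCap fields as NaN, and `float("nan")` is truthy in Python. A quick demonstration:

```python
import pandas as pd

# NaN is truthy, so `if current_value:` would treat an empty field as "has data"
assert bool(float("nan")) is True

# pd.notna correctly distinguishes missing values from real data
assert not pd.notna(float("nan"))
assert pd.notna("some existing value")
assert pd.notna(0)  # a recorded 0 still counts as existing data
```

This is why an explicit `pd.notna` check is safer than relying on truthiness.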
A couple of notes: `pandas` is not my strong suit, so the code above is almost certainly not optimized 🙈 Try this out and let me know what you think 😉 This could honestly be worth asking the REDCap team about -- a new option for `overwrite` in `import_records` that's like `conservative` or something, where it doesn't overwrite any existing data.
EDIT: I realized some of my `pandas` code was wrong, but I think I've mostly fixed it. You may still need to tinker with it a little to get it working for you, but I still think the overall strategy is solid.
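For what it's worth, `pandas` has a built-in that matches this "fill only where empty" logic: `DataFrame.combine_first` keeps the caller's values and falls back to the other frame only where the caller has NaN. Assuming both frames are indexed by the record ID (the column names below are just illustrative), it could replace most of the record-by-record loop:

```python
import pandas as pd

# Existing REDCap data: record 2 is missing its age
current = pd.DataFrame(
    {"age": [34.0, None], "city": ["Boston", "Denver"]},
    index=pd.Index([1, 2], name="record_id"),
)

# New data to import: would change record 1's age and fill in record 2's
new = pd.DataFrame(
    {"age": [40.0, 29.0], "city": ["Chicago", "Denver"]},
    index=pd.Index([1, 2], name="record_id"),
)

# Keep every existing value; take the new data only where current is NaN
to_import = current.combine_first(new)

# to_import.loc[1, "age"] stays 34.0 (existing value kept)
# to_import.loc[2, "age"] becomes 29.0 (filled from the new data)
```

Same caveat as before: this treats NaN as "empty", so it still won't handle checkbox fields well.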
Let me know if this was helpful for you or not, curious to hear about the solution you landed on
Summary
First of all, thanks for building this cool tool.
I was wondering if it would be possible to add an option to the
`import_records`
method to not overwrite values that are already in the REDCap instance, e.g. when a person has already filled out the data manually. In that case I would not want to overwrite the data. With the current option `normal`,
it is overwriting these fields within REDCap.