vubiostat / redcapAPI

R interface to REDCap (http://www.project-redcap.org/)
15 stars 26 forks source link

Unexpected behavior with updated package #341

Closed obregos closed 2 months ago

obregos commented 2 months ago

redcapAPI user question:

d <- exportBulkRecords(list(D=edc_conn),
                        forms=list(D='demographics'),
                        post = function(Records, rcon) {
                            Records |> mChoiceCast(rcon) |>
                                guessDate(rcon, threshold=0.6)})

Running the above code:

redcapAPI version 2.8.3 gives me a data frame with 12518 rows. However, redcapAPI version 2.8.5 gives me a data frame with 257959 rows.

The demographics form only has data in the redcap_event_name 'screening' and in the 2.8.3 version only this value is present in redcap_event_name.

The number of rows seem to go up in 2.8.5 because it includes all of the other project redcap_event_names other than screening.

spgarbet commented 2 months ago

This shows current differences: https://github.com/vubiostat/redcapAPI/compare/v2.8.3...main

There were some changes to filtering empty rows. They don't seem like they would have changed this reading over them.

My recommendations are we work towards a reproducible case with a small amount of data. This should become a functional test if possible.

Side note: exportBulkRecords isn't really needed if you're just doing a single project. It was intended for exporting everything in a single pass from multiple projects.

spgarbet commented 2 months ago

Oddity: If I do, exportRecordsTyped(rcon, forms="slider_fields") or `exportRecordsTyped(rcon, forms="numbers") I get the right columns, but there is always a completely NA row for the first row which I thought was filtered out. This behavior goes back at least to 2.7.0

spgarbet commented 2 months ago

Can you on the project narrow down 4 or 5 record_ids that you expect to be returned on demographics? Then use a call like:

exportRecordsTyped(rcon, forms='demographics', records=YOUR_SMALL_GROUP_HERE)

Then use git bisect to narrow down the exact change that introduced this behavior. https://git-scm.com/docs/git-bisect

Between each git operation, reinstall the package and use your test case. Version 2.8.3 had a lot of changes and if installed from main at some point it's not telling which version was actually in use. I would go back to v2.8.0 as the reference.

spgarbet commented 2 months ago

@obregos reports that bisect says https://github.com/vubiostat/redcapAPI/commit/a40eba085df41335175e0c5fd6b8d2ad3d2f94f0 is the bad commit.

spgarbet commented 2 months ago

This broke with #293

obregos commented 2 months ago

The 'demographics' form is only displayed on the redcap_event_name 'screening_arm_1'. There is a race_eth checkbox field on the 'demographics' form. The race_eth field is one of those required checkboxes that exports as 0 even if the form was never touched.

Previously, when exporting the demographics form, the only redcap_event_name exported would be 'screening_arm_1' as this is the only even this form is on. Now, all of the projects redcap_event_names are exported with the demographics form. All of the data is NA for these other events except for the race_eth fields. For every redcap_event_name the race_eth fields (race_eth_1, raceeth2 etc.) are given a value of 'Unchecked' even on events where this field is not present.

spgarbet commented 2 months ago

The problem is checkboxes are always defined to a value. This circumvents the sparse block filtering of empty rows by making even unrelated rows have a value.

spgarbet commented 2 months ago

The rcon$mapping() on the project in question has the information required. However, for our test project we don't have anything defined.

@nutterb Would it be a safe assumption to use the mapping information if it is defined and ignore otherwise?

spgarbet commented 2 months ago

Would the following be okay? On this line: https://github.com/vubiostat/redcapAPI/blob/f0f7ba19b6b3d0c00f96514ca10c90cc0522167b/R/exportRecordsTyped.R#L222

add this:

if(!is.null(forms) && nrow(rcon$mapping()) > 0)
{
  map <- rcon$mapping()
  form_events <- map$unique_event_name[map$form %in% forms]
  Raw <- Raw[Raw$redcap_event_name %in% form_events,]
}
nutterb commented 2 months ago

The rcon$mapping() on the project in question has the information required. However, for our test project we don't have anything defined.

@nutterb Would it be a safe assumption to use the mapping information if it is defined and ignore otherwise?

I would think so. Any longitudinal project should have mappings, and it would be reasonable to follow those.

Classic projects won't have mappings, and all forms would apply to every record.

spgarbet commented 2 months ago

I see the pull request was accepted.