Inconsistent Behavior of exportRecordsTyped Function When Fields Are Specified

jubilee2 commented 7 months ago

Description

The exportRecordsTyped function in the redcapAPI package is designed to export data from REDCap databases. However, there seems to be an inconsistency when using this function with and without specifying fields.

Expected Behavior

When the fields parameter is provided, the exportRecordsTyped function should return the same number of rows as when no fields are specified.

Version

2.8.4(CRAN)

spgarbet commented 7 months ago

This is expected behavior when filtering empty rows. There is no data beyond the id column so they are all filtered. What does the call return when using filter_empty_rows=FALSE ?

jubilee2 commented 7 months ago

@spgarbet it's not empty rows. actually i export 2 columns, and the second column has data

spgarbet commented 7 months ago

Confirmed in debugging that it excludes system fields and secondary_ids and is working as designed.

jubilee2 commented 7 months ago

thanks @spgarbet

spgarbet commented 7 months ago

@nutterb at one point we had code that skipped the filter if it was just system fields. At least I have a memory of something like that. Should it come back? Did it get lost in the shuffle or was a problem for some other reason I've fortgotten.

obregos commented 7 months ago

In one of my test projects I can specify the record_id field and it returns the number of rows without adding filter_empty_rows = FALSE. The number of rows is two higher without specifying record id because the project has repeating instruments and so two extra rows are for one record with a different repeat instrument. Is this expected behavior? This project does not specify secondary ids.

specify fields

spgarbet commented 7 months ago

Reopening till we agree the behavior is correct. I think the principal of least surprise should apply.

spgarbet commented 7 months ago

Q: If nrow for number of records worked for @obregos why didn't it for @jubilee2 ? Was it the secondary id?

On @obregos question above, the number of rows difference is expected as one is a raw export and it contains the same record id on multiple rows. The second call looks like it's passing it through unique to tighten it up. I think once we identify these behaviors that are surprising then it should at a minimum be added to the documentation.

obregos commented 6 months ago

The View project exports a redcap_survey_identifer field along with record id

> exportRecordsTyped(view_conn, fields = c('record_id'))
> record_id redcap_survey_identifier
1         1                     <NA>
2         2                     <NA>
3         3                     <NA>
4         4                     <NA>

So in filter empty rows https://github.com/vubiostat/redcapAPI/blob/main/R/filterEmptyRow.R#L52 the length of other_fields becomes 1 because it is counting redcap_survey_identifier

All of the values for redcap_survey_identifier are NA and so because of that has_any_value becomes FALSE https://github.com/vubiostat/redcapAPI/blob/main/R/filterEmptyRow.R#L56 and the data is filtered to 0 rows

Now I’m questioning why redcap_survey_identifer is exported when we are only asking for record_id. That should not be the case.

filterEmptyRows did it's job correctly.

obregos commented 6 months ago

Adding redcap_survey_identifer to the system fields list seems to work in returning the correct number of rows. I will need to run tests to make sure it doesn't affect anything else.

REDCAP_SYSTEM_FIELDS <- c("redcap_event_name", 
                          "redcap_data_access_group", 
                          "redcap_repeat_instrument", 
                          "redcap_repeat_instance",
                          "redcap_survey_identifier")

nutterb commented 6 months ago

I'm skeptical of this. The API has an exportSurveyFields parameter that controls if survey fields are included in the export.

I would have thought the desired result would have been given with exportRecordsTyped(view_conn, fields = c('record_id'), survey = FALSE)

obregos commented 6 months ago

@nutterb, you are correct, the survey = FALSE works.

@jubilee2, I think the survey = FALSE is your solution (instead of the filter_empty_rows=FALSE). Does this work for you?

jubilee2 commented 6 months ago

@obregos , I’ve conducted tests and confirmed that both filter_empty_rows=FALSE and survey=FALSE work as intended.

However, I agree that redcap_survey_identifier should be included in the system fields list. This field does not appear in the DD, and it cannot be edited in the redcap UI, which indicates that it behaves more like system data. Including it in the system fields would align with the principle of least surprise and provide a more intuitive experience for end users.

spgarbet commented 6 months ago

@nutterb What is the consequence of including it in system fields?

nutterb commented 6 months ago

I'm not sure. Seeing as it passed all of the tests, potentially nothing.

I think the one thing that needs to be tested is what happens if you use something like

exportRecordsTyped(rcon, fields = c('record_id', 'redcap_survey_identifier'), survey = FALSE)

But it is entirely possible my skepticism is just paranoia.

spgarbet commented 6 months ago

It's a conflicting request. The user asks for the field on the one hand and turns off exporting it with another. Right now the survey=FALSE is taking precedence. This behavior is as defined by the raw API. Given this library is a rich client we're not locked into that choice. We could potentially flag a warning, or change the flag based on the fields requested. Or just ignore it as a weird edge case and leave it as the raw API provides it.

@jubilee2 @obregos What would be the least surprising to you if you had made this (conflicting) request?

> exportRecordsTyped(rcon, fields = c('record_id', 'redcap_survey_identifier'), survey = FALSE)
   record_id
1          1
2          2
3          3
4          4
5          5
6          6
7          7
8          8
9          9
10        10
11        11
12        12
13        13
14        14
15        15
16        16
17        17
18        18
19        19
20        20

obregos commented 6 months ago

Personally, if I went to the trouble of adding the survey = FALSE flag in my call, I would expect the redcap_survey_identifier field to not be included in the export. For example, if I have a list of fields defined I may have forgotten that 'redcap_survey_identifier' in my list.

jubilee2 commented 6 months ago

I’ve been utilizing the original API for many years and am quite comfortable with what the REDCap API offers. I understand that for longitudinal data, it often returns many empty rows, which is what I would expect.

In this instance, the REDCap database does not have repeated events or forms, so setting filter_empty_rows=false will be my solution and makes more sense to me.

Additionally, we need to consider situations where “redcap_survey_identifier” has data for some records but not for others. I’m not sure if “exportRecordsTyped(rcon, fields = c(‘record_id’), survey = TRUE)” would work in such cases.

For now, I’ll opt for ‘filter_empty_rows’ for my situation.

spgarbet commented 6 months ago

Alright, we respect the precedence of the raw API.

vubiostat / redcapAPI