Attribute "invalid" cannot be NULL in a data export.

vubiostat / redcapAPI

R interface to REDCap (http://www.project-redcap.org/)

21 stars 27 forks source link

Attribute "invalid" cannot be NULL in a data export. #415

Closed tobadia closed 1 month ago

tobadia commented 1 month ago

Hello maintainers,

While writing/running an operational pipeline trying to standardize our approach to data exports and analysis in a clinical trial of ours, I realize that the check for "Invalid" data can't ever fall in the "no invalid data" planned at https://github.com/vubiostat/redcapAPI/blob/82a91c256554739cda237db26cf34faf125ef1f9/R/reviewInvalidRecords.R#L103

When a data.frame containing fully valid data is returned from exportRecordsTyped(), the invalid attribute is not NULL but rather contains nested attributes ("time", "version" and "project"). Therefore it cannot be NULL.

One possible fix would be to check the number of rows of its content passed through as.data.frame().

Hope that helps! Best, Thomas

spgarbet commented 1 month ago

The Invalid attribute is meant to be a data.frame containing a data.frame of all invalid records. Can you try, nrow(Invalid) > 0 and tell me what that does?

tobadia commented 1 month ago

nrow() applied on an Invalid object works properly (it returns the number of rows of the corresponding data.frame)

In my case (not a reprex but easy enough):

nrow(attr(dat_inventory_raw[[RCON_1]], "invalid")) > 0
[1] FALSE

When there are some records failing validation:

nrow(attr(dat_observational_raw[[RCON_2]], "invalid")) > 0
[1] TRUE

However, in the first case, is.null() is FALSE because the data.frame exists:

is.null((attr(dat_inventory_raw[[RCON_1]], "invalid")))
[1] FALSE

I believe the call to is.null() should rather be similar to what you're suggesting: nrow(Invalid) == 0 would ensure that the logic is TRUE when there are no records failing validation.

edit: I'm using 2.9.0 but that shouldn't make a difference

spgarbet commented 1 month ago

There is a helper function reviewInvalidRecords. The help page has full details about this object.

nutterb commented 1 month ago

@tobadia As I recall, the purpose of the is.null check in reviewInvalidRecords is to bring attention to the user in the instance where a data frame is passed that was not the output of exportRecordsTyped. For instance,

Invalid <- reviewInvalidRecords(mtcars)

This will return a NULL value because the invalid attribute does not exist on mtcars.

I think a good way to think of it is this

if is.null(Invalid) it TRUE, then the data frame isn't the immediate result of exportRecordsTyped. No conclusion can be made about invalid records in the data.
if is.null(Invalid) is FALSE AND nrow(Invalid) is 0, then there are no invalid records
if is.null(Invalid) is FALSE AND nrow(Invalid) is positive, there are invalid records.

tobadia commented 1 month ago

AH! Yes, ok, that works. I was under the impression that this check was there and that jointly with the verbose-like flag, it would let the user know if data felt into the scenario of invalid content or not.

Your explanation makes sense. Sorry for the misunderstanding!

(Reviewing "rightfully invalid" data did prompt me to open a separate issue, but that'll be for tomorrow.)

spgarbet commented 1 month ago

Excellent. I think this is answered. If not, feel free to reopen.