vubiostat / redcapAPI

R interface to REDCap (http://www.project-redcap.org/)

Bioportal Field Not Exporting #364

Closed obregos closed 1 month ago

obregos commented 1 month ago

I have a field named aedecod in my data that is a Bioportal Field. When exporting the data without this field it works correctly. However, including the form that this field is in or exporting this field directly results in the following error:

exportRecordsTyped(eds, fields = c('aedecod'))
Error in redcapError(response, error_handling = error_handling) : 
  400: ERROR: The following values in the parameter "fields" are not valid: 'NA'
obregos commented 1 month ago

I do not get this error when exporting the bioportal field from our redcapAPI test database and the export runs correctly.

exportRecordsTyped(rcon, fields = c('bioportal_test'))

Perhaps something in the original project is configured incorrectly?

nutterb commented 1 month ago

I would want to trace the fields object through to the API call. It looks like aedecod is being turned to NA somewhere along the way, and I'm not sure what would cause that. Possibly something in .exportRecordsTyped_fieldsArray?

Could you try the following direct call and report what is returned? If this avoids the error, then it would seem we are losing the field name somewhere before the API call.

makeApiCall(rcon, 
            body = c(list(content = "record", 
                          format = "csv", 
                          returnFormat = "csv", 
                          type = "flat"), 
                     vectorToApiBodyList("aedecod", "fields")))
obregos commented 1 month ago

Running the makeApiCall above, I get the following:

Response [https://redcap.vumc.org/api/]
  Date: 2024-05-07 14:43
  Status: 200
  Content-Type: text/csv; charset=utf-8
  Size: 4.42 kB
aedecod

...
nutterb commented 1 month ago

Could you try the following using the Issue 364 branch? I only added some print statements that show the value of fields in a few places, hoping to isolate exactly where the field name is getting lost. I'll be interested in seeing what gets printed to the console.

exportRecordsTyped(eds, fields = c('aedecod'))
obregos commented 1 month ago

Thanks. Here is what I got back:

exportRecordsTyped(eds, fields = c('aedecod'))
[1] "fields at start-------------------------"
[1] "aedecod"
[1] "fields after system fields-----------------------"
[1] "aedecod"
[1] "FieldFormMap initialization --------------"
[1] original_field_name choice_value        export_field_name   index               form_name          
<0 rows> (or 0-length row.names)
[1] "fields_to_request-----------------"
[1] "aedecod"
[1] "aedecod"
[1] "fields after exportRecordsTyped_fieldsArray--------"
[1] "record_id" "aedecod"  
Error in redcapError(response, error_handling = error_handling) : 
  400: ERROR: The following values in the parameter "fields" are not valid: 'NA'
Called from: redcapError(response, error_handling = error_handling)
Browse[1]> c
> 
nutterb commented 1 month ago

That is not at all what I expected to see, and it kind of blows up my theory.

I've pushed a new commit to that same branch. Could you try again? This time it should print the body object before it sends the API call.

obregos commented 1 month ago

Thank you. Here is the result:

exportRecordsTyped(eds, fields = c('aedecod'))
$content
[1] "record"

$format
[1] "csv"

$returnFormat
[1] "csv"

$type
[1] "flat"

$exportSurveyFields
[1] "true"

$exportDataAccessGroups
[1] "true"

$csvDelimiter
[1] ","

$`fields[1]`
[1] "record_id"

$`fields[2]`
[1] "aedecod"

Error in redcapError(response, error_handling = error_handling) : 
  400: ERROR: The following values in the parameter "fields" are not valid: 'NA'
Called from: redcapError(response, error_handling = error_handling)
Browse[1]> c
nutterb commented 1 month ago

Okay. I'm stumped.

It's reaching the API call with the proper structure. So I have no idea where the NA in fields is coming from.

obregos commented 1 month ago

Debugging with Shawn. Found the field annotation that breaks it:

@p1000lang{"English":"<p style=\"padding-left: 40px;\">Have you had these thoughts and had some intention of acting on them?</p> <p style=\"padding-left: 40px;\"><span style=\"font-weight: normal;\">As opposed to \"I have the thoughts but I definitely will not do anything about them.\"</span></p>","Español":"<p style=\"padding-left: 40px;\">¿Ha tenido estos pensamientos y ha tenido alguna intención de actuar en consecuencia?</p> <p style=\"padding-left: 40px;\"><span style=\"font-weight: normal;\">En lugar de \"Lo pienso, pero definitivamente no haré nada al respecto.\"</span></p>"} @p1000answers{"English":{"0":"No","1":"Yes"},"Español":{"0":"No","1":"Sí"}}
spgarbet commented 1 month ago

The field annotation is somehow starting a new row of data when read as a csv in R. Thus corrupted lines are getting created in the metadata.
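
A rough way to spot such corrupted metadata rows is to flag any field_name that does not look like a REDCap variable name. This is only a sketch: `find_suspect_rows` is a hypothetical helper, not part of redcapAPI, and it assumes variable names start with a lowercase letter and contain only lowercase letters, digits, and underscores.

```r
# Hypothetical helper (not part of redcapAPI): return the row indices whose
# field_name does not match the assumed REDCap variable-name pattern.
find_suspect_rows <- function(meta) {
  valid <- grepl("^[a-z][a-z0-9_]*$", meta$field_name)
  which(!valid)
}

# Toy metadata; the third row mimics a fragment spilled from a broken
# field annotation (note the leading space).
meta <- data.frame(
  field_name = c("record_id", "aedecod", " words12"),
  form_name  = c("demographics", "adverse_events", "words13"),
  stringsAsFactors = FALSE
)

suspect <- find_suspect_rows(meta)
print(meta[suspect, ])
```

Any row this returns is worth inspecting by hand; the pattern is a heuristic and may need adjusting for a given project.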

spgarbet commented 1 month ago

The immediate workaround is merged. The ticket remains open to see if the deeper issues with csv exports of metadata can be resolved.

spgarbet commented 1 month ago

broken.csv

This is the smallest I can shave it down to.

> x<-read.csv("broken.csv"); which(grepl(" ", x$field_name))
[1] 6
> x[6,]
  field_name
6    words12
                                                                        form_name
6  words13\\","Español":"words14, words15, words16, words17,word18.\\words19\\."}
  section_header field_type field_label select_choices_or_calculations
6             NA                     NA                             NA
  field_note text_validation_type_or_show_slider_number text_validation_min
6         NA                                         NA                  NA
  text_validation_max identifier branching_logic required_field
6                  NA         NA              NA             NA
  custom_alignment question_number matrix_group_name matrix_ranking
6               NA              NA                NA             NA
  field_annotation
6   

The problem is read.csv can't understand the escaping. I tried this in Excel and it can.

spgarbet commented 1 month ago

Simplest version yet:

field,description
a,
b,
c,
d,
e,"words1\"words2, words3\""
spgarbet commented 1 month ago

csvlint gives the following:

garbetsp@biostat1427:~/Projects/cran/redcapAPI$ ~/go/bin/csvlint ~/broken.csv 
Record #5 has error: extraneous or missing " in quoted-field

unable to parse any further
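
Without csvlint, a similar check can be approximated in R by looking for backslash-escaped quotes line by line. A minimal sketch, assuming the text is already split into lines; `csv_lines` reproduces the simplest example from the previous comment.

```r
# The minimal broken file from this thread, one element per line.
# In R source, "\\\"" is the two-character sequence: backslash, double quote.
csv_lines <- c(
  "field,description",
  "a,",
  "b,",
  "c,",
  "d,",
  "e,\"words1\\\"words2, words3\\\"\""
)

# Flag lines containing a backslash-escaped quote, which RFC 4180 does not
# recognise as an escape. fixed = TRUE matches the pattern literally.
bad <- which(grepl("\\\"", csv_lines, fixed = TRUE))
print(bad)
```

The index counts the header as line 1, so the offending line is reported as 6 here, while csvlint reports it as record #5.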
spgarbet commented 1 month ago

Final answer: RFC 4180. The field annotation for this project is malformed: RFC 4180 escapes a double quote inside a quoted field by doubling it (""), not with a backslash. Excel has some exceptions that allow it to be read. However, the csv being provided is improper. We could do a gsub("\\\"", "\"\"", response, fixed = TRUE) on the response, but this would penalize every other user of the library with poorer performance. I think the recommended course of action is that ACTIV6 should edit their field_annotations to be compliant with the standard.
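
For anyone who needs to read such a file locally in the meantime, the substitution mentioned above can be sketched on the minimal example from this thread. This is an illustration only, not what redcapAPI does; `raw_csv` and `fixed_csv` are made-up names.

```r
# The minimal broken csv from this thread as a single string.
raw_csv <- paste(
  "field,description",
  "a,",
  "b,",
  "c,",
  "d,",
  "e,\"words1\\\"words2, words3\\\"\"",
  sep = "\n"
)

# Rewrite backslash-escaped quotes (\") as RFC 4180 doubled quotes ("").
fixed_csv <- gsub("\\\"", "\"\"", raw_csv, fixed = TRUE)

# read.csv understands doubled quotes inside a quoted field.
x <- read.csv(text = fixed_csv, stringsAsFactors = FALSE)
print(x)
```

One caveat: a quoted field that legitimately ends in a backslash would also be rewritten, so this is not safe as a blanket fix.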

spgarbet commented 1 month ago

In the 'I can't stop picking at it' category, I found a StackOverflow post. Apparently data.table solved this problem.

> library(data.table)
> fread("~/Projects/sandbox/broken.csv", fill=TRUE)
   field_name,form_name,section_header,field_type,field_label,select_choices_or_calculations,field_note,text_validation_type_or_show_slider_number,text_validation_min,text_validation_max,identifier,branching_logic,required_field,custom_alignment,question_number,matrix_group_name,matrix_ranking,field_annotation
                                                                                                                                                                                                                                                                                                                 <char>
1:                                                                                                                                                                                                                                                                                                                   a,
2:                                                                                                                                                                                                                                                                                                                   b,
3:                                                                                                                                                                                                                                                                                                                   c,
4:                                                                                                                                                                                                                                                                                                                   d,
5:                                                                                                                                             duke_instruct,demographics,,descriptive,,,,,,,,,,,,,,"words10\\"words11, words12, words13\\""",""words9"":""words14, words15, words16, words17,word18.\\"words19\\".""}"

Non-trivial custom parser in C: https://github.com/Rdatatable/data.table/blob/master/src/fread.c

jubilee2 commented 3 weeks ago

@spgarbet, should we consider transitioning to data.table for its robust CSV parsing capabilities? This could serve as a long-term solution.