sibis-platform / ncanda-data-integration

This is the Data Integration, MRI, and Bioinformatics Component of the National Consortium on Alcohol and NeuroDevelopment in Adolescence (NCANDA), funded by the NIAAA.
https://www.nitrc.org/projects/ncanda-datacore
BSD 3-Clause "New" or "Revised" License

update_visit_data: intelligently filter out configured responses? #372

Open shippy opened 4 years ago

shippy commented 4 years ago

With the latest influx of variables into Data Entry

This is clocking in at about 83 issues. The standard procedure is to ask the sites to fix these in the Import project, but given the homogeneity of the errors, it might be worth implementing some filtering in update_visit_data. Implementation sketch:

  1. a config section in sibis_sys_config.yml of the form
    update_visit_data:
      ignored_values:
        lssaga1_youth_dm15d_dm15d_1_dm15d_y:
          - "."
          - "In progress"
  2. a method in update_visit_data that goes through ignored values per field and empties them out with something like data.loc[data[varname] == value, varname] = np.nan
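
A rough sketch of what step 2 could look like, assuming the config section from step 1 has already been parsed into a plain dict. The function name `blank_ignored_values`, the `config` dict, and the sample data are hypothetical and only for illustration; none of them exist in update_visit_data today.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the parsed `update_visit_data` section
# of sibis_sys_config.yml (step 1 above).
config = {
    "ignored_values": {
        "lssaga1_youth_dm15d_dm15d_1_dm15d_y": [".", "In progress"],
    }
}


def blank_ignored_values(data, ignored_values):
    """Return a copy of `data` with the configured values set to NaN.

    `ignored_values` maps a field name to the list of responses that
    should be emptied out for that field only.
    """
    cleaned = data.copy()
    for varname, values in ignored_values.items():
        if varname not in cleaned.columns:
            # Field is configured but absent from this batch: skip it.
            continue
        # Same idea as data.loc[data[varname] == value, varname] = np.nan,
        # but handling all configured values for the field in one pass.
        mask = cleaned[varname].isin(values)
        cleaned.loc[mask, varname] = np.nan
    return cleaned


# Made-up example rows, just to show the behavior.
data = pd.DataFrame({
    "lssaga1_youth_dm15d_dm15d_1_dm15d_y": ["2019", ".", "In progress", "2020"],
    "other_field": [".", "a", "b", "c"],
})
cleaned = blank_ignored_values(data, config["ignored_values"])
```

Working on a copy keeps the original frame available for logging which values were dropped, and fields without a config entry (like `other_field` here) are left alone even if they contain the same placeholder strings.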