sburns / recon-stats

Parse stats outputs from recon-all

Is 2500 too many measures for one redcap instrument? #4

Open KirstieJane opened 10 years ago

KirstieJane commented 10 years ago

This is definitely not a recon-stats problem; rather, it's a question that was raised by my redcap administrator. His argument is:

Having 2500+ fields in the same form is not a good idea. When the number of subjects becomes very large, there is a risk that you might not be able to export all data in one-go because we run out of memory to handle the export. If that happens you will have to download data in batch.

Any thoughts on splitting up the instruments? Too much of a pain? Other reasons not to? I'm definitely happy to help with this issue :)

sburns commented 10 years ago

short answer

Yes, this is a lot of data and it's probably too much for a single form.

long answer

Yes, this is a lot of data, and he's correct that exporting through the web interface will fail beyond some number of records. How big an export can get before it fails depends on the redcap server setup, so the limit is site-specific.

I never export this kind of data through the web site though. If I need this data from REDCap, it's because I'm going to do something with it in a python notebook/application/script. The amount of data I export/import through the API is a mountain compared to what I do through the web. However, I understand that statement puts me in a very small minority :unamused:
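If a single API export is still too big for the server, pulling records in batches is one way around it. Here's a rough sketch, not a tested recipe: `chunked`, `export_in_batches` and `fake_export` are names made up for illustration, and `fake_export` just stands in for a real API call so the example runs without a server.

```python
from typing import Callable, Dict, Iterable, List


def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def export_in_batches(
    export_fn: Callable[[List[str]], List[Dict]],
    record_ids: Iterable[str],
    batch_size: int = 100,
) -> List[Dict]:
    """Call `export_fn` once per batch of record ids and collect all rows."""
    rows: List[Dict] = []
    for batch in chunked(list(record_ids), batch_size):
        rows.extend(export_fn(batch))
    return rows


def fake_export(ids: List[str]) -> List[Dict]:
    """Hypothetical stand-in for a real export; returns one row per id."""
    return [{"record_id": rid} for rid in ids]


# With PyCap, the callable could wrap something like
#   lambda ids: project.export_records(records=ids)
# (assumption: `project` is an already-constructed redcap.Project).
all_ids = [str(i) for i in range(1, 251)]
all_rows = export_in_batches(fake_export, all_ids, batch_size=100)
print(len(all_rows))
```

The right batch size is site-specific, same as the web export limit, so it's something to tune against your own server.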

After a project goes past some threshold for the number of fields, the data export page doesn't show individual fields but rather just forms (there's a link to view the fields though). Adding the recon form to your project definitely puts you over this limit. At this point, there's not much difference between having 5 forms or 1 with respect to the data export page.

philosophical aside about the design of redcap projects...

I keep my "in-magnet" measures (like this recon form but also motion from fMRI & DTI, mean FA across regions, in-magnet fMRI task accuracy, etc) in a separate project from out-of-magnet behavioral and demographics, mostly because of this issue. It does put the onus on me to write the join logic when combining multiple projects (tables), but it's easier on my RAs because they never have to see the otherwise gigantic table of the magnet-related data. PyCap ships with built-in pandas support for exactly this reason.

I gave a 15-minute overview of building reproducible & fast second-level analyses with PyCap, pandas & the IPython notebook. Take a look if you're interested; the ideas scale to all kinds of data analyses, both in and out of the magnet.
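For a sense of what the join logic between two projects can look like in pandas, here's a minimal sketch. The two DataFrames are hypothetical stand-ins for exports from a behavioral project and an in-magnet project; the column names (`subject_id`, `age`, `mean_fa`) and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical export from the out-of-magnet (behavioral) project.
behavioral = pd.DataFrame(
    {"subject_id": ["s01", "s02", "s03"], "age": [24, 31, 28]}
).set_index("subject_id")

# Hypothetical export from the in-magnet project.
in_magnet = pd.DataFrame(
    {"subject_id": ["s01", "s03", "s04"], "mean_fa": [0.41, 0.39, 0.44]}
).set_index("subject_id")

# Inner join on the shared index keeps only subjects present in both projects.
combined = behavioral.join(in_magnet, how="inner")
print(combined)

# An outer merge with indicator=True shows which subjects are missing from
# one project or the other, which is handy before trusting the joined table.
audit = behavioral.reset_index().merge(
    in_magnet.reset_index(), on="subject_id", how="outer", indicator=True
)
missing = audit.loc[audit["_merge"] != "both", "subject_id"].tolist()
print(missing)
```

Indexing both tables on the subject identifier before joining keeps the join itself to one line, and the audit step makes silent drops from the inner join visible.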

sburns commented 10 years ago

Regardless of this, sample data dictionaries should be included in this repo so people can make their own choices about this issue!