Data quality visualization demo?

mih commented 3 years ago

Maybe it would be straightforward to adopt @jsheunis web-app for this?

jsheunis commented 3 years ago

I'm happy to look into this.

For context:

jsheunis commented 3 years ago

OK, I'm adding some information here to sketch ideas for how data quality could be represented for the studyforrest dataset. Much of this is influenced by my current and limited understanding of the dataset and its composition, i.e. it will probably change.

Current thinking

The data quality measures we represent will depend on the data itself. There are multiple modalities and datatypes from which to extract data quality information. Most of this will come from derived (not raw) data. It is useful to look at which data derivations already exist, because then we wouldn't have to run pipelines on the data ourselves (or do we specifically want a consolidated data quality pipeline for the full dataset?).

As a first shot it makes sense (to me) to start with standard functional and structural MRI quality metrics that most users would be familiar with. These include framewise displacement (functional) and structural-functional registration overlap (for whole brain, cortical surfaces, and ROIs). It also makes sense to first work from the derived data that are already available.

Derived data to look at

Currently, I am aware of the following derived / preprocessed data that fit the above description:

- Motion parameters for online motion-corrected 7T audio movie fMRI time series (from https://openneuro.org/datasets/ds000113/versions/1.3.0)
- Motion parameters for multiple 3T fMRI time series: visual area localization, audio-visual movie, retinotopic mapping (from https://github.com/psychoinformatics-de/studyforrest-data-aligned)
- Lots of freesurfer parcellations as well as gif outputs from a related QA pipeline (from https://github.com/psychoinformatics-de/studyforrest-data-freesurfer)
- Visual ROIs in subject space (from https://github.com/psychoinformatics-de/studyforrest-data-visualrois)

Some questions to help guide our approach

- How detailed and extensive do we want to go with QA?
  - Do we want to make it possible to view data on a subject-level, or do we only want to display summary data (e.g. distribution plots of QA measures) and perhaps a few example subject-level plots?
  - I'm assuming for now we want some high level QA plots that can be embedded and interacted with (to a degree) in the website. A standalone QA application is also possible, but perhaps overkill?
- Are there any other derivatives that would be useful to include? I'm not familiar with QAing eye tracking data, but perhaps this could also be useful/interesting to include? There's also the ICA-denoised derivative data, but including related QA measures would take us deeper into pipeline-specific QA, i.e. moving away from data-level QA.

jsheunis commented 3 years ago

Update

Here's a screen capture of a Plotly graph of the framewise displacement distributions per subject, each over all 8 runs of the 7T audio movie fMRI acquisition.

I haven't put any time yet into making the graph prettier. Any suggestions for improvements are very welcome. The file is standalone HTML (about 4.5 MB) with embedded JavaScript, as exported from Plotly.

https://user-images.githubusercontent.com/10141237/116203957-879cab00-a73c-11eb-87dd-a6183cc7d580.mov

Notes:

Since we're looking at high-level summary visualizations, I pooled all framewise displacement measures from all 8 runs into one distribution plot per subject. Any interest in seeing it per run?
sub-10 has no motion parameters in this dataset, does anyone know why? @mih? I don't think it should stop us from having this visualization in the website, though.
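
To make the pooled versus per-run question concrete, here is a minimal sketch of how both summaries could be computed from the same framewise displacement values. Everything in it is an assumption for illustration: the `fd_runs` layout, the `summarize` helper, the 0.5 mm threshold, and the random numbers, which are synthetic placeholders and not studyforrest values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic placeholder data, NOT studyforrest values:
# fd_runs[subject] is a list of 8 per-run framewise displacement arrays (mm).
fd_runs = {
    "sub-01": [rng.gamma(2.0, 0.05, size=100) for _ in range(8)],
    "sub-02": [rng.gamma(2.0, 0.10, size=100) for _ in range(8)],
}


def summarize(fd, threshold_mm=0.5):
    """Median FD and fraction of volumes above a (hypothetical) threshold."""
    fd = np.asarray(fd, dtype=float)
    return {
        "median": float(np.median(fd)),
        "frac_above": float(np.mean(fd > threshold_mm)),
    }


# One summary per subject, pooling all runs into a single distribution.
pooled = {sub: summarize(np.concatenate(runs)) for sub, runs in fd_runs.items()}

# One summary per subject and run, which keeps run-level differences visible.
per_run = {sub: [summarize(run) for run in runs] for sub, runs in fd_runs.items()}

for sub in fd_runs:
    print(sub, "pooled:", pooled[sub])
    print(sub, "run medians:", [round(s["median"], 3) for s in per_run[sub]])
```

The pooled summary is compact enough for a website overview, while the per-run list is the kind of information needed to pick out individual runs.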

Next steps

1. Framewise displacement for 3T data

Do the same as above for the 3T audio-visual movie data (from here: https://github.com/psychoinformatics-de/studyforrest-data-aligned).

I noticed the motion parameters look suspicious. See an extract from studyforrest-data-aligned/sub-01/in_bold3Tp2/sub-01_task-avmovie_run-1_bold_mcparams.txt below. Typically the first three columns are translations, and the last three are rotations (either in degrees or radians). The values in the first three columns look suspiciously low. This is the same for multiple subjects.

-0.0185102  0.000964457  0.00570813  0.237992  -0.565209  1.38533  
-0.016557  0.000730658  0.00541475  0.225521  -0.546841  1.33916  
-0.018021  0.00118411  0.00541045  0.233148  -0.53043  1.38916  
-0.0168537  0.00115001  0.00524862  0.222841  -0.584751  1.29539  
-0.0180707  0.00130329  0.00560456  0.220491  -0.474867  1.34273

I looked in the code directory of this dataset for some more info, and I found this excerpt from the studyforrest-data-aligned/code/mk_movie_ds.py file:

mc = np.recfromtxt(
            'sub-%.2i/in_%s/sub-%.2i_task-%s_run-%i_bold_mcparams.txt'
            % (subj, label, subj, task, seg),
            names=('mc_xtrans', 'mc_ytrans', 'mc_ztrans', 'mc_xrot',
                   'mc_yrot', 'mc_zrot'))

which seems to suggest that these parameters are in the correct order. Any comments?
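
A quick way to compare the columns is to look at the largest magnitude in each one. The sketch below does this for the five rows quoted above only (it is not code from the dataset, and five rows are of course not representative of a whole run).

```python
import io

import numpy as np

# The five rows quoted above from
# sub-01_task-avmovie_run-1_bold_mcparams.txt
rows = """\
-0.0185102  0.000964457  0.00570813  0.237992  -0.565209  1.38533
-0.016557  0.000730658  0.00541475  0.225521  -0.546841  1.33916
-0.018021  0.00118411  0.00541045  0.233148  -0.53043  1.38916
-0.0168537  0.00115001  0.00524862  0.222841  -0.584751  1.29539
-0.0180707  0.00130329  0.00560456  0.220491  -0.474867  1.34273
"""

params = np.loadtxt(io.StringIO(rows))   # shape (5, 6)
col_absmax = np.abs(params).max(axis=0)  # largest magnitude per column

for i, m in enumerate(col_absmax):
    print(f"column {i}: max |value| = {m:.6f}")
```

In these rows the first three columns stay below 0.02, while the last three range from about 0.2 to 1.4. Read as rotations in radians, a value of 1.38 would be roughly 79 degrees, which is implausible for head motion.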

2. Freesurfer segmentation overlays

Look at grabbing some of the existing freesurfer QA snapshots (from https://github.com/psychoinformatics-de/studyforrest-data-freesurfer, see an example snapshot below) and creating an informative example of freesurfer outputs. Perhaps also in montage form, or a movie of images.

[Image: Screenshot 2021-04-27 at 09 49 14]

jsheunis commented 3 years ago

Regarding the suspicious motion parameters, I'm going to assume for now that they were created according to the standard in FSL MCFLIRT (which seems to have been used here), which outputs params in the order trans_x, trans_y, trans_z, rot_x, rot_y, rot_z with translations in mm and rotations in radians.

jsheunis commented 3 years ago

Actually, no. Seems like MCFLIRT puts the rotations first. This would explain the suspicious looking data.
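
For reference, a minimal sketch of a framewise displacement calculation under that column order: three rotations in radians first, then three translations in mm. It uses the common definition of summing absolute frame-to-frame differences, with rotations converted to arc length on a sphere. The 50 mm radius and the function name are assumptions for illustration, not something taken from the studyforrest code.

```python
import numpy as np


def framewise_displacement(mcparams, radius_mm=50.0):
    """Framewise displacement (mm) per volume from an (n_volumes, 6) array.

    Assumed column order: rot_x, rot_y, rot_z in radians,
    then trans_x, trans_y, trans_z in mm.
    """
    mcparams = np.asarray(mcparams, dtype=float)
    rot_mm = mcparams[:, :3] * radius_mm   # radians -> arc length in mm
    trans_mm = mcparams[:, 3:]
    motion = np.hstack([rot_mm, trans_mm])
    # Sum of absolute differences between consecutive volumes.
    fd = np.abs(np.diff(motion, axis=0)).sum(axis=1)
    # The first volume has no predecessor, so its FD is set to 0.
    return np.concatenate([[0.0], fd])


# Toy input: a 0.01 rad rotation plus a 1 mm translation between two volumes.
example = np.array([
    [0.00, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.01, 0.0, 0.0, 1.0, 0.0, 0.0],
])
fd_example = framewise_displacement(example)
print(fd_example)
```

If the columns were read in the opposite order, the translations would be scaled by the radius instead of the rotations, which would inflate the resulting values considerably.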

jsheunis commented 3 years ago

Here's a screencast for the 3T framewise displacement distributions. In this case data for sub-07, sub-08, sub-11, sub-12 and sub-13 were all missing.

https://user-images.githubusercontent.com/10141237/116223336-b8d2a680-a74f-11eb-826d-87306476df45.mov

mih commented 3 years ago

Looks great, all of it!

Re the mysterious motion params: I think your last hypothesis is the right one. Rotations come first, and their range matches radians.

mih commented 3 years ago

> Since we're looking at high-level summary visualizations, I pooled all framewise displacement measures from all 8 runs into one distribution plot per subject. Any interest in seeing it per run?

I have seen multiple analyses that selected individual "good" runs, so it would make sense.

> sub-10 has no motion parameters in this dataset, does anyone know why? @mih? I don't think it should stop us from having this visualization in the website, though.

See https://www.nature.com/articles/sdata20143/tables/4: distortion correction did not work for sub-10.

jsheunis commented 3 years ago

More updates

A 'good' subject, all runs:

https://user-images.githubusercontent.com/10141237/116255277-6191fd80-a772-11eb-8cd6-520eabe09ef3.mov

And a single 'good' run from the same subject:

https://user-images.githubusercontent.com/10141237/116255426-871f0700-a772-11eb-9e6c-a9b93e1efb15.mov

jsheunis commented 3 years ago

Freesurfer derivatives update

I have created this gif from the freesurfer-processed files for subjects 1 through 20. It shows, per subject, snapshots of the white/grey matter, subcortical atlas, and cortical parcellations. Would this be useful to include in the website?

[Image: out (gif of freesurfer snapshots per subject)]

jsheunis commented 3 years ago

The data quality update to the explore page was done with #44 and #46.

psychoinformatics-de / studyforrest-data