guruucsd / lateralized-components

Submission to OHBM 2016 on functional lateralization using the neurovault dataset.

Duplicate images in the Neurovault #38

Closed atsuch closed 7 years ago

atsuch commented 8 years ago

Updated images from qc.py show many duplicate images with different image id/meta_data. We need to somehow filter out duplicates so that they don't dominate the ICA components. E.g., collections 410 and 1886 (and probably many more).

The easiest way may be to use the automatically generated NV metadata, such as brain_coverage, perc_bad_voxels, and perc_voxels_outside (although the last one is missing in some images). But this might not work: when I checked the number of unique combinations of these three fields in the NV metadata, there were only about 2000, while there are ~9000 unique images.
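For reference, a minimal sketch of that check, assuming the metadata has already been loaded as a list of dicts with an `id` key (that layout, the helper names, and the sample values are hypothetical, not taken from qc.py). It groups images by the three fields and reports groups with more than one id. A shared signature only marks candidates, since different images can share the same values.

```python
from collections import defaultdict

# Field names follow the discussion above; the record layout
# (a list of dicts with an "id" key) is an assumption.
KEYS = ("brain_coverage", "perc_bad_voxels", "perc_voxels_outside")


def group_by_metadata(records, keys=KEYS):
    """Map each metadata signature to the list of image ids sharing it."""
    groups = defaultdict(list)
    for rec in records:
        # rec.get returns None for a missing field, so records that lack
        # perc_voxels_outside still get a signature.
        signature = tuple(rec.get(k) for k in keys)
        groups[signature].append(rec["id"])
    return dict(groups)


def candidate_duplicates(records, keys=KEYS):
    """Return only the groups that contain more than one image id."""
    return [ids for ids in group_by_metadata(records, keys).values() if len(ids) > 1]


# Made-up records, for illustration only.
records = [
    {"id": 1, "brain_coverage": 81.2, "perc_bad_voxels": 3.5, "perc_voxels_outside": 0.4},
    {"id": 2, "brain_coverage": 81.2, "perc_bad_voxels": 3.5, "perc_voxels_outside": 0.4},
    {"id": 3, "brain_coverage": 76.0, "perc_bad_voxels": 1.1},
]

print(len(group_by_metadata(records)))  # number of unique combinations
print(candidate_duplicates(records))    # groups of ids to inspect further
```

Comparing the number of signatures with the number of images known to be distinct gives a rough sense of how often unrelated images collide on these fields.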

@bcipolli, can you tackle this problem?

bcipolli commented 8 years ago

:+1: Thanks for doing a bit of work up-front. I will try out the metadata approach (pre-download), and fall back to the full image comparison (post-download).
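One possible shape for the post-download fallback, as a rough sketch rather than what was actually implemented: hash each downloaded file and group files with the same digest. The function names and file names here are hypothetical. Byte-level hashing only catches files that are byte-for-byte identical. Images with the same voxel data but a different header or compression would need a comparison of the decoded arrays, which is not shown.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical_files(paths):
    """Group paths whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[file_digest(p)].append(Path(p))
    return [group for group in by_digest.values() if len(group) > 1]


# Small demonstration with throwaway files standing in for downloaded images.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.nii.gz"
    b = Path(tmp) / "b.nii.gz"
    c = Path(tmp) / "c.nii.gz"
    a.write_bytes(b"same voxels")
    b.write_bytes(b"same voxels")
    c.write_bytes(b"other voxels")
    duplicate_names = [
        sorted(p.name for p in group) for group in find_identical_files([a, b, c])
    ]

print(duplicate_names)
```

Keeping one file per group (for example, the lowest image id) would then leave a single copy of each duplicate for the ICA.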

bcipolli commented 7 years ago

Fixed by me