Minor labeling issue in one datafile

adswa commented 5 years ago

Up front: I'm not entirely sure I did the correct thing in the anderson_etal subdataset -- please check.

There is a very recent paper which I assume we should cite. They report a minor labeling mistake in Appendix 2 for 75 samples. Since its out there reported, we should probably adjust the file. I checked back with Richard Anderson, and the file is not fixed in his repo. Its impact on the results is tiny, though. If I rerun the analyses with the fix, we see an improvement in the magnitude of one decimal in misclassification rate and the Jaccard index in the respective places. I datalad added a fixed file (UH29_img_Europe_labelled_FIX_MN.mat) to the subdataset, and renamed the corresponding RA equivalent, and adjusted the anderson.py and test_labeled.py scripts to use the new file. However, given that I only see an updated submodule hash in the summary of this PR, I'm not entirely sure that this did what I wanted it to do -- so if not, I'd appreciate guidance in how to update the submodule correctly.

adswa commented 5 years ago

oh, also: I did not compress the new file when I saved it with scipy.io.savemat() - the resulting file is therefore considerably larger than before (235K versus 70K). Please let me know whether I should compress it.

mih commented 5 years ago

You changed the a file in a subdataset that is just referenced. Here is the change https://github.com/richardandersson/EyeMovementDetectorEvaluation/compare/c6d02539712d12d7bea96912521a43cb84e7a7b8...a250fcb05f70cf400532be257ea2a89992c82b54

If you click on it, it will say that it cannot find it. That is because we link the original repo https://github.com/richardandersson/EyeMovementDetectorEvaluation, but it never got this change.

Two choices:

Submit your change to the files to the original authors, and have them update their repo. In this case, all of this PR, except for the submodule SHASUM change is not necessary. This is by far the best option, but could take a while to complete.
Publish you fork of the data repo to another place, and adjust the URL to the submodule in .gitmodules of the remodnav repo, i.e. here https://github.com/psychoinformatics-de/remodnav/blob/5c7b8b5523752efb45dd88abf14b3c59c656be5d/.gitmodules#L6

adswa commented 5 years ago

@mih thx for the assistance. I gave the sub optimal approach nr 2 a try and created a new repo with the fixed data file.

Travis isn't happy but thats unrelated to new data

```python raise VersionConflict(dist, req).with_context(dependent_req) pkg_resources.VersionConflict: (pytest 3.3.0 (/home/travis/virtualenv/python3.6.3/lib/python3.6/site-packages), Requirement.parse('pytest>=3.6')) ```

mih commented 5 years ago

The fixed file has been merged upstream. I think we can close this.

psychoinformatics-de / remodnav

Minor labeling issue in one datafile #4