lucapton / ICLabel-Dataset

Dataset for training EEG IC classifiers.
Can't use the dataset :( #1

Closed diveu closed 3 years ago

diveu commented 4 years ago

Hi! I'm working on my master's thesis and trying to use your ICA-labeled data to train my model to detect artifacts. I keep getting a 500 error:

Downloading individual ICLabel training set CL label files...
Downloading label file 0 of 2...
HTTP Error: 500 https://labeling.ucsd.edu/download/ICLabels_experts.pkl
Downloading label file 1 of 2...
Done.
Loading full dataset...

and I can't open the features dataset:

---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
 in ()
----> 1 icl.load_data()

/Users/ivkitov/univer/diploma/diploma_code/data/ICLabel-Dataset/icldata.py in load_data(self)
    954         self.check_for_download('train_features')
    955         # topo maps, old psd, dipole, and handcrafted
--> 956         with h5py.File(join(self.datapath, 'features', 'features_0D1D2D.mat'), 'r') as f:
    957             print('Loading 0D1D2D features...')
    958             features.append(np.asarray(f['features']).T)

/Users/ivkitov/anaconda3/envs/python2Env/lib/python2.7/site-packages/h5py/_hl/files.pyc in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, **kwds)
    406                 fid = make_fid(name, mode, userblock_size,
    407                                fapl, fcpl=make_fcpl(track_order=track_order),
--> 408                                swmr=swmr)
    409
    410             if isinstance(libver, tuple):

/Users/ivkitov/anaconda3/envs/python2Env/lib/python2.7/site-packages/h5py/_hl/files.pyc in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
    171         if swmr and swmr_support:
    172             flags |= h5f.ACC_SWMR_READ
--> 173         fid = h5f.open(name, flags, fapl=fapl)
    174     elif mode == 'r+':
    175         fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5f.pyx in h5py.h5f.open()

IOError: Unable to open file (truncated file: eof = 6178545152, sblock->base_addr = 512, stored_eof = 14419006177)

Can you help me please?

ledovsky commented 4 years ago

I have the same =(

lucapton commented 3 years ago

I'm sorry for having missed this for so long. The problem is that the automatic file download fails and then it can't find the file. I'll see if I can fix the problem. In the meantime, you can download the files manually.

The block of code containing file urls:

        self.base_url_download = 'https://labeling.ucsd.edu/download/'
        self.feature_train_zip_url = self.base_url_download + 'features.zip'
        self.feature_train_urls = [
            self.base_url_download + 'features_0D1D2D.mat',
            self.base_url_download + 'features_PSD_med_var_kurt.mat',
            self.base_url_download + 'features_AutoCorr.mat',
            self.base_url_download + 'features_ICAChanlocs.mat',
            self.base_url_download + 'features_MI.mat',
        ]
        self.label_train_urls = [
            self.base_url_download + 'ICLabels_experts.pkl',
            self.base_url_download + 'ICLabels_onlyluca.pkl',
        ]
        self.feature_test_url = self.base_url_download + 'features_testset_full.mat'
        self.label_test_url = self.base_url_download + 'ICLabels_test.pkl'
        self.db_url = self.base_url_download + 'anonymized_database.sqlite'
        self.cls_url = self.base_url_download + 'other_classifiers.mat'
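
For the manual route, a minimal Python 3 sketch along these lines may help. It assumes the training feature files listed above are still served under the same base URL. The names `manual_download_urls`, `fetch`, and the `opener` parameter are hypothetical and not part of icldata.py.

```python
import os
import shutil
from urllib.request import urlopen

BASE_URL = 'https://labeling.ucsd.edu/download/'

# Training feature files, taken from the URL block above.
FEATURE_FILES = [
    'features_0D1D2D.mat',
    'features_PSD_med_var_kurt.mat',
    'features_AutoCorr.mat',
    'features_ICAChanlocs.mat',
    'features_MI.mat',
]


def manual_download_urls(base_url=BASE_URL):
    """Return the full URL of every training feature file."""
    return [base_url + name for name in FEATURE_FILES]


def fetch(url, dest_dir, opener=urlopen):
    """Stream one URL into dest_dir and return the path that was written.

    `opener` is a hypothetical hook so the function can be tried offline;
    by default it is urllib's urlopen.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, url.rsplit('/', 1)[-1])
    with opener(url) as response, open(dest, 'wb') as out:
        # Copy in 1 MiB chunks so multi-gigabyte files never sit in memory.
        shutil.copyfileobj(response, out, 1024 * 1024)
    return dest
```

The tracebacks in this thread show the loader reading from a `features` folder under its data path, so that folder is the likely destination for each `fetch` call.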
lucapton commented 3 years ago

This is actually 2 different problems.

  1. ICLabels_experts.pkl should be ICLabels_expert.pkl
  2. Downloading features_0D1D2D.mat stops before the file is complete.
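
Problem (2) matches the numbers in the original error message: the file on disk ends at 6178545152 bytes (`eof`), while the HDF5 header expects 14419006177 bytes (`stored_eof`), so well under half of the file arrived. A rough Python 3 sketch for checking a download before loading it could look like the following. The helper names are hypothetical, and it assumes the server reports a `Content-Length` header, which is not confirmed in this thread.

```python
import os
from urllib.request import Request, urlopen


def expected_size(url):
    """Ask the server for the file size with a HEAD request.

    Returns None when no Content-Length header is sent (an assumption
    about the server; it may not provide one).
    """
    with urlopen(Request(url, method='HEAD')) as response:
        length = response.headers.get('Content-Length')
    return int(length) if length is not None else None


def completeness(actual_bytes, expected_bytes):
    """Fraction of the expected file that is actually present."""
    return actual_bytes / expected_bytes


def is_complete(path, expected_bytes):
    """True when the file on disk has exactly the expected size."""
    return os.path.getsize(path) == expected_bytes


# Numbers copied from the IOError above: about 43% of the file was written.
fraction = completeness(6178545152, 14419006177)
```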
lucapton commented 3 years ago

Item (1) has been fixed, but I'm still having trouble with (2). My attempt at fixing it was to recreate the zip archive as a multi-disk zip where each file is no more than 1 GB. Unfortunately, Python's zipfile library does not support multi-disk zips. I'm looking into using libarchive instead, but I've run out of time for now.
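
One alternative to a multi-disk zip, shown here only as an illustration and not as the fix that was eventually applied, is to split a large file into plain byte-range parts and concatenate them again after download. The function names and the `.partNNN` suffix are hypothetical.

```python
import os
import shutil


def split_file(path, part_size, chunk_size=1024 * 1024):
    """Split `path` into parts of at most `part_size` bytes each.

    Parts are written next to the source as path.part000, path.part001, ...
    and their paths are returned in order.
    """
    parts = []
    with open(path, 'rb') as src:
        index = 0
        while True:
            part_path = '{}.part{:03d}'.format(path, index)
            written = 0
            with open(part_path, 'wb') as dst:
                while written < part_size:
                    chunk = src.read(min(chunk_size, part_size - written))
                    if not chunk:
                        break
                    dst.write(chunk)
                    written += len(chunk)
            if written == 0:
                # The source was already exhausted; drop the empty part.
                os.remove(part_path)
                break
            parts.append(part_path)
            index += 1
    return parts


def join_files(parts, dest, chunk_size=1024 * 1024):
    """Concatenate `parts` in order into `dest`."""
    with open(dest, 'wb') as out:
        for part in parts:
            with open(part, 'rb') as src:
                shutil.copyfileobj(src, out, chunk_size)
    return dest
```

For 1 GB parts as described above, `part_size` would be on the order of `1024 ** 3`. Each part can then be downloaded and size-checked on its own, so a dropped connection costs one part rather than the whole file.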

lucapton commented 3 years ago

I believe this is fixed. I confirmed the download now works but I can't actually load the dataset on my personal computer due to lack of RAM. If there are any problems, feel free to reopen this issue.

datalw commented 3 years ago

@lucapton Thank you for sharing the valuable resource! I tried to download the data, but had a similar error:

Loading full dataset...
Traceback (most recent call last):
  File "e:\GoogleDriveBB\Program\ICLabel-Train\loading_data.py", line 4, in <module>
    icldata = icl.load_semi_supervised()
  File "e:\GoogleDriveBB\Program\ICLabel-Train\icldata.py", line 1231, in load_semi_supervised
    icl = self.load_data()
  File "e:\GoogleDriveBB\Program\ICLabel-Train\icldata.py", line 969, in load_data
    with h5py.File(join(self.datapath, 'features', 'features_0D1D2D.mat'), 'r') as f:
  File "C:\ProgramData\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 406, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "C:\ProgramData\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'features\features_0D1D2D.mat', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

I also tried to download manually from the link https://labeling.ucsd.edu/download, but it seems that this page does not exist... I tried different browsers and got the same page each time: [screenshot]

lucapton commented 3 years ago

@datalw You need to call download_trainset_features in order to download the files that it says are missing (I've updated the readme to show that). If you want to download them manually, you have to use the links for the exact files:

https://labeling.ucsd.edu/download/features_0D1D2D.mat
https://labeling.ucsd.edu/download/features_PSD_med_var_kurt.mat
https://labeling.ucsd.edu/download/features_AutoCorr.mat
https://labeling.ucsd.edu/download/features_ICAChanlocs.mat
https://labeling.ucsd.edu/download/features_MI.mat

What I find curious though is that if you don't have the files you should have hit an assertion error at line 967 of iclabel.py. Any idea why it didn't?
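
As an illustration of the kind of guard being discussed, here is a small Python 3 sketch that reports missing feature files before anything tries to open them. It is not the check in the repository: `missing_feature_files` and `require_feature_files` are hypothetical names, and the `features` subfolder is inferred from the paths in the tracebacks above.

```python
import os

# File names taken from the download links above.
FEATURE_FILES = [
    'features_0D1D2D.mat',
    'features_PSD_med_var_kurt.mat',
    'features_AutoCorr.mat',
    'features_ICAChanlocs.mat',
    'features_MI.mat',
]


def missing_feature_files(datapath):
    """Return the feature file names not found under datapath/features."""
    feature_dir = os.path.join(datapath, 'features')
    return [name for name in FEATURE_FILES
            if not os.path.isfile(os.path.join(feature_dir, name))]


def require_feature_files(datapath):
    """Raise a descriptive error instead of failing later inside h5py."""
    missing = missing_feature_files(datapath)
    if missing:
        raise FileNotFoundError(
            'Missing feature files in {}: {}'.format(
                os.path.join(datapath, 'features'), ', '.join(missing)))
```

Failing early like this names the missing files directly, which is easier to act on than an `OSError` raised from inside `h5py.File`.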

datalw commented 3 years ago

@lucapton Thanks a lot! Both ways work now: downloading via the three lines of code and via the links above :) I checked why it did not work before, and here is what I found: [screenshot]

As you can see, at line 1708 I printed data_type. It is a string rather than a list, as shown in the terminal output. That's why val takes only a single letter, t, which matches none of the if-elif cases ;)
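
The code at line 1708 is visible only in the screenshot, so the following Python 3 sketch is a guess at the mechanism rather than a copy of it. Iterating over a string yields its characters one by one, so a loop that expects a list of names sees `'t'` instead of `'train_features'`. The helper `as_list` is a hypothetical way to normalize the argument.

```python
def first_value_naive(data_type):
    """Return the first item the loop sees, without any normalization."""
    for val in data_type:
        return val


def as_list(data_type):
    """Wrap a single string in a list; leave other iterables as lists."""
    if isinstance(data_type, str):
        return [data_type]
    return list(data_type)


def first_value_safe(data_type):
    """Return the first item after normalizing the argument to a list."""
    for val in as_list(data_type):
        return val


# With a bare string, the naive loop sees a single character.
naive = first_value_naive('train_features')
safe = first_value_safe('train_features')
```

Here `naive` is the single character `'t'`, while `safe` is the whole name `'train_features'`, which is what an if-elif chain on the name would need.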

lucapton commented 3 years ago

Thanks for finding that! Glad it works for you.

lucapton commented 3 years ago

So I looked into it, and I should say that the code works as-is in Python 2.7 but not in 3.x, which it appears you're using. I just want to warn you that I wrote all of this in 2.7 and can't guarantee anything for Python 3. I realize that was a poor choice on my part, but it's just a fact now. That said, if you run into any more issues, let me know and I'll try to fix them. I'm about to push a change that makes this specific piece of the code work for both.

datalw commented 3 years ago

Thanks for the note! There was no major incompatibility when I ran the dataset code. All I had to change were a few print statements.