mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

raw.crop failure with multi-file CTF #6482

Open nwilming opened 5 years ago

nwilming commented 5 years ago

I'm unsure about the exact meaning of raw.first_samp, but I think there might be an error when cropping multi-file CTF recordings.

I'm working with ~1h long CTF recordings that I crop into blocks for preprocessing. Unfortunately the CTF recording is big (>10 GB), so it is hard to share. The following demonstrates the problem:

>>> import mne
>>> raw = mne.io.read_raw_ctf(filename)
ds directory : /home/gortega/megdata/S1-5_Attractor_20161022_01.ds
    res4 data read.
    hc data read.
    Separate EEG position data file read.
    Quaternion matching (desired vs. transformed):
      -1.14   80.92    0.00 mm <->   -1.14   80.92    0.00 mm (orig :  -53.00   56.58 -262.06 mm) diff =    0.000 mm
       1.14  -80.92    0.00 mm <->    1.14  -80.92    0.00 mm (orig :   48.86  -69.11 -267.17 mm) diff =    0.000 mm
     100.30    0.00    0.00 mm <->  100.30    0.00    0.00 mm (orig :   75.61   54.06 -244.96 mm) diff =    0.000 mm
    Coordinate transformations established.
    Polhemus data for 3 HPI coils added
    Device coordinate locations for 3 HPI coils added
    64 EEG electrode locations assigned to channel info.
    64 EEG locations added to Polhemus data.
    Measurement info composed.
Finding samples for /home/gortega/megdata/S1-5_Attractor_20161022_01.ds/S1-5_Attractor_20161022_01.meg4: 
    System clock channel is available, checking which samples are valid.
    1104 x 1200 = 1324800 samples from 405 chs
Finding samples for /home/gortega/megdata/S1-5_Attractor_20161022_01.ds/S1-5_Attractor_20161022_01.1_meg4: 
    System clock channel is available, checking which samples are valid.
    1104 x 1200 = 1324800 samples from 405 chs
Finding samples for /home/gortega/megdata/S1-5_Attractor_20161022_01.ds/S1-5_Attractor_20161022_01.2_meg4: 
    System clock channel is available, checking which samples are valid.
    1104 x 1200 = 1324800 samples from 405 chs
Finding samples for /home/gortega/megdata/S1-5_Attractor_20161022_01.ds/S1-5_Attractor_20161022_01.3_meg4: 
    System clock channel is available, checking which samples are valid.
    1104 x 1200 = 1324800 samples from 405 chs
Finding samples for /home/gortega/megdata/S1-5_Attractor_20161022_01.ds/S1-5_Attractor_20161022_01.4_meg4: 
    System clock channel is available, checking which samples are valid.
    726 x 1200 = 871677 samples from 405 chs
    723 samples omitted at the end
Current compensation grade : 0
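(As a sanity check of the log above, the per-file sample counts add up as follows; plain Python, with the values copied from the log:)

```python
sfreq = 1200.0
lengths = [1324800, 1324800, 1324800, 1324800, 871677]
total = sum(lengths)
minutes = total / sfreq / 60
print(total, round(minutes, 1))  # 6170877 samples, ~85.7 minutes
```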

There are 5 meg4 files for this dataset, and cropping beyond the first meg4 file doesn't work:

# Crop within first meg4 file:
>>> raw.copy().crop((100 + 1) / 1200, (100 + 10) / 1200).first_samp
101  # This is OK!
# Crop in 2nd meg4 file:
>>> raw.copy().crop((1324800 + 1) / 1200, (1324800 + 10) / 1200).first_samp
1  # I believe this should be 1324800 + 1

I think the source of the problem is that raw._first_samps and raw._last_samps are not set correctly:

>>> raw._first_samps
array([0, 0, 0, 0, 0])
>>> raw._last_samps
array([1324799, 1324799, 1324799, 1324799,  871676])
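For contiguous split files, each file should start one sample after the previous file ends. A quick sketch of the expected offsets, using the per-file lengths from the log above (plain NumPy, not MNE API):

```python
import numpy as np

# Per-file sample counts from the read_raw_ctf log
raw_lengths = np.array([1324800, 1324800, 1324800, 1324800, 871677])

# Expected cumulative offsets for contiguous split files:
last_samps = np.cumsum(raw_lengths) - 1
first_samps = np.concatenate(([0], last_samps[:-1] + 1))

print(first_samps)  # [      0 1324800 2649600 3974400 5299200]
print(last_samps)   # [1324799 2649599 3974399 5299199 6170876]
```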

Fixing them like so solves the problem:

>>> import numpy as np
>>> raw._first_samps = np.cumsum(raw._raw_lengths) - raw._raw_lengths
>>> raw._last_samps = np.cumsum(raw._raw_lengths) - 1
>>> raw.copy().crop((1324800 + 1) / 1200, (1324800 + 10) / 1200).first_samp
1324801 # OK!

But this leads to problems for me down the line.

agramfort commented 5 years ago

can you modify this line:

https://github.com/mne-tools/mne-python/blob/dbda5849082e686770c02e12d97f2ef6f048937f/mne/io/ctf/ctf.py#L144

with your suggested fix. Tell us if it works, and we'll let you send a PR, or if you prefer we'll fix it ourselves.

thanks

massich commented 5 years ago

I'm trying to create a multi-file version of this phantom file:

    fname = op.join(
        brainstorm.bst_phantom_ctf.data_path(download=False),
        'phantom_20uA_20150603_03.ds'
    )

I thought I would be able to do so with FieldTrip or Brainstorm; they both read CTF files, but I don't think either can write such a split file.

massich commented 5 years ago

@jasmainak suggested using newDS. Does anyone know how to install it, or where the repo and docs are?

cc:@hyperbolicTom

hyperbolicTom commented 5 years ago

The CTF tools are a mix of old 32-bit and 64-bit libraries, so they are annoying to install. They run under CentOS 6, although by installing a bunch of libraries you can get them to work on Ubuntu. I have Singularity containers for both. Here is a Singularity container with a CentOS 6 environment (which runs on any recent Linux distro). You'll need Singularity version 3.

https://kurage.nimh.nih.gov/tomh/ctf.sif

singularity shell ctf.sif

newDs -marker stim -time -1 1 old.ds new.ds

for example will create an epoched dataset with one 2-second trial around each stim marker; preTrig will be 1 sec.

This will allow you to cut a multi-file dataset (one with "spill" files) into one smaller dataset. See also newDs -help


hyperbolicTom commented 5 years ago

Oh, if you are trying to write a multi-file CTF file that's more of an issue. You can't have a trial larger than 2 GB, for one thing, so it has to be in epoched format. The format is simple enough, the .meg4 files are just flat with an 8 byte header.

Here's code that reads them ... It's part of the SAM Suite, available here in more completeness: https://megcore.nih.gov/index.php/Installation_Instructions

    /* Standard headers this code needs (the original #include list
       was garbled by the issue formatting): */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdint.h>
    #include <byteswap.h>
    #include <sys/types.h>

    FILE *OpenDsFile(HeaderInfo *Header, char *ext)
    {
        int i;
        static char *buf;
        FILE *f;

        i = strlen(Header->DsPath) + strlen(Header->SetName) * 2 + strlen(ext);
        /* ... remainder truncated in the original post ... */
    }

    static FILE *OpenMeg4File(HeaderInfo *Header, int spill)
    {
        int i;
        static char *buf;
        FILE *f;

        i = strlen(Header->DsPath) + strlen(Header->SetName) * 2 + 20;  // + 13 or 14 at least
        if ((buf = (char *)malloc((size_t)i)) == NULL)
            allocfailed("buf");
        if (spill > 0) {
            sprintf(buf, "%s/%s.ds/%s.%d_meg4", Header->DsPath, Header->SetName, Header->SetName, spill);
        } else {
            sprintf(buf, "%s/%s.ds/%s.meg4", Header->DsPath, Header->SetName, Header->SetName);
        }
        f = fopen(buf, "r");
        free(buf);
        return f;
    }

    void GetDsData(
        HeaderInfo *Header,     // header information
        int e,                  // epoch index
        int m,                  // channel index
        double scale,           // integer->Tesla conversion factor
        double *Signal          // Signal[T] -- one channel/epoch of time series
    ) {
        int32_t i;              // sample value
        int t;                  // sample index
        off_t offset;           // number of bytes offset
        static int spill;       // 0 for .meg4, 1 for .1_meg4, etc.
        static int32_t *xint;   // xint[T] -- raw time series buffer
        FILE *meg4File;         // data file pointer
        static int M = 0;       // number of channels
        static int T;           // number of samples per trial

        // initialize
        if (M == 0) {
            M = Header->NumChannels;
            T = Header->MaxSamples;
            if ((xint = (int32_t *)malloc((size_t)T * sizeof(int32_t))) == NULL)
                allocfailed("xint");
        }

        // initial conditions
        spill = 0;
        if (Header->spillt) {
            spill = e / Header->spillt;
            e = e % Header->spillt;
        }

        // open file
        if ((meg4File = OpenMeg4File(Header, spill)) == NULL)
            cleanup("'meg4File' open failed");

        // offset data to epoch & read data
        if (Signal != NULL) {
            offset = (off_t)((T * 4) * (m + e * M) + 8);
            if (fseek(meg4File, offset, SEEK_SET) != 0)
                cleanup("'meg4File' fseek failed");
            if (fread((void *)xint, sizeof(int32_t), T, meg4File) != T)
                cleanup("'meg4File' read failed");
            for (t = 0; t < T; t++) {
                i = bswap_32(xint[t]);
                Signal[t] = scale * (double)i;
            }
        }

        // done
        fclose(meg4File);
    }
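Under the same assumptions as the C code above (a flat .meg4 file: 8-byte header, then big-endian int32 samples, channel-major within each trial), a minimal Python reader might look like this; the function name and dimensions here are illustrative, not pyctf or MNE API:

```python
import numpy as np

def read_meg4_trial(path, trial, n_channels, n_samples):
    """Read one trial from a flat CTF .meg4 file, assuming an
    8-byte header followed by big-endian int32 samples stored
    channel-major within each trial."""
    # Each trial occupies n_channels * n_samples 4-byte integers
    offset = 8 + 4 * n_samples * n_channels * trial
    with open(path, 'rb') as fid:
        fid.seek(offset)
        data = np.fromfile(fid, dtype='>i4', count=n_channels * n_samples)
    return data.reshape(n_channels, n_samples)
```

Each channel would then be multiplied by its gain to convert to Tesla, as the C code does with `scale`.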


massich commented 5 years ago

@britta-wstnr pointed out that there's a CTF writing function in FieldTrip, writeCTFds.m (https://github.com/fieldtrip/fieldtrip/blob/master/external/ctf/writeCTFds.m), and the ctf-toolbox package (https://gitlab.com/darren.weber/ctf-toolbox).

I think that modifying this constant in FieldTrip and re-saving should do the trick:

https://github.com/fieldtrip/fieldtrip/blob/bd03aeb2a269ba4cb701d74d6694da3c5ea92d03/external/ctf/writeCTFds.m#L105

hyperbolicTom commented 5 years ago

Pyctf has a res4 writer too: https://github.com/hyperbolicTom/pyctf/blob/master/pyctf/ctf_res4.py


massich commented 5 years ago

I didn't manage to install pyctf.

Can pyctf write split files of arbitrarily short length? Could you split the file for us?

hyperbolicTom commented 5 years ago

I am not sure; normally it creates spill files when a trial goes over 2 GB ... I guess it depends on who's reading it.

Oh, I saw your issue with pyctf just now. Sorry I didn't notice it earlier. Normally you just "make; make install" to install it into local directories inside pyctf (see config/Makefile.config), but you can also "make symlinks" and it will add links to ~/bin and ~/lib. You could also set LIBDIR and BINDIR in config/Makefile.config to wherever you like (e.g. /usr/local/{lib,bin}).

Then you also need to export PYTHONPATH=~/lib, or wherever you installed it.

I can try to do it. Where can I get the dataset? Also why are you trying to create spill files? For testing? What exactly do you want to do?

massich commented 5 years ago

@hyperbolicTom the entire goal is to create a split file for testing purposes (therefore all the files together should weigh very little). My idea (https://github.com/mne-tools/mne-python/issues/6482#issuecomment-518305845) was to use the Brainstorm phantom CTF file to create it; this phantom file is already shipped by MNE. Something like this (adapted) should do:

    import os.path as op
    from mne.datasets import brainstorm

    origin = op.join(
        brainstorm.bst_phantom_ctf.data_path(download=True),
        'phantom_20uA_20150603_03.ds'
    )
    dest = '/tmp/phantom_20uA_20150603_03_split.ds'

    pyctf_write_method(origin, dest, max_meg4_size='9Mb')  # hypothetical writer