mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

ctf2fif conversion problem #2698

Closed romquentin closed 8 years ago

romquentin commented 8 years ago

Hi guys, I am just starting to use mne-python on CTF data. I used the C tool mne_ctf2fif to convert the data ('mne_ctf2fif --ds data.ds/ --fif data01'). I have a .ds folder with a .meg4 file (CTF) of 1.11go, and the resulting .fif file is only around 200mo. When I looked at the events, more than three quarters of them were missing. Nevertheless, it seems to work with smaller files (around 300mo). Is there a size limit? I am running under Apple El Capitan on a MacBook Pro. Thanks a lot!

larsoner commented 8 years ago

Do you mean the file is 1.11 GB and the resulting FIF file is 200 MB? I'm not sure what you mean by "mo" and "go" units...?

FIF does have a size limit, but it can be overcome by writing split files. I'm not sure if the C tools will do this automatically, but mne-python should, now that we also have CTF reading capabilities.
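
For reference, a minimal sketch of the Python-side conversion with splitting. The dataset path is a placeholder; split_size is the Raw.save parameter that controls where the output is split:

    import mne

    # Read the CTF dataset lazily and write it out as FIF; outputs larger
    # than split_size are written as linked split files (e.g.
    # data_raw.fif, data_raw-1.fif, ...)
    raw = mne.io.read_raw_ctf('data.ds')  # placeholder path
    raw.save('data_raw.fif', split_size='2GB')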

mshamalainen commented 8 years ago

The C tools do not do the splitting.

larsoner commented 8 years ago

Ahh, thanks @mshamalainen. @romquentin can you try using mne-python to read the raw file? You need to install the latest master and then do something like:

raw = mne.io.read_raw_ctf(directory_name)
raw.save('test_raw.fif')
mshamalainen commented 8 years ago

If I remember correctly, the CTF raw data files also fit within the 2 GB limitation, with maybe a few samples omitted from the end.
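
For a rough sense of where a 2 GB limit bites, a back-of-the-envelope sketch (the channel count and sampling rate are hypothetical; CTF stores samples as 4-byte big-endian integers, '>i4'):

    # Rough per-channel capacity of a 2 GB file of 4-byte samples
    n_chan = 300                 # hypothetical CTF channel count
    bytes_per_sample = 4         # big-endian int32 ('>i4')
    max_samples = 2**31 // (n_chan * bytes_per_sample)
    print(max_samples)           # 1789569 samples, ~12 min at 2400 Hz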

romquentin commented 8 years ago

Thank you for your quick response. Yes, I was talking about GB and MB, sorry. I tried to read the file with mne-python using this code:

import mne
from mne.io import Raw

raw = mne.io.read_raw_ctf('rom_wmtask_20151203_01.ds')
raw.save('test_raw.fif')

and I got this error:

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 raw.save('test_raw.fif')

/Users/romain/Downloads/src/mne/mne/io/base.py in save(self, fname, picks, tmin, tmax, buffer_size_sec, drop_small_buffer, proj, fmt, overwrite, split_size, verbose)

/Users/romain/Downloads/src/mne/mne/utils.py in verbose(function, *args, **kwargs)
    549     finally:
    550         set_log_level(old_level)
--> 551     return function(*args, **kwargs)

/Users/romain/Downloads/src/mne/mne/io/base.py in save(self, fname, picks, tmin, tmax, buffer_size_sec, drop_small_buffer, proj, fmt, overwrite, split_size, verbose)
   1275         _write_raw(fname, self, info, picks, fmt, data_type, reset_range,
   1276                    start, stop, buffer_size, projector, inv_comp,
-> 1277                    drop_small_buffer, split_size, 0, None)
   1278
   1279     def plot(self, events=None, duration=10.0, start=0.0, n_channels=20,

/Users/romain/Downloads/src/mne/mne/io/base.py in _write_raw(fname, raw, info, picks, fmt, data_type, reset_range, start, stop, buffer_size, projector, inv_comp, drop_small_buffer, split_size, part_idx, prev_fname)
   2066
   2067     if picks is None:
-> 2068         data, times = raw[:, first:last]
   2069     else:
   2070         data, times = raw[picks, first:last]

/Users/romain/Downloads/src/mne/mne/io/base.py in __getitem__(self, item)
    546         data = self._read_segment(start=start, stop=stop, sel=sel,
    547                                   projector=self._projector,
--> 548                                   verbose=self.verbose)
    549         times = self.times[start:stop]
    550         return data, times

/Users/romain/Downloads/src/mne/mne/io/base.py in _read_segment(self, start, stop, sel, data_buffer, projector, verbose)
    383             self._read_segment_file(data[:, this_sl], idx, fi,
    384                                     int(start_file), int(stop_file),
--> 385                                     cals, mult)
    386             offset += n_read

/Users/romain/Downloads/src/mne/mne/io/ctf/ctf.py in _read_segment_file(self, data, idx, fi, start, stop, cals, mult)

/Users/romain/Downloads/src/mne/mne/utils.py in verbose(function, *args, **kwargs)
    549     finally:
    550         set_log_level(old_level)
--> 551     return function(*args, **kwargs)

/Users/romain/Downloads/src/mne/mne/io/ctf/ctf.py in _read_segment_file(self, data, idx, fi, start, stop, cals, mult)
    129         this_data = np.fromstring(
    130             fid.read(si['n_chan'] * n_read * 4), '>i4')
--> 131         this_data.shape = (si['n_chan'], n_read)
    132         this_data = this_data[:, r_lims[bi, 0]:r_lims[bi, 1]]
    133         data_view = data[:, d_lims[bi, 0]:d_lims[bi, 1]]

ValueError: total size of new array must be unchanged

Thanks a lot for your feedback. I can send you a link to my CTF folder if you want to test.
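
The error itself is numpy refusing to reshape a short read into the shape the header promises; a toy reproduction (the numbers are illustrative, not from the actual file):

    import numpy as np

    n_chan, n_read = 4, 100                  # header promises 400 values
    this_data = np.arange(250, dtype='>i4')  # truncated read returns fewer

    # The failing line from mne/io/ctf/ctf.py: raises the "ValueError:
    # total size of new array must be unchanged" seen above
    this_data.shape = (n_chan, n_read)
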
larsoner commented 8 years ago

I suspect your data file is truncated. Can you check? If you think it's not truncated, feel free to share a link and I'll take a look.

larsoner commented 8 years ago

Actually, even if it is truncated, please share a link. I'd like to add a saner error message.

romquentin commented 8 years ago

I am uploading the folder to Dropbox. I will send you the link when it is done (slow internet connection). I guess it is truncated; we checked the head position several times during the block. What can I do if this is the case? Thank you!

larsoner commented 8 years ago

If it is indeed truncated, I can make it output as much good data as there is and warn that there is less than expected. That's probably better than throwing an error. My guess is that's what Matti's C code already does, though. I'll have to take a look to see.

romquentin commented 8 years ago

Sorry for the delay, it was not possible with my connection yesterday... Here is a link to the CTF data: https://www.dropbox.com/s/j1z6xuvfqyhwjhh/rom_wmtask_20151203_01.ds.zip?dl=0

larsoner commented 8 years ago

Great, I'll see what I can figure out

larsoner commented 8 years ago

It looks like your system clock channel goes to zero early on in the recording, indicating that the recording has ended (?). At least that's how the conversion routine treats the data. Does that make sense?

larsoner commented 8 years ago

In other words, the system clock channel is nonzero for the first 158400 samples, then goes to zero, so we take that to mean the rest of the recording (737730 samples) is invalid.
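
Roughly the logic the converter applies, sketched with a toy numpy array (the sample counts mirror this file; the real check is in mne/io/ctf):

    import numpy as np

    # Toy system clock trace: nonzero while acquiring, zero once it stops
    clock = np.concatenate([np.ones(158400), np.zeros(737730)])

    # Everything from the first zero sample onward is treated as invalid
    zeros = np.where(clock == 0)[0]
    n_valid = zeros[0] if zeros.size else clock.size
    print(n_valid)  # 158400 -> the remaining 737730 samples are dropped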

larsoner commented 8 years ago

We could easily add an ignore_clock option to the mne-python reader, but I don't know if it's the right thing to do or not. @mshamalainen, any insight into how or why the CTF system clock channel could drop to zero after 158400 samples, leaving 737730 samples of the data file unused?

larsoner commented 8 years ago

Or @kingjr maybe you know about how the system clock channel works?

romquentin commented 8 years ago

Thank you for your response. We asked 4 or 5 times for an estimation of the head position during this same long block. Maybe when we did that, it stopped the acquisition and restarted it after the head position estimation. Does that seem possible? In that case, I guess just ignoring the clock should work?

larsoner commented 8 years ago

Ahh, I see. I guess eventually we could have a mode where it ignores those sections, but that would create discontinuities in the data, so it's probably not a good idea. system_clock='ignore' | 'truncate' (default) as an option makes sense to me; I'll go ahead and add it.
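
Once that lands, usage should look something like this (default shown for contrast):

    import mne

    # Keep all samples, even after the system clock channel drops to zero
    raw = mne.io.read_raw_ctf('rom_wmtask_20151203_01.ds',
                              system_clock='ignore')

    # Default: truncate the recording at the first zero clock sample
    raw_trunc = mne.io.read_raw_ctf('rom_wmtask_20151203_01.ds')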

kingjr commented 8 years ago

Or @kingjr maybe you know about how the system clock channel works?

No, sorry.

@romquentin Do you have multiple files for which you encountered this issue? Just to check whether the clock reset systematically occurs after a particular size, or whether it corresponds to the head position estimations.

My guess is that when doing HPI, the recording stops because the sensor values go to a different order of magnitude and thus need a different setup.

dgwakeman commented 8 years ago

But then what HPI will you use in the file, @Eric89GXL? Wouldn't it be better to just chop it into different files at each of those spots, since the user wanted that HPI info?

kingjr commented 8 years ago

it would create discontinuities in the data so it's probably not a good idea

+1. How about outputting multiple FIF files?

kingjr commented 8 years ago

I'm also not sure whether the default should be to associate the HPI that precedes the recording or the one that follows it, as both scenarios could come up.

romquentin commented 8 years ago

I encountered the same issue for relatively big files (>800 MB). But during the recording of these files, we always did HPI during the acquisition.

larsoner commented 8 years ago

Yeah, we could split the data into multiple FIF files. There are (at least) a few problems with doing that right now, though:

  1. The code will need to be refactored quite a bit to get this to work.
  2. I'm not actually sure how to get the new HPI information out for each run.
  3. As @kingjr suggests, they could want to actually use the start or end (or average?) head position, so we might need to add options for that (and know what the ramifications are).

Perhaps most importantly, I'm not familiar enough with the use cases for this data to really shape how the API and behavior should work in these situations.

Of course none of these issues are insurmountable, so this is just to say it's not a trivial change to make. Adding the 'ignore' option, on the other hand, is an almost trivial change. With that option added, users can experiment with the .crop() method and set info['dev_head_t'] manually if they want or need to, and we can add a 'split' option later, once someone comes up with a clean proposal for how to sort out these issues and someone gets sufficiently motivated to implement the changes.
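
As a sketch of that interim workflow (the segment boundary and the replacement transform are placeholders you would derive from your own recording):

    import mne

    raw = mne.io.read_raw_ctf('rom_wmtask_20151203_01.ds',
                              system_clock='ignore')

    # Keep one continuous stretch between two HPI measurements
    segment = raw.copy().crop(tmin=0., tmax=158400. / raw.info['sfreq'])

    # Optionally override the device-to-head transform for this segment
    # segment.info['dev_head_t'] = trusted_dev_head_t  # placeholder

    segment.save('segment_raw.fif')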

larsoner commented 8 years ago

@romquentin the 'ignore' option allows me to read all samples from your file and save it to disk now. Feel free to try the branch from my PR #2712 if you know how to do it with git, or just wait for the code to land in master in the next day or two.

kingjr commented 8 years ago

+1 for leaving the issue open and adding a quick ignore option.

In the meantime @romquentin, you should stop and save your next recordings before you do the HPI.

romquentin commented 8 years ago

@Eric89GXL, Thank you for the quick option! @kingjr, got it!

larsoner commented 8 years ago

@kingjr I'm actually going to close this issue with my PR, since the description and title don't really match the new issue (dealing with multiple HPI readings in CTF data). I think it would be better to open a separate issue for it when someone hits that use case and has some ideas for how to proceed with analysis. If you already have some ideas and will need the functionality (or you @romquentin), feel free to open a different issue.

kingjr commented 8 years ago

sure