read_raw for INTAN .dat files

marionducret commented 1 year ago

Describe the new feature or enhancement

Hello,

Is there a way to implement an mne.io.read_raw_intan() function for INTAN .dat files?

Describe your proposed implementation

In the same way as all the other io.read_raw() functions.

Describe possible alternatives

I have no idea

Additional context

No response

drammock commented 1 year ago

It looks like the format is very well described here: https://intantech.com/files/Intan_RHD2000_data_file_formats.pdf

@marionducret can you confirm that you are working with INTAN's "one file per signal type" format, described starting on page 6 of that document? I.e., do you have files like time.dat, amplifier.dat, auxiliary.dat etc? Or is it "one file per channel" format as described starting on page 10 ?

note to other devs: there are some .zip files here for python file readers / converters, I haven't yet looked into what they do or how they're licensed (other than noting that none of them say on the website that they read .dat format).

marionducret commented 1 year ago

Hi @drammock, no, I am working with the "one file per channel" format, with all data in ".dat" files.

drammock commented 1 year ago

OK thanks. Writing a new file format reader can be a lot of work, so unless you have time and inclination to implement it yourself based on the INTAN file specification, my focus for now is trying to find a way to read in the data somehow and then use our RawArray class to convert the data into an MNE object.

(aside: I noticed that the NEO package has an INTAN reader but it only supports .rhd and .rhs formats, so I don't think that will help us unless there's a way to convert "one-file-per-channel" .dat files into .rhd or .rhs format.)

tagging here @adrian-foy: do you have any advice on getting INTAN's "one-file-per-channel" .dat files into Python?

adrian-foy commented 12 months ago

Hi @drammock, it should be pretty straightforward to get One File Per Channel .dat files into Python with just a few lines of code.

https://intantech.com/files/Intan_RHD2000_data_file_formats.pdf includes snippets of how to load data from these files into MATLAB - for example, if they're amplifier channel files, Pg. 11 shows exactly how to do this. All it would take is adapting the MATLAB syntax to Python, and you could do the same exact thing.

A quick note - if the files are large enough (regardless of Python vs. MATLAB), loading all the data from specified file(s) could become taxing on memory. If high sample rate data from multiple channels over several days is being loaded at once, it might be necessary to only load smaller chunks at a time, otherwise Python may run out of memory. So some data size checking might be worth including if large data files are expected.
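As a rough illustration of the size check mentioned above, here is a minimal sketch that estimates the memory needed before anything is loaded. It assumes each file holds only int16 samples (2 bytes each) and that the data would end up as float64 after scaling (8 bytes per value); `estimate_load` is a hypothetical helper name, not part of MNE or the INTAN tools.

```python
from pathlib import Path

BYTES_PER_SAMPLE = 2  # assumption: files contain only int16 samples, no header


def estimate_load(data_folder, pattern="amp-*.dat", bytes_per_value=8):
    """Return (files, samples per file, bytes needed if loaded as float64)."""
    files = sorted(Path(data_folder).glob(pattern))
    # The sample count follows directly from the file size on disk.
    n_samples = [f.stat().st_size // BYTES_PER_SAMPLE for f in files]
    total_bytes = sum(n_samples) * bytes_per_value
    return files, n_samples, total_bytes
```

Comparing `total_bytes` with the available RAM before loading gives an early warning that the data should be read in smaller chunks instead.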

drammock commented 12 months ago

[the PDF file] includes snippets of how to load data from these files into MATLAB

ah, my apologies, I was scanning it quickly and didn't notice the code example. Yes, it should be straightforward, thanks @adrian-foy!

@marionducret this should get you started:

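from pathlib import Path

import numpy as np

data_folder = Path("path/to/folder/where/data/files/are")  # edit this
data_files = sorted(data_folder.glob("amp-*.dat"))  # alphabetical by file name
# Each file holds one channel of int16 samples. Multiplying by 0.195 gives
# microvolts, and by 1e-6 gives volts (the unit MNE expects).
channels = [
    np.fromfile(fname, dtype=np.int16) * 0.195 * 1e-6 for fname in data_files
]
# Stack once at the end. The result has shape (n_channels, n_samples),
# even when there is only one file.
data = np.stack(channels)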

from there you can use mne.io.RawArray and mne.create_info to get this into a Raw object. You'll also need to read the info.rhd file to get sampling rate, channel names, etc. (Note that I'm not really dealing with channel order in the above example, other than alphabetizing by the name of the data file.)
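For the RawArray step, a minimal sketch could look like the following. It assumes the sampling rate has already been obtained from info.rhd (parsing that file is not shown), that channel names can be derived from the file names, and that "seeg" is a suitable channel type for the recording. `channel_names_from_files` and `make_raw` are hypothetical helper names, and the channel type is only a guess to adjust for your data.

```python
from pathlib import Path

import numpy as np


def channel_names_from_files(data_files):
    """Turn file names like "amp-A-010.dat" into channel names like "A-010"."""
    names = []
    for fname in data_files:
        stem = Path(fname).stem
        names.append(stem[len("amp-"):] if stem.startswith("amp-") else stem)
    return names


def make_raw(data, sfreq, ch_names, ch_type="seeg"):
    """Wrap an (n_channels, n_samples) array of volts in an mne.io.RawArray."""
    import mne  # imported lazily so the helper above works without MNE

    data = np.atleast_2d(data)  # a single channel may arrive as a 1-D array
    info = mne.create_info(ch_names=ch_names, sfreq=sfreq, ch_types=ch_type)
    return mne.io.RawArray(data, info)
```

With `data` and `data_files` from the snippet above, this would be called as `raw = make_raw(data, sfreq, channel_names_from_files(data_files))`, where `sfreq` is the sampling rate in Hz read from info.rhd.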

larsoner commented 12 months ago

A quick note - if the files are large enough (regardless of Python vs. MATLAB), loading all the data from specified file(s) could become taxing on memory. If high sample rate data from multiple channels over several days is being loaded at once, it might be necessary to only load smaller chunks at a time, otherwise Python may run out of memory. So some data size checking might be worth including if large data files are expected.

We have interfaces in mne.io.BaseRaw to deal with on-demand reads from disk of chunks of data. The TL;DR is that if you can figure out how to get data from sample i to sample j for all channels, it's not too bad to get on-demand reads "for free" from BaseRaw.
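For the one-file-per-channel layout, the "sample i to sample j for all channels" primitive could be sketched as below. This is not the BaseRaw interface itself, only the kind of reading function a reader class would build on. It assumes headerless int16 files and the same 0.195 µV scaling used earlier in the thread; `read_segment` is a hypothetical name.

```python
import numpy as np

SCALE_TO_VOLTS = 0.195e-6  # int16 units -> microvolts -> volts
BYTES_PER_SAMPLE = 2  # assumption: headerless int16 files


def read_segment(data_files, start, stop):
    """Read samples [start, stop) from every file as (n_channels, n) volts."""
    if start < 0 or stop <= start:
        raise ValueError("need 0 <= start < stop")
    n = stop - start
    out = np.empty((len(data_files), n), dtype=np.float64)
    for row, fname in enumerate(data_files):
        # Read only the requested samples; the rest of the file stays on disk.
        chunk = np.fromfile(
            fname, dtype=np.int16, count=n, offset=start * BYTES_PER_SAMPLE
        )
        if chunk.size != n:
            raise ValueError(f"{fname} is shorter than {stop} samples")
        out[row] = chunk * SCALE_TO_VOLTS
    return out
```

Because only `stop - start` samples per channel are held in memory at a time, the same function also covers the chunked loading suggested earlier for very long recordings.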

marionducret commented 12 months ago

Cool, thank you so much!
