uwmadison-chm / bioread

Utilities to work with files from BIOPAC's AcqKnowledge software
MIT License

Could reading in of markers be added? #2

Closed gchadder3 closed 8 years ago

gchadder3 commented 10 years ago

Hi,

I'm using bioread to read in experimental data collected from Biopac for stress classification. Your package works great for reading in the channel data itself, but the one thing that seems to be missing is reading in of the markers. (I looked in readers.py, but didn't see code to pull these out.)

We use the markers to record when trial phases of our experiments begin. Currently, to extract that information from the acq file, we have to put it into a journal using the Biopac software and then copy the journal text into a Notepad file as the first step of our data munging. So it would be useful if we could pull the marker times and labels directly from the acq files and avoid this manual step.

Regards,

George Chadderdon
Design Interactive, Inc.

njvack commented 10 years ago

Hm. It should be possible; it looks like the mark data is stored directly after the end of the channel data. Can you send a (hopefully small) file with marks?

gchadder3 commented 10 years ago

Hi Nathan,

I'm attaching a downsampled version of one of our data files. (It's still over 1 MB, but hopefully this one won't bounce.)

Thanks so much!

George

njvack commented 10 years ago

Hm, no dice. Can you send it to njvack@wisc.edu?

njvack commented 10 years ago

Got it; thanks. I'm gonna take a look at this; however, if it's hard it may be a while; I have other projects that need attention right now.

If you want to take a stab at it, Application Note 156 (http://www.biopac.com/Manuals/app_pdf/app156.pdf) gives some hints about marks that may or may not still be connected to reality.

gchadder3 commented 10 years ago

Sure, no problem. It's not a mission-critical item on my end, just a feature that will be nice to have when it gets added (will save us a manual step in our data analysis).

Thanks again,

George

njvack commented 10 years ago

Sorry to jump between mail and github -- would it be worthwhile for channels to take marks as slicing operands, so you could do something like:

start_mark = data.marks[0]
end_mark = data.marks[1]
ch = data.channels[0]
interesting_data = ch[start_mark:end_mark]

?
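
To make the idea concrete, here is a minimal sketch of how mark-based slicing could work. `Mark` and `Channel` below are hypothetical stand-ins written for illustration, not bioread's actual classes; the only assumption is that a mark carries its position as a sample index.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Mark:
    """Hypothetical marker: a position in samples plus a label."""
    sample_index: int
    text: str = ""


class Channel:
    """Hypothetical channel wrapper; not bioread's real Channel class."""

    def __init__(self, data: Sequence[float]):
        self.data = list(data)

    @staticmethod
    def _to_index(bound):
        # Marks become their sample position; ints and None pass through.
        return bound.sample_index if isinstance(bound, Mark) else bound

    def __getitem__(self, key):
        if isinstance(key, slice):
            return self.data[slice(self._to_index(key.start),
                                   self._to_index(key.stop),
                                   key.step)]
        return self.data[self._to_index(key)]


marks = [Mark(2, "trial start"), Mark(5, "trial end")]
ch = Channel([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
interesting_data = ch[marks[0]:marks[1]]  # samples 2, 3 and 4
```

Plain integer slices keep working unchanged, so existing code wouldn't need to care whether a bound is a mark or a number.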

gchadder3 commented 10 years ago

Yeah, that sounds like a great idea to me. -- G.

njvack commented 10 years ago

OK, I've looked at this a bit more and don't think I can write this right now. I can hunt around in a hex editor and find the marks (they're right after the channel data) and figure out what's in them, but I can't figure out how to reliably find where they start.

There may be a bug in reading the last few samples in a file (I've saved a file as two versions and get different values for the last 8 samples), so it's possible I'm somehow not reading all the way to the end of the data, though this seems unlikely. Or it could be that AcqKnowledge did something funny... which would be surprising but not hugely so.

And then in the tests I've done by adding markers to some sample files, there's something more complicated that can happen sometimes that doesn't make much sense to me. It's like there can be some kind of section appending and then more marks but hm. It's odd.

In any case, making this work for more than just your sample file is going to take longer than I have right now.

Sorry!

gchadder3 commented 10 years ago

No problem. I can just use bioread to pull out the signals themselves until you have time to add the new feature. There are only 20 or so more subjects I'll have to do the manual journal step in, anyway, in the near future.

Thanks,

George.

dgfitch commented 8 years ago

I think @njvack's suspicion is correct: bioread is not quite parsing the end of the file correctly, which makes this harder, if not impossible.

Once #3 is fixed with some tests behind it, this might get simpler.

My fork (coming soon to a pull request near you) adds related tests, including a test file with a marker.

dgfitch commented 8 years ago

It looks like the newer formats definitely have undocumented additions to the marker structure.

The marker header looks like this in 4.2.0:

long (4): marker header total length
long (4): marker count

Each marker in 4.2.0 is:

long (4): marker location in samples, may start at 1 instead of 0
short (2): ??? does not look like length of the rest of the marker
bool (2) * 4: ???
short (2): ??? some kind of enum
string (4): NON-null terminated string representing type of marker, like "apnd" for segments, "flag" for the default markers, "defl" for default (?)
long (4): marker name length
string: marker name, null-terminated
long (4): usually zero?? could be 2 bools
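
As a rough illustration of the 4.2.0 layout above, here's a sketch that walks the marker section with Python's `struct` module. It's built on assumptions the notes don't confirm: little-endian byte order by default, a name-length field that counts the trailing null, and the `???` fields simply skipped. `parse_markers`, `Marker`, and `_pack_marker` are hypothetical names, not bioread functions, and the bytes parsed here are synthetic rather than taken from a real acq file.

```python
import struct
from typing import List, NamedTuple


class Marker(NamedTuple):
    sample_index: int
    type_code: str
    name: str


def parse_markers(buf: bytes, offset: int = 0, byte_order: str = "<") -> List[Marker]:
    """Parse a marker section laid out as in the 4.2.0 notes (assumptions above)."""
    header = struct.Struct(byte_order + "ii")   # total length, marker count
    # location, unknown short, 4 unknown 2-byte flags, unknown enum,
    # 4-char type code (not null-terminated), name length
    fixed = struct.Struct(byte_order + "ih4hh4si")
    trailer = struct.Struct(byte_order + "i")   # the "usually zero" field

    _total_length, count = header.unpack_from(buf, offset)
    pos = offset + header.size
    markers = []
    for _ in range(count):
        fields = fixed.unpack_from(buf, pos)
        location, type_code, name_len = fields[0], fields[7], fields[8]
        pos += fixed.size
        raw_name = buf[pos:pos + name_len]
        pos += name_len + trailer.size
        # Assumption: name_len includes the null; cut at the first null anyway.
        name = raw_name.split(b"\x00", 1)[0].decode("latin-1")
        markers.append(Marker(location, type_code.decode("ascii"), name))
    return markers


def _pack_marker(location: int, type_code: bytes, name: str) -> bytes:
    """Build one synthetic marker record using the same assumed layout."""
    raw = name.encode("latin-1") + b"\x00"
    fixed = struct.pack("<ih4hh4si", location, 0, 0, 0, 0, 0, 0, type_code, len(raw))
    return fixed + raw + struct.pack("<i", 0)


body = _pack_marker(1, b"apnd", "Segment 1") + _pack_marker(1500, b"flag", "trial start")
section = struct.pack("<ii", 8 + len(body), 2) + body
markers = parse_markers(section)
```

Passing `byte_order=">"` would be the first thing to try if the locations come out as nonsense, since the byte order is one of the guesses here.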

Going to need more samples to suss out the version 3 header; it's simpler but doesn't match up to the notes. Pretty sure the header is:

short (2): marker header total length
long (4): marker count

(I really don't understand why the marker count field would ever need to be wider than the total length field, but okay.)

And each marker is something like

long (4): marker location, looks like it starts at 0 and is reversed endian from the v4???
something: possibly of variable length :(
long (4): marker name length
??? (2): unknown
string: marker name, null-terminated

njvack commented 8 years ago

OK! Despite the fact that we're not reading to the end of the data quite correctly, there's enough information in the headers to go right out there and find the markers. So this is now a thing.

The thing I haven't added is slicing channels based on markers; however, you can get the sample number with marker.sample_index. This is in the file's base sampling rate -- so the numbers may be unexpected if you're looking at a channel with a frequency divider. Doing it better would be a bit complicated and kind of warrants a refactoring of a bunch of stuff I'm not going to refactor right now.
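
For a channel with a frequency divider, a rough way to translate `marker.sample_index` is sketched below. It assumes a divided channel keeps every Nth sample of the base rate starting from sample 0, so integer division gives the nearest earlier channel sample. `channel_index` and `marker_time_seconds` are hypothetical helpers, not part of bioread, and the numbers in the example are made up.

```python
def channel_index(sample_index: int, frequency_divider: int) -> int:
    """Map a base-rate sample index to an index into a divided channel."""
    if frequency_divider < 1:
        raise ValueError("frequency_divider must be >= 1")
    return sample_index // frequency_divider


def marker_time_seconds(sample_index: int, base_samples_per_second: float) -> float:
    """Convert a base-rate sample index to seconds from the start of the file."""
    return sample_index / base_samples_per_second


# Example: a marker at base sample 1500 in a file sampled at 1000 Hz,
# looked up in a channel stored with a frequency divider of 4.
idx = channel_index(1500, 4)
t = marker_time_seconds(1500, 1000.0)
```

Working in seconds sidesteps the divider entirely, which may be the simpler option when comparing markers across channels recorded at different rates.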

Anyhow! This is closed by 0e6e573b099bc6ded8325b8325bff438f3221ca7.