openyou / emokit

Open source driver for accessing raw data from the Emotiv EPOC EEG headset
http://www.openyou.org

Issues #213

Closed ghost closed 7 years ago

ghost commented 7 years ago

Lots of stuff updated here.

Might resolve #211

Might resolve #209

Might resolve #207

Might resolve #205

Might resolve #204

ghost commented 7 years ago

I'll pull this in after I have some more time to test.

Morgan243 commented 7 years ago

Nice, a lot of solid changes here.

Regarding output to a CSV: I propose we drop the use of Python's built-in csv writer and simply write the values directly to a file. The reason is that there is a lot of overhead in the csv handler, since its aim is to be flexible and easy to use quickly. That isn't our use case: we know what the data will look like, and the overall format, in advance. You can google around and find a few Stack Overflow posts that discuss this performance gap, but I wanted to compare EmotivWriter with these alternatives to see just how big an impact that wrapper has. Below are the results of running some performance test code I wrote on my machine:

Data shape is (38400, 20)
--original_emotiv
Avg time: 0.2251s (stddev=0.0029s)
Samples per sec: 170591.1190
--python_csv
Avg time: 0.1829s (stddev=0.0013s)
Samples per sec: 209962.6777
--direct_to_fs
Avg time: 0.0374s (stddev=0.0003s)
Samples per sec: 1026835.9575
--direct_to_fs_chunked
Avg time: 0.0297s (stddev=0.0004s)
Samples per sec: 1295064.5118
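For reference, the faster approaches can be sketched roughly like this. This is a sketch, not the actual benchmark code; the function names and the chunk size are mine. The idea is that since every value is a plain number, we can skip the csv module's quoting/dialect machinery and cut down the number of `write()` calls:

```python
import csv

def write_csv_module(rows, f):
    # Baseline: Python's csv module. Flexible, but pays per-row
    # dispatch and quoting overhead we don't need for numeric data.
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)

def write_direct(rows, f):
    # Direct formatting: one write() per row, no quoting logic.
    for row in rows:
        f.write(",".join(str(v) for v in row) + "\n")

def write_direct_chunked(rows, f, chunk=1024):
    # Batch many rows into one string per chunk, so the number of
    # write() calls drops by roughly a factor of `chunk`.
    buf = []
    for row in rows:
        buf.append(",".join(str(v) for v in row))
        if len(buf) >= chunk:
            f.write("\n".join(buf) + "\n")
            buf = []
    if buf:
        f.write("\n".join(buf) + "\n")
```

All three produce the same CSV body for purely numeric rows; the chunked variant just amortizes the write-call overhead.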

It's pretty clear that we have some significant overhead here, but it's important to understand that multithreading this writer will probably not remove all of it from the main thread, due to Python's GIL. The roughly 6-7x overhead demonstrated above is primarily time spent executing Python code, not writing to the file. While one thread is executing Python code, no other thread can execute Python code (see: GIL), so although multithreading will help, we still need the writer to minimize its need for user-space CPU time.

I had planned on implementing and submitting a PR for this, hence the performance test .

Edit: Removed a few absolutes :P

Edit2: Placing quotes around everything is also not insignificant; it adds quite a bit more data to each write. If someone needs full quoting, it makes more sense to just post-process the CSV after recording the data.

Morgan243 commented 7 years ago

If we still have issues with performance writing to a CSV, we can launch the writer in a separate process and queue/pipe packets over.

ghost commented 7 years ago

Sounds good to me.

ghost commented 7 years ago

Yeah, I thought about that as well; however, I want to stream the data out of a TCP/UDP server at some point. The threading seems to keep up fine. One of the things that was really slowing down the export was writing to the console in the main thread, which was also skewing the sample rate.

Of course, I'm doing all this on a MacBook Pro, so it might not run as well on other platforms.

ghost commented 7 years ago

On Python 2 it's very consistent (on my system); I think with your suggestion Python 3 will work well too.

ghost commented 7 years ago

Of course, keeping the data in a buffer until the end of the session might work, if we catch KeyboardInterrupt and set a flag telling the writer to flush the buffer to disk on exit.

Morgan243 commented 7 years ago

I'm not sure that buffering everything in memory will buy us much for the memory pressure it will induce and the unease of having the data in volatile storage. By chunking the data, we can effectively compromise between the two extremes of write-every-record and buffer-every-record.
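A rough sketch of that chunked compromise (the class name and default chunk size are my own, not from the PR): only the last partial chunk sits in volatile storage, and `close()` covers the KeyboardInterrupt flush case mentioned above.

```python
class ChunkedWriter:
    # Compromise between write-every-record and buffer-every-record:
    # hold at most `chunk_size` rows in memory, then flush them to
    # disk in a single write. A crash loses at most one chunk.
    def __init__(self, path, chunk_size=512):
        self.f = open(path, "w")
        self.chunk_size = chunk_size
        self.buf = []

    def write_row(self, row):
        self.buf.append(",".join(str(v) for v in row))
        if len(self.buf) >= self.chunk_size:
            self.flush()

    def flush(self):
        if self.buf:
            self.f.write("\n".join(self.buf) + "\n")
            self.buf = []

    def close(self):
        # Flush any trailing partial chunk (e.g. after KeyboardInterrupt).
        self.flush()
        self.f.close()
```

Tuning `chunk_size` trades memory and data-at-risk against the per-write overhead measured earlier.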

ghost commented 7 years ago

So the Python 3 performance issue did seem to stem from the GIL, like you were saying; I fixed the sleep times and it performs the same, if not better, now. It also writes directly to the file now.

ghost commented 7 years ago

Yeah, chunking would be good too, I suppose.