Closed adarshp closed 2 years ago
Re: Updates: aurora notifies the user of any available software updates when started, and we did not receive any notifications when we opened aurora in the past month or so.
I have reached out to NIRx support with the logfiles and a broad problem description. Please feel free to add anything that might be helpful to the conversation here or to the email.
Update 04/05 (Chinmai, Eric, Vatsav, Rick & Valeria): NIRx suggests running an activity monitor, which we already did last week without seeing any issues with CPU or memory. We are running this now in the following ways:
Alternative ways to treat "symptoms", not the underlying cause:
Caleb ran an overnight test yesterday (running only aurora) and one of the three crashed after 3.5 hrs, but it is unclear if that's a battery failure or a device failure
Update 04/06: It ran without problems for ~24 hrs on WiFi. However, we still need to work out whether that is due to USB vs. WiFi or to the load on the iMacs. We will set up another overnight test today.
Update 04/07: Aurora ran without problems for ~24 hrs on USB. Thus, it seems that it might be a problem with the buffer on the iMacs, meaning that running things in addition to Aurora (eye tracking, Mumble, screen recording, face recording, Minecraft, Firefox, and/or baseline) likely causes the issue.
Thanks! (And thanks for the follow-up with NIRx.)
One next step from our end would be to get a better understanding of how many resources each of those processes is using, so we can work on reducing the load. Some things could potentially be moved off of the iMac. Others could potentially be trimmed.
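One way to get those per-process numbers is to sample ps on each iMac while everything is running. The following is a minimal sketch, not something that has been run on the lab machines: the function names summarize_usage and snapshot_usage are made up for illustration, and it assumes ps reports %CPU, resident memory in KiB, and the command path through its pcpu, rss, and comm keywords.

```shell
#!/bin/sh
# Sum %CPU and resident memory per program from `ps`-style input.
# Input lines (stdin): "<%cpu> <rss in KiB> <command path>"
# Output lines: "<program>\t<total %cpu>\t<total memory in MiB>", busiest first.
summarize_usage() {
    awk '{
        cpu = $1; rss = $2
        # Rebuild the command (it may contain spaces) and keep its basename.
        cmd = $3
        for (i = 4; i <= NF; i++) cmd = cmd " " $i
        n = split(cmd, parts, "/")
        name = parts[n]
        cpu_sum[name] += cpu
        rss_sum[name] += rss
    }
    END {
        for (name in cpu_sum)
            printf "%s\t%.1f\t%.0f\n", name, cpu_sum[name], rss_sum[name] / 1024
    }' | sort -t "$(printf '\t')" -k2,2nr
}

# Take one snapshot of every process on the machine.
snapshot_usage() {
    LC_ALL=C ps -A -o pcpu= -o rss= -o comm= | summarize_usage
}
```

Calling snapshot_usage | head -n 10 every few seconds during a session would show which programs dominate CPU and memory at the moment of a crash. Since it is still unclear whether memory or CPU is the bottleneck, logging both columns seems safer than watching only one.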
In addition, Eric suggested giving Aurora higher priority, and perhaps giving the other processes lower priority. This can be done with a script and the "nice" command (or "renice" for processes that are already running) once everything is up and running. I do not have a sense of how likely this is to help, because it is not clear what kind of resource contention is causing Aurora to crash. Most likely it is memory, not CPU.
Update: Aurora ran without problems for 2.75 hours on USB while streaming fNIRS, eye-tracking, screen, and webcam data and while playing Minecraft for 2 hours.
Solution: finding the Aurora process ID (PID) and raising its priority with the renice command, for example sudo renice -10 1234 for PID 1234, might have solved the buffer issue on the iMacs.
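The PID lookup and the renice call could be wrapped in a small script so nobody has to read the PID off Activity Monitor before each session. This is only a sketch under assumptions: the helper names renice_pids and renice_by_name are invented here, the DRY_RUN switch is an illustrative convenience, and the process name passed to pgrep (shown as Aurora in the usage note) has not been checked against what the app actually reports.

```shell
#!/bin/sh
# Set the priority of the given PIDs. Negative values need root, hence sudo.
# With DRY_RUN=1 the command is printed instead of executed.
renice_pids() {
    priority=$1
    shift
    for pid in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "sudo renice $priority -p $pid"
        else
            sudo renice "$priority" -p "$pid"
        fi
    done
}

# Look up every process whose name matches exactly, then renice it.
renice_by_name() {
    priority=$1
    name=$2
    pids=$(pgrep -x "$name") || {
        echo "no running process named '$name'" >&2
        return 1
    }
    # Word splitting is intentional: pgrep prints one PID per line.
    renice_pids "$priority" $pids
}
```

A call such as renice_by_name -10 Aurora would then replace the manual lookup, and DRY_RUN=1 renice_by_name -10 Aurora shows what would be run without touching any priorities.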
Awesome! Also, Chinmai and I were discussing using external webcams that go to CAT via USB to reduce load and give us more resolution options. Either way, we are probably collecting face data with excessive resolution. It seems to be 1/2 of the data volume collected.
Streaming all data (eye tracking, fNIRS, EEG, screen, and webcam) seems to cost only 45% of total CPU power. We still have 55% of the CPU left for baseline tasks and Minecraft, which should not take much CPU time. The stress test on Tuesday, April 12 will give us a detailed report of the CPU usage of each application.
ffmpeg still takes the most CPU, at least 30%.
OK, that sounds great. I will look forward to the results of that test.
We should be careful about capturing data at higher resolution than we need, as long-term storage could become an issue. I think we budgeted for file servers around this phase of the project for this reason, but keeping each experiment closer to 100GB than 300GB would be good.
Also, I believe we are storing some data for later transfer that ideally should be transferred in real time eventually.
Anyway, if we want to go for external webcams, we should probably go over Ethernet, as USB cables need to be short for high bandwidth. So we would want to get some kind of adapter or (better) a native Ethernet webcam.
If none of that works, then we could plug standard webcams into the USB-C ports on the iMacs.
Since we have about 50% of CPU power for baseline tasks and Minecraft, and the network speed seems to not be the problem with the current setup, I think we don't need external cameras.
However, if we do want to remove the 30% CPU time of ffmpeg (which I don't think is necessary at the moment), plugging the webcam into the iMac does not remove ffmpeg from the iMacs, since we still need to extract the images from the webcam and send them to CAT, which is what we are currently doing with the default camera on the iMac. If we plan to use external cameras, then they should not be plugged into the iMacs.
This is basically correct. Plugging an external camera into the iMac would only help if you wanted to capture frames at a lower resolution than the iMac camera does. This would reduce, but not eliminate, the load, and also reduce the network use and storage (same as it would if they were fed into CAT). However, storage can be reduced after the fact. Anyway, let's see how far we get with restricting the time the camera is on to when the experiment is actually running, as you have already suggested.
April 12, 2022 update: Aurora did not crash during the testing session, which lasted about 2 hours, with priority set to -10.
Streaming EEG, fNIRS, and eye-tracking data takes 25-30% of CPU. Streaming EEG, fNIRS, eye-tracking, webcam, screen, and Mumble data while playing Minecraft takes 45-55% of CPU.
Aurora crashed in today's pilot despite the reassigned priority.
Can one of the CS students take a look on the iMac where it crashed by opening up the "Console" app and seeing if there is any record in any of the logs and reports? You can investigate these by clicking items in the left side menu. Thanks!
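The same reports that Console shows can also be pulled from the terminal, which makes them easier to attach to the NIRx email. A rough sketch follows; recent_crash_reports is a made-up helper name, and it assumes that per-user crash reports live under ~/Library/Logs/DiagnosticReports and that the report file name contains the app name, neither of which has been verified for Aurora.

```shell
#!/bin/sh
# List crash/hang reports whose file name mentions the app and that were
# modified within the last N days (default 1).
# Usage: recent_crash_reports <app name> [days] [report directory]
recent_crash_reports() {
    app=$1
    days=${2:-1}
    dir=${3:-"$HOME/Library/Logs/DiagnosticReports"}
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    find "$dir" -type f -name "*$app*" -mtime "-$days" -print
}
```

For example, recent_crash_reports Aurora 2 would list reports from the last two days. If nothing shows up there, the unified log can be searched with log show --last 2h --predicate 'process == "Aurora"', again assuming the process is really named Aurora.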
We found that Aurora crashed after launching the Minecraft clients on the iMacs. The issue might be that launching the Minecraft client took most of the CPU time, causing a buffer overflow problem in Aurora. See #302 for further discussion.
OK, so that should be reproducible, even if it does not always happen. But I gather starting it up again after such a crash works fine. Can we confirm this?
Also, does Aurora write anything to the log, or does the process simply die and the OS reports that it has exited? If so, is there an exit code? A Unix-like OS always knows the exit code of its failed children, and there are different codes for seg faults, etc.
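If Aurora can be started from a terminal rather than from the Dock, a wrapper along these lines would record how the process ended. It is a sketch only: run_and_report is an invented name, and whether the Aurora binary inside the app bundle can be launched this way is an assumption that would need checking. It relies on the convention in sh, bash, and zsh that a child killed by a signal is reported as 128 plus the signal number (so 139 for a segmentation fault, signal 11).

```shell
#!/bin/sh
# Run a command, wait for it, and print a timestamped line saying whether
# it exited normally (with which code) or was killed by a signal.
run_and_report() {
    "$@"
    rc=$?
    if [ "$rc" -gt 128 ]; then
        sig=$((rc - 128))
        echo "$(date '+%F %T') '$1' killed by signal $sig ($(kill -l "$sig" 2>/dev/null))"
    else
        echo "$(date '+%F %T') '$1' exited with code $rc"
    fi
    return "$rc"
}
```

Appending the output to a file, as in run_and_report /path/to/aurora-binary >> aurora_exit.log (the path is a placeholder), would give a record to compare against the times Minecraft and the other clients were launched.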
Issue #302 shows that Aurora crashes not only when launching Minecraft. The hypothesis is that there are processes that take CPU time away from Aurora, causing the Aurora hardware to crash due to a buffer overflow.
We will still test Aurora on the Mac laptops as an alternative solution to #307.
Aurora did not crash when running on the laptops, except when one of the laptops went to sleep, which paused Aurora and caused a buffer overflow.
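The sleep-induced pause might be avoidable with the caffeinate utility that ships with macOS, which can hold a no-sleep assertion for as long as a given process is alive. The sketch below is untested on the lab laptops; keep_awake_while and the DRY_RUN switch are illustrative names, and it will not cover every cause of sleep (closing the lid, for example), so it is a partial safeguard at best.

```shell
#!/bin/sh
# Keep the Mac awake for as long as the given process is alive.
#   -i  prevent idle sleep      -d  prevent display sleep
#   -w  release the assertion when <pid> exits
# With DRY_RUN=1 the command is printed instead of executed.
keep_awake_while() {
    pid=$1
    case $pid in
        ''|*[!0-9]*) echo "usage: keep_awake_while <pid>" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "caffeinate -i -d -w $pid"
    else
        caffeinate -i -d -w "$pid" &
    fi
}
```

It would be called once Aurora is running, for instance keep_awake_while "$(pgrep -x Aurora | head -n 1)", where the process name Aurora is again an assumption.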
Hi @adarshp & @kobus-barnard,
@eduongAZ, @kay-of-a, @rchamplin and I observed that Aurora on leopard doesn't usually crash, so we decided to test whether the hardware or the iMac/macOS is the issue. I swapped the fNIRS devices between tiger and leopard, had them running for a couple of minutes, and then launched Minecraft on the iMacs. Then all of them crashed.
I swapped the devices back to their original places and performed the same test again. This time lion and leopard crashed.
It's hard to come to a conclusion with such random behavior.
I recently observed that this device had one of its LEDs lit red. @rchamplin could you ask the NIRx folks what this means?
It's an error message and you need to restart the device. This is described in the "getting started" guide (paper) in the lab. I agree, the behavior is very random indeed!
One thing to test is setting a larger scale on the Aurora visualizer. @rchamplin and I observed that when we increase the scale and increase the time window for plotting, Aurora "generally" doesn't crash.
Update: NIRx is shipping us a Windows machine that they have tested. Hopefully this will work. We will reopen the issue if it does not.
The fNIRS recording program (Aurora) crashes randomly sometimes. This is not good.
We will look at the log files and contact NIRx (we need a detailed bug report).
We thought it might be related to a buffer problem with the high video streaming load, but that has been ruled out now since we reduced the face and screen capture resolution.
Kobus:
Eric: macOS might be freezing Aurora to save on resources. We should look into whether this is the case (and prevent it from doing that if so)