megakilo / FreeRTOS-Sim

FreeRTOS simulator for POSIX
GNU General Public License v2.0

vPortYield returns sometimes with another FreeRTOS thread active #1

Open pekkanikander opened 8 years ago

pekkanikander commented 8 years ago

When running under Mac OS X 10.11.4, vPortYield() sometimes (rarely) returns with a different FreeRTOS thread active (pointed to by pxCurrentTCB) than the one that was active when it was called. This then causes problems much later, e.g. sometimes the idle thread gets removed from pxReadyTasksLists[0], crashing FreeRTOS.

I don't yet know the root cause of the problem nor how to fix it, but the following commit adds a minimal piece of code that prints out a message when this happens. With our code, most of the time we get a crash (core dump) shortly afterwards.

https://github.com/pekkanikander/FreeRTOS-Sim/commit/6082316f9a589905c7ea7b307a7f620bf5904062
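For readers who cannot open the commit, here is a minimal sketch of the kind of check described above: remember the current task pointer before yielding and compare it afterwards. It is not the code from the commit. `current_tcb` is a hypothetical stand-in for FreeRTOS's pxCurrentTCB, and `report_if_task_changed` is an illustrative helper name; only the message text is taken from the log line quoted later in this thread.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's pxCurrentTCB. In FreeRTOS the
 * real pointer lives in tasks.c and is updated by the scheduler. */
static void * volatile current_tcb;

/* Hypothetical helper: call right after the yield returns, passing the
 * value the task pointer had just before the yield was requested.
 * Returns 1 (and prints a diagnostic) if a different task is now
 * marked as current, 0 otherwise. */
static int report_if_task_changed(void *tcb_before_yield)
{
    void *tcb_now = current_tcb;

    if (tcb_now != tcb_before_yield) {
        fprintf(stderr,
                "vPortYield yielding to another thread: old = %p, now = %p\n",
                tcb_before_yield, tcb_now);
        return 1;
    }
    return 0;
}
```

In a real port the snapshot would presumably be taken at the top of vPortYield() and the check made just before it returns.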

pekkanikander commented 8 years ago

The following is speculation and may be completely wrong: It looks to me as if prvSuspendThread, when called for the running thread (i.e. pthread_self() == xTaskToSuspend), sometimes returns either before the current thread has actually been suspended or after it has been resumed prematurely.

In my debugging I've mostly focused on race conditions where the thread could continue before it enters the sigwait in prvSuspendSignalHandler at https://github.com/megakilo/FreeRTOS-Sim/blob/master/Source/portable/GCC/POSIX/port.c#L540. However, that does not seem to be the case. I even copied most of the prvSuspendSignalHandler content into prvSuspendThread itself, so that when a thread suspends itself it blocks in prvSuspendThread instead of in the signal handler, but that did not help. (Or I made some silly mistake when trying this.)

Consequently, I now suspect that the thread-to-be-suspended sometimes gets resumed too early, but I have no clue why or how.

megakilo commented 8 years ago

This sounds weird. Were you running the demo tasks or your own code when you encountered the crash? I'm also on OSX 10.11.4 and I haven't seen any crashes running the demo for 10 minutes several times. If you suspect pthread issues, would you be able to try your code on Linux as well?

pekkanikander commented 8 years ago

Yes, we will be trying this on Linux today, and we will also try to distill a test case. We are not running the demo tasks but using FreeRTOS-Sim as part of our unit testing framework. Hence we now get these crashes with some of our proprietary code integrated. I'll try to make a test case that I can share, but that may take some time.

I spent quite a long time last night drilling down into this, as initially we only saw the crashes much later. Now it is relatively easy to reproduce, though it usually takes a few minutes to occur.

My suspicion (perhaps wrong) is that several threads calling xQueueGenericReceive on different queues at about the same time is part of what triggers this problem.

abeck70 commented 6 years ago

@pekkanikander Did you by any chance resolve the FreeRTOS/Posix sim issue? I now see this on OSX 10.13.2. As soon as the message "vPortYield yielding to another thread: old = 0x7fe8d75015f0, now = 0x7fe8d75000b0" appears, queue-based receive tasks are never notified of a queued event again. They also never time out. Eventually the queues fill up and, for example, I get vUDPReceiveAndDeliverCallback: queueSendFailed.

Thanks, Alex.

pekkanikander commented 6 years ago

Unfortunately, I gave up, and then months later we decided to move away from FreeRTOS completely.