robotpy / robotpy-wpilib

Moved to https://github.com/robotpy/mostrobotpy
https://robotpy.github.io
Other
169 stars 59 forks source link

GearsBot example doesn't work with pyfrc generic tests #116

Closed virtuald closed 8 years ago

james-ward commented 8 years ago

The hanging on the built in tests is due to an issue with threading in the test harness.

I spent a day trying to fix it, and couldn't come up with a solution. I think someone more familiar with the test harness might have to do it.

To "fix" the problem (ie get the tests to run), I created a counter in the TimerTask class, and set the stopped flag after an arbitrary number of iterations (eg 5000000). The thread terminates and the tests complete. So the issue is that the test harness isn't able to stop the threads properly, even though my inspection of the code indicates that it should be cancelling the TimerTasks. I'm happy to work further on this if someone could give me a suggested line of attack.

virtuald commented 8 years ago

I think we had done some investigation in this previously, and it seems like https://github.com/robotpy/pyfrc/issues/29 might be related.

james-ward commented 8 years ago

It turns out that if you make the base class of TimerTask a multiprocessing.Process everything runs fine. Is there a reason that a thread was chosen for TimerTasks over a process? Something architecturally or otherwise that I am missing? Would such a change have any (negative) flow on effects to other parts of the code base?

virtuald commented 8 years ago

Well, it runs fine, but does it run correctly? (the builtin tests don't do much other than verifying that you can execute robot code without it crashing)

I'm not too familiar with multiprocessing, but IIRC you can only communicate between processes using things like queues and such that are designed for the job (from the multiprocessing module), so the PIDController and other such things would have to be redesigned to use that. Seems like a bad idea.

I haven't looked too deeply into this since last year, so the details are hazy. If I were to go investigating this, what I probably would do is put log statements that include the time and current thread id all over the timertask code and figure out what's happening in what thread and find the race condition that's freezing it.

From the other issue:

The PIDController uses a TimerTask object which also calls Timer.delay which increments our fake_time timer while running. Since it is running on its own thread, how many calls it makes is indeterminate and if what you wanted to get done in that time is also indeterminate."

If this is the case (which sounds plausible), then probably the right answer is to have a separate 'time' stored in a threadlocal for each thread, and to operate each thread in lockstep with each other so they can't mess with each other's timing, but also execute in a synchronized manner. For some reason I thought we did something like that, but perhaps not.

Probably at least part of the problem is that we attempt to allow 'pausing' and 'stepping' time, which makes the implementation of fake time a bit more complex.

It would be great to get this issue solved, but I personally will not be able to look at it until the season starts, as I'd like to finish the 2016 WPILib changes and some other misc stuff that needs to get done before kickoff.