navit-gps / navit

The open source (GPL v2) turn-by-turn navigation software for many OS
http://www.navit-project.org
Other
550 stars 175 forks source link

Multithreading #760

Open mvglasow opened 5 years ago

mvglasow commented 5 years ago

As far as I can tell, Navit currently does not (or hardly) use multithreading. The standard mechanism for doing things in the background is to run them in an idle loop, on the main thread, with the following implications:

I understand that multithreading is no small thing and needs to be worked out carefully. Specifically, we will need to synchronize accesses to our data structures to avoid data corruption.

However, the first question is: is multithreading available on all platforms we support? Or are there any that do not support multithreading? Even the latter would not be a show stopper, but would need to be accommodated in our design: for example, we could keep the code structure with idle loops but run them in a background thread where supported.

jkoan commented 5 years ago

I did a debug session in eclipse lately and noticed that there where 3 threads running while I was debugging. I only remember that one was Sdl. Because I was debugging the Sdl graphics driver.

mvglasow commented 5 years ago

That appears to be something happening behind the scenes, i.e. triggered by SDL, not Navit, and likely happens with a few platform-specific libraries. Unfortunately that doesn’t help us here—what I meant was: are there any instances where Navit itself (as opposed to some third-party library) launches additional threads?

mvglasow commented 5 years ago

A quick scan of the core reveals references to the gthread (GLib thread) and pthread (POSIX thread) libraries.

gthread is used by a vast number of components, including completely platform-agnostic ones, but all occurrences of ghtread.h I found are in build artifacts. Apparently use of gthread is optional, with a fallback for environment which do not support it. I am also seeing #define G_THREADS_ENABLED somewhere in the code.

pthread is used by graphics/sdl, support/glib and support/espeak, with all occurrences of pthread.h in the source files—all of them being rather platform-specific, though. I have also found occurrences of #if USE_POSIX_THREADS in the source code.

So either of the two appears to be fully optional and we would need to work around its absence, though I would need to figure out if the respective conditionals are available throughout the code. A (somewhat) quick win would be to redesign our idle loops to be able to run in separate threads. We would need to manage concurrent access for all data structures touched by background threads, as well as implementing a decision whether a particular job should run in an idle loop or in a separate thread.

In addition to that, we could extend multithreading support to other platforms. pthread should be available on all Unix-like platforms (i.e. the bulk of our platforms), and a Windows/WinCE port is also available. GLib is also cross-platform, but I am not sure which platforms are covered by gthread.

mvglasow commented 5 years ago

A quick scan of the core reveals references to the gthread (GLib thread) and pthread (POSIX thread) libraries.

gthread is used by a vast number of components, including completely platform-agnostic ones, but all occurrences of ghtread.h I found are in build artifacts. Apparently use of gthread is optional, with a fallback for environment which do not support it. I am also seeing #define G_THREADS_ENABLED somewhere in the code.

pthread is used by graphics/sdl, support/glib and support/espeak, with all occurrences of pthread.h in the source files—all of them being rather platform-specific, though. I have also found occurrences of #if USE_POSIX_THREADS in the source code.

So either of the two appears to be fully optional and we would need to work around its absence, though I would need to figure out if the respective conditionals are available throughout the code. A (somewhat) quick win would be to redesign our idle loops to be able to run in separate threads. We would need to manage concurrent access for all data structures touched by background threads, as well as implementing a decision whether a particular job should run in an idle loop or in a separate thread.

In addition to that, we could extend multithreading support to other platforms. pthread should be available on all Unix-like platforms (i.e. the bulk of our platforms), and a Windows/WinCE port is also available. While GLib is also cross-platform, I am not sure which platforms are covered by gthread, especially since we are doing some extra magic on certain platforms.

mvglasow commented 5 years ago

Ran a quick test on Linux and Android with the following lines inserted into traffic_process_messages_int():

#if USE_POSIX_THREADS
    dbg(lvl_error, "!!! We support POSIX threads. This could be so much faster!");
#endif
#ifdef G_THREADS_ENABLED
    dbg(lvl_error, "!!! We support GLib threads. This could be so much faster!");
#endif

Debug output tells me that GLib threads should be available on both platforms. POSIX threads seem to be available on Android but not on Linux.

mvglasow commented 5 years ago

Windows also seems to support GLib threads. Looks like that is the easiest way to go.

If anybody else feels like testing this on their favorite platform, please feel free to do so and post your results here! Code is at https://github.com/mvglasow/navit/tree/multithreading-test

lains commented 5 years ago

I have a WinCE device and build env so I can run the test on it if needed.

mvglasow commented 5 years ago

@lains Yes please! If sharing personal CircleCI artifacts works, you can use a prebuilt binary from https://circleci.com/gh/mvglasow/navit/1327. If threads are supported, the respective message will appear in the log file over and over (quick and dirty solution).

mvglasow commented 5 years ago

In the meantime I am working on experimental multithreading support. Any use of multithreading functions is bracketed by conditionals, so the code will continue to work as before where multithreading is not available, at most with some refactoring.

Currently the two most processor-hungry items are traffic (whenever a new report comes in and must be matched to the map), and routing (initial route calculation, to an extent also recalculation).

I am starting with traffic, as I know my way around that module best and I originally thought it should be fairly isolated, though I now realize it does have some dependencies on other modules. The thread will just execute traffic_process_messages_int() in a loop (going to sleep if there is nothing to do), hence the changes are all about making that method (and anything it calls, directly or indirectly) thread-safe. Here I have currently identified three challenges:

mvglasow commented 5 years ago

Thread-safety for mapsets turned out to be pretty easy, and I managed to implement this in a transparent manner. Obtaining a mapset handle acquires a read lock, closing the mapset handle releases it. Starting a search operation, or retrieving a map from the set, acquires a read lock while the procedure iterates over the maps and releases it when done. Adding or removing a map from the set acquires a write lock and releases it when done. The same might work for the items in a map rectangle.

However…

After running everything through CI, the bitter truth is that G_THREADS_ENABLED does not mean we have full GLib thread support. It seems to work on Linux, Tomtom and Sailfish, but Android, Win32 and WinCE give errors about thread-related definitions from GLib not being available. Apparently, on these platforms, we are dealing with support/glib/fake.h from the Navit source tree, which claims GLib thread support but implements only a rudimentary subset of the functions therein. We either have to upgrade these to “real“ GLib, or extend fake.c to provide all we need. Thoughts or additional information welcome…

mvglasow commented 5 years ago

For the record, @charlescurley writes:

Re threads: Debian 9.8 stretch has glib threads, native. They have deprecated one set of thread calls as of version 2.34.0, and implemented a new set. I just sent a patch to foxtrotgps with the relevant changes. Lots of #if GLIB_CHECK_VERSION(2,34,0) #else #ifend

and to my question about documentation of the changes:

Try these two pages: https://developer.gnome.org/glib/2.34/glib-Deprecated-Thread-APIs.html and https://developer.gnome.org/glib/stable/glib-Version-Information.html#GLIB-CHECK-VERSION:CAPS A lot of what I did now looks like so:

#if GLIB_CHECK_VERSION(2,34,0)
    if (!g_thread_new("friends thread", &update_position_thread, (gpointer) NULL) != 0)
#else
    if (!g_thread_create(&update_position_thread, NULL, FALSE, NULL) != 0)
#endif
        g_warning("### can't create friend thread\n");

Indeed, the Tomtom builds seem to use an old GLib version, Linux and Sailfish seem to have a reasonably modern one.

lains commented 5 years ago

@lains If sharing personal CircleCI artifacts works, you can use a prebuilt binary from https://circleci.com/gh/mvglasow/navit/1327.

I did not find a way to get the output package (zip?) out of CircleCI. I think I will build directly on my machine...

mvglasow commented 5 years ago

@lains should be on the Artifacts tab, here’s the link to the zip: https://1327-43569556-gh.circle-artifacts.com/0/root/project/wince/output/navit.zip

mvglasow commented 5 years ago

Looking at the history of navit/navit/support/glib, facts about this library stub appear to be:

Looking at the current GLib implementation, the thread functions are essentially just wrappers around the native ones (either POSIX threads or Win32). They are intended to be low-level, with no further GLib dependencies, although other GLib components depend on them.

I see the following alternatives:

Either way, we will have to build a wrapper to handle pre and post 2.34.0 APIs.

mvglasow commented 5 years ago

Building against a current version comes with the additional challenge that the actual library needs to come from somewhere. Either it has to be available on the platform, or we have to build it.

mvglasow commented 5 years ago

The GLib files in the support version can be divided into three categories:

In the latter case, parts of the code are commented out with #if NOT_NEEDED_FOR_NAVIT. There are also some formatting changes. Some have logical changes, but these are rare.

mvglasow commented 5 years ago

Did I really write:

the thread functions are essentially just wrappers around the native ones (either POSIX threads or Win32). They are intended to be low-level, with no further GLib dependencies, although other GLib components depend on them.

Must be the result of a heavy redesign. In 2.18.1 they look more like an exercise in maximizing the number of dependencies on other source files. The more, the merrier :-(

lains commented 5 years ago

@mvglasow, I did test on Windows CE (this was done on a Mio C210). This device is running Windows CE 4.2. I have the following log repeating over and over: error:navit:traffic_process_messages_int:!!! We support GLib threads. This could be so much faster!

mvglasow commented 5 years ago

@lains thanks, so we seem to have some level of GLib support on WinCE. Unfortunately, by now I know that WinCE runs on the stripped-down (and ancient) support version of GLib, which has no true thread support but does have some stubs for thread synchronitation.

Pulling in thread support from GLib 2.18.1 seems to be more complex than I thought initially, for reasons described above. Current GLib seems to have minimized dependency of threads on other GLib functions, though, and the implementations themselves look quite straightforward. That leaves us with two options:

mvglasow commented 5 years ago

The FrankenGLib approach does not work well either, as there are still too many dependencies. At the most, we could try updating the entire support/glib tree to something recent, but we would still have to figure out what is needed by Navit, and get everything to build with CMake.

My preference is therefore the fake.c approach taken one step further: provide a Navit thread API as a thin wrapper around the thread library of choice for the platform (which could be POSIX, Win32, GLib or something else). We would only implement the functionality we are actually using, and define a preprocessor constant telling us if threads are supported on this platform. Many operations (such as acquiring or releasing a lock) would then be an unconditional function call in the code, and the API implementation would translate it into the appropriate function call (or into a no-op where multithreading is not supported), and only things such as thread creation or code working around the lack of multithreading would need conditionals. This would also be somewhat extensible.

That means we need to rely on POSIX threads wherever GLib does. The aforementioned USE_POSIX_THREADS is defined in fake.h and thus not defined when we have full GLib support. However, fake.h simply defines it whenever we’re not running on a Windows API, implying we should have POSIX threads anywhere but on Windows.

mvglasow commented 5 years ago

We’re making progress: we have a basic thread library that can spawn and join threads and has support for read/write locks (the only synchronization mechanism I am using thus far).

Results are encouraging (=starts up without crashing) on Linux and Android. Win32 is still having issues with a Windows API function not being found, and WinCE is not recognized as Windows. Tomtom is having issues with a non-standard pthread call, Sailfish builds OK.

We should have thread support for all platforms which support POSIX threads, i.e. all non-Windows platforms. Where thread support is not available, Navit will be single-threaded as before. (We might want to consider making that a compile-time option, so we can always build single-threaded versions on any platform.) Adding multithreading support on Windows is just a matter of plugging the appropriate Windows functions into the library, should be rather straightforward.

Now we need to work these platform-specific kinks out and complete synchronization between threads.

hoehnp commented 5 years ago

@mvglasow: Could you provide me the sailfish builds? I could then try if it starts properly on my sailfish phones. :+1:

mvglasow commented 5 years ago

@hoehnp sure, I am still working on some build issues, then I'll provide a link. There’s still plenty of stuff that is not synced properly between threads, thus crashes and other weird behavior are to be expected—but it should at least start up without crashing.

mvglasow commented 5 years ago

All right, build issues are sorted out.

@hoehnp Sailfish artifacts are here: https://circleci.com/gh/mvglasow/navit/1601#artifacts/containers/0 (go to the Artifacts tab and choose the package for your architecture). As mentioned, thread synchronization is far from complete. It should start up OK, but no guarantees beyond that yet.

mvglasow commented 5 years ago

Accessing map data needs to be thread-safe, as this is an operation which happens in all thread candidates: traffic, routing (in the next phase), and in the UI (we still need to draw the map). Currently working on it, the interface contract currently is as follows:

mvglasow commented 5 years ago

csv maps turned out to be challenging, as they support write operations to any map rectangle, and these in turn touch data structures shared by all currently open map rectangles, Also, opening a map rectangle creates an iterator, which is valid until the rectangle is destroyed. I found no other way to solve it than to acquire a write lock on the whole map when a map rectangle is opened, and release it when the map rectangle is destroyed. However, that means no two map rectangles of the same CSV map can be open at the same time; the thread which attempts to open the second one will block until the first one has destroyed its map rectangle.

Bottom line: map rectangles should not be kept open for long periods of time. Currently route.c violates this new requirement hen building the route graph—though that is less critical as it runs on the main thread for now (at the most it would block the traffic thread while the route graph is being built). If we move routing into a separate thread, that logic would need to be refactored anyway.

Further optimization could be done by:

mvglasow commented 5 years ago

Re items, points to consider regarding thread safety include:

In other cases we might need to synchronize item access.

mvglasow commented 5 years ago

Also, some map implementations have no item_priv but tack the map_rect to their items instead. Where this is the case and the map rect has no reference to anything shared (or it is not touched by item methods), there is no need for synchronization.

mvglasow commented 5 years ago

Almost there—map rect and item access should be almost completely thread-safe now, the only thing still missing is items in mg maps. Now we just need to take care of traffic module internals and its interactions with the route.

hoehnp commented 5 years ago

@mvglasow: sorry for the delayed tests. Starting seems to work nice on the latest Sailfish release. Are there more specific things you would like me to test?

mvglasow commented 5 years ago

@hoehnp no big deal, good to hear at least the basics work. Currently I am still redesigning some things for multithreading (triggering a map redraw and a route update when traffic distortions have changed, ensuring we dump the message store to XML and purge expired messages periodically). After that I will fire a bunch of test cases at the resulting build and make sure we are not running into any deadlocks in normal usage. At that stage some real-world tests would be helpful. However, in order for those tests to be meaningful, you would need a traffic plugin that delivers traffic reports—else the traffic thread will just idle about and never encounter situations which might trigger deadlocks.

Btw, I have modified thread.c to terminate the Navit process immediately if a deadlock is detected. It is currently modeled after GLib, meaning the caller has no way to detect a deadlock. Even if we could detect deadlocks, recovering from them might get fairly complex.

lains commented 5 years ago

Hello @mvglasow, I also noticed that the search process is quite heavily using the mainloop for background lookup of countries, towns, streets etc. in the internal GUI. This could also be a good candidate to multithreading.

mvglasow commented 5 years ago

Agree. Just need to get multithreading to work correctly in the first place (bolting it on after the fact is non-trivial, and it's easy to introduce hard-to-debug errors :-) -- Sent from my Android phone with K-9 Mail. Please excuse my brevity.

bignaux commented 5 years ago

I think event-driven programming (asynchronous) is a better pattern vs threading in our case, i would not recommend to multiply threads.

mvglasow commented 5 years ago

@bignaux if I understand that correctly, that is what we are doing already (certain events, such as UI input or position updates, trigger an operation). The challenge is to keep the UI responsive while we are still processing the previous event, especially if that involves a long operation. In some cases, such as routing, this has been solved with idle loops.

This works well for operations that are largely iterative, i.e. the same code runs a few thousand times, on different data each time, and to an extent also for a handful of steps, each of which is essentially an iteration—as long as the iteration steps are relatively inexpensive. When things get more complex, such as iterating over iterations, chopping this up into quick operations that can be fed into a loop quickly gets complex. Once you have reached that level, you’re basically reinventing multithreading, as you need to keep track of the states of each operation. And very soon you need to govern concurrent access to shared data structures.

At the same time, it will still be cooperative (rather than preemptive) multitasking, i.e. a misbehaving operation can freeze up the UI. Another drawback is that you’ll always be constrained to one processor core: if you have a computation-intensive operation going on, one CPU core will be at 100% and the UI will be sluggish (or completely irresponsive), at the same time the other CPU cores are bored to death. That wasn’t a big issue in the early days of Navit 15 years ago, but it has become more of an issue since. With quad-core CPUs being pretty standard nowadays, single-threaded processes are effectively throttled to 25% of available CPU performance.

PS: I’ve done some work on multithreading already, which you can find at https://github.com/mvglasow/navit/tree/multithreading

bignaux commented 5 years ago

Don't forget we support embedded stuff with limited CPU / memory where asynchronous and non-blocking is more effective than multiply thread to avoid the complexity of make the event loop responsive.

mvglasow commented 5 years ago

You’re right, I tend to overlook that :-) and I don’t have much experience with these platforms, so I can’t tell how much of a difference it would make (once we have a stable multithreaded version, maybe someone can run some tests). However, even if multithreading turns out to be too costly on resource-constrained platforms, there is an easy solution for that.

Multithreading depends on platform-specific libraries, therefore I have written an abstraction layer around these. Currently I have implemented support only for the Posix thread library—most prominently, Windows support is still absent, although planned for the future.

As a result, multithreading is entirely optional and enabled at build time if supported on the target platform. One could quite easily expose a build parameter that would always force a single-threaded build.

lains commented 5 years ago

For platforms with limited CPU, I agree. However, for multi-core platforms, threading would allow for better load distribution accross CPUs and thus would improve overall performance, wouldn't it?

mvglasow commented 5 years ago

@lains I hope it will :-)

The one tricky thing is map access. The UI (and thus the main thread) needs to access the map, and so does the traffic thread. The next candidates for separate threads are routing and possibly search, and each of those needs map access as well. So for the moment, each thread will access the map at some point.

Some map drivers have global data structures (shared by all map rects) to which they make modifications even when data is only read from the map—for example, when data is read from disk and loaded into a memory cache, this is a write operation to the cache. Such maps can only be accessed by one thread at a time, while others have to wait for their turn. Unfortunately, the binfile driver is among those maps, which means most of our use cases will generally have this issue. As a consequence, threads should not keep maps open for any longer than strictly necessary, else they will block each other.

We’ll see how that goes (I have to sort out my locks first); I might also need to refine lock logic for maps which are vulnerable to this.

jkoan commented 4 years ago

Some map drivers have global data structures (shared by all map rects) to which they make modifications even when data is only read from the map—for example, when data is read from disk and loaded into a memory cache, this is a write operation to the cache. Such maps can only be accessed by one thread at a time, while others have to wait for their turn. Unfortunately, the binfile driver is among those maps, which means most of our use cases will generally have this issue. As a consequence, threads should not keep maps open for any longer than strictly necessary, else they will block each other.

Wouldn't it be easier to fix those map plugins to be thread-save?

mvglasow commented 4 years ago

Wouldn't it be easier to fix those map plugins to be thread-save?

Depends on how familiar you are with the map drivers :-) what seems like a daunting task to someone who has never touched that particular part of the code may be easy to someone who knows that same code inside out. As for myself, I haven’t dug very deeply into the map driver code (there are some map drivers I had barely heard about before).

If someone else (possibly with better knowledge of the existing map drivers) wants to help out here, that would be highly appreciated. The code is out there, in my own fork of the repo, but I could easily push it to the main one if that is desired—just let me know.

However, due to personal time constraints I have not done anything on the multithreading branch in over a year, and might not get around to doing much in the foreseeable future. The branch was forked from master in early 2019 and I don’t remember ever merging anything since, so there might be some bit rot. In particular, #815 led to some internal restructuring in the traffic module, quite likely introducing some incompatibilities.

Nonetheless, some groundwork has been done, which someone else might build upon (such as introducing general cross-platform thread management and synchronization routines into the code base). At the time I left the code, it was ready to the point at which it would start up. I would then run some basic operations on it, wait for the next race condition to cause a crash or other malfunction, hunt it down and fix it, then rinse and repeat. As I said, if someone wants to take a stab at it, they’re more than welcome to do so and I’ll be happy to help, but I probably won’t be able to spearhead this particular issue for the time being.