raduprv / Eternal-Lands

http://www.eternal-lands.com
Other
157 stars 57 forks source link

Actors list locking #147

Open gvissers opened 3 years ago

gvissers commented 3 years ago

I had a quick look at the locking of the actors list, triggered by the current thread on eye candy issues. I don't know if these are related, but the actors list is is accessed in multiple places while another thread is holding the mutex.

I created a macro to check if the current thread is holding the lock (almost, there could also be no thread holding the lock at all):

#define CHECK_ACTORS_LIST_LOCK() \
    do \
    { \
        if (SDL_TryLockMutex(actors_lists_mutex) == SDL_MUTEX_TIMEDOUT) \
        { \
            fprintf(stderr, "%s:%d: The actors list mutex is held by another thread!\n", \
                __FILE__, __LINE__); \
        } \
        else \
        { \
            SDL_UnlockMutex(actors_lists_mutex); \
        } \
    } while(0)

Since the SDL locks are recursive, the TryLock call will succeed if the current thread is holding the lock (or the mutex is not locked at all), and fail if another thread is holding it.

I sprinkled these across the source where actors_list is used, took a short walk around Isla Prima, killed some creatures, and talked to Lasud, and got the following:

~> ./el.linux.bin test 2> log
~> grep mutex log | sort | uniq -c
     72 actors.c:1137: The actors list mutex is held by another thread!
    171 actors.c:1298: The actors list mutex is held by another thread!
     67 actors.c:1338: The actors list mutex is held by another thread!
     26 actors.c:1465: The actors list mutex is held by another thread!
     82 actors.c:1749: The actors list mutex is held by another thread!
      7 gamewin.c:1136: The actors list mutex is held by another thread!
      1 select.cpp:135: The actors list mutex is held by another thread!
      7 select.cpp:328: The actors list mutex is held by another thread!

(the line numbers may not match exactly with the current source, because the calls to CHECK_ACTORS_LIST_LOCK() add lines as well).

This is only checking actors_list, not your_actor, MY_HORSE, MY_HORSE_ID, IS_HORSE, ACTOR, ACTOR_WEAPON, or other ways in which the actors list is abused.

As I said, I have no idea if this is related to the eye candy issues, and given the fact that most of the time, things work well enough it's probably not immediatly dangerous, but I really think this needs fixing.

pjbroad commented 3 years ago

I've looked at this before but abandoned the work because it was too big, but I agreed it needs fixing. There are likely places where read access could be allowed in parallel too.

gvissers commented 3 years ago

I've looked at this before but abandoned the work because it was too big

Yes, that pesky list seems to have put its tendrils into everything (which is not altogether surprising). I am working on it, but fixing this is definitely not for the upcoming release.

There are likely places where read access could be allowed in parallel too.

That's what I thought at first. However, although in most places the pointers in the actors list itself remain unchanged, the actor data they point to is written to. Perhaps we could protect the actors list with an rwlock, but we'd have to add a mutex to the individual actors. It's something we can look into, but for now I'm still using a single mutex for all of the actor data. Let's see what the impact of that is once everything is properly locked.

gvissers commented 2 years ago

Short update:

I have the actors list (and the actors within) now properly contained in my actors_list_lock branch. It should no longer be possible to retrieve an actor pointer without locking the list. Since the client is not written in Rust, you can of course still hold on to it after you have released the lock, thus invalidating this work, but at least you will have to work at it :laughing:

Unfortunately the diff is moderately huge again (64 changed files with 6,262 additions and 4,900 deletions at the time of writing), sorry for that. I have not seen a performance regression, a quick analysis shows that actor lookup constitues a negligible part of the time usage. The code by default still uses a recursive mutex, AFAICT due to a single function in the sound code that can be called through too many paths to simply pass down the required actors. I will see if I can remedy this, so that we can switch to a normal mutex.

I will hold off on submitting a PR until after the new release, that will also give me the time to test it properly. With actors being rather central to the game play, basically everything will have to be tested, but if anyone wants to help, focusing on eye candy effects (magic, mostly), multi-combat, and ranging would be a good idea.

gvissers commented 2 years ago

Pull request made, #164.