streetcomplete / sc-statistics-service

Statistics generation service for StreetComplete
GNU General Public License v3.0
10 stars 4 forks source link

Concurent Requests for same User #4

Open Akasch opened 4 years ago

Akasch commented 4 years ago

At the moment it is possible that the code is concurrently updating the same user in parallel in response to either multiple requests with the same user or by running a request and the cron update at the same time.

I think nothing bad happens from it beside unnecessary requests to the OSM API.

It could be prevented by usage of transactions over the whole process for each user and SELECT FOR UPDATE on the changesets_walker_state entry for the user.

westnordost commented 4 years ago

What do you mean with transaction over the whole process?

Anyway, regarding the cron, the update_incomplete_statistics.php will only touch users that have not been updated for 30 seconds.

Akasch commented 4 years ago

I mean something like the following (very ugly pseudocode, mixing SQL and php:

[...]
"START TRANSACTION"
$status = "SELECT * FROM changesets_walker_state WHERE user_id=$user_id FOR UPDATE SKIP LOCKED"; // The skip locked bit seems to be relatively new in mysql, sice 8.0
$ok = True;
if (len($status) > 0 && $status['last_update']  < now() -30 && ...) {
    $ok = $changesets_walker->analyzeUser($status, $user_id);
}
if $ok {
    "COMMIT"
} else {
    "ROLLBACK"
    return error();
}
return get_user_statistics();

In this scenario the Database enforces that only one process can update a user at every given time. As mysql cant handle nested transactions the existing transactions would have to be rewritten using checkpoints with "SAVEPOINT name" and "ROLLBACK TO name"

The problem I meant with the update_incomplete_statistics.php is that while it is running the user could send a new request (or multiple of them) and it would be processed in parallel doing unnecessary work.

Akasch commented 4 years ago

The code above never analse users not in the database as it is the same as a skip over an user in processing, so that have to be checked first. An alternative would be to just let the transaction wait on the lock to be released (remove SKIP LOCKED) but this would lead to many php processes waiting for the analyse step to be finished in the worst case, and if the OSM API is very slow to respond it could take some time and wherefore many waiting processes potentially.

westnordost commented 4 years ago

// The skip locked bit seems to be relatively new in mysql, sice 8.0

Hmm I only have an older version available.

The problem I meant with the update_incomplete_statistics.php is that while it is running the user could send a new request (or multiple of them) and it would be processed in parallel doing unnecessary work.

Ok maybe it is possible to set a user-specific mutex/synchronized/semaphore in PHP? I did a short search and found so many completely different approaches (some only work on Linux, some are deprecated/removed, some are dependent on other libraries) that it overwhelmed me.

mnalis commented 2 years ago

I don't see why SKIP LOCKED would be needed (or even correct) here @Akasch, shouldn't FOR UPDATE be enough? e.g. https://stackoverflow.com/a/27868682/2600099

(Of course one has to be having InnoDB tables, and not myISAM)

Locking at SQL server level is most centralizes, as your data itself is there. There are of course many other ways if you insist on client-side locking. Most portable is flock, e.g. https://stackoverflow.com/a/55399220/2600099 but that assume single local filesystem access (so no NFS, SMBFS etc)

There are other, more performant ways, like the use local IPC methods like semaphores, but that also assume running GNU/Linux (or some other POSIX machine) and have their own caveats).

TL;DR: unless you plan to change the database, I'd stick with mysql innodb transactions + select for update