Closed icydee closed 8 years ago
A job queue wouldn't seem to help, unless it did all ship arrivals/planet ticks, and even then there would be problems.
Without transactions, you need good locking - something that makes sure nothing else is accessing a body or even empire - something that would be easier to do ahead of time if all deferred actions (ship arrivals, building updates) were in a job queue instead of being done ad-hoc.
I know, if the game was organized around a job queue from the start then it would work, but I can't see how it can be retro-fitted.
I can make (and have made) a few minor changes, but I can't see how to avoid the problem entirely.
I did some of this back in the day, but one thing I would highly recommend is that in long-running processes you don't instantiate objects until you actually need them. If the objects are only around for a second or two, they're much less likely to be out of date.
Another thing you can do, if you need to keep the objects around longer, is to pull fresh copies of the data from the DB before running calculations and then update immediately.
Without transactions, you'll never solve the problem entirely. It was a flaw in my design from the start.
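As a rough illustration of the "pull fresh copies, then update immediately" advice, here is a minimal sketch in Python with the standard sqlite3 module rather than the game's Perl/DBIx::Class code. The `body` table, the `water_stored` column and the `add_resource` helper are assumptions made up for the example. It narrows the window for a stale write but, as noted above, does not close it without transactions.

```python
import sqlite3


def add_resource(conn, body_id, amount):
    """Re-read the row right before calculating, then write straight away."""
    # Fresh read: do not trust any value loaded earlier in this process.
    (stored,) = conn.execute(
        "SELECT water_stored FROM body WHERE id = ?", (body_id,)
    ).fetchone()
    new_value = stored + amount
    # Write immediately so the fresh value is held for as short a time as possible.
    conn.execute(
        "UPDATE body SET water_stored = ? WHERE id = ?", (new_value, body_id)
    )
    conn.commit()
    return new_value


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE body (id INTEGER PRIMARY KEY, water_stored INTEGER)")
conn.execute("INSERT INTO body VALUES (1, 100)")
conn.commit()

# Simulate another process changing the row while this one was busy elsewhere.
conn.execute("UPDATE body SET water_stored = 600 WHERE id = 1")
conn.commit()

# Because the helper re-reads first, it builds on 600 rather than a stale 100.
result = add_resource(conn, 1, 50)
```

In DBIx::Class terms, the fresh read corresponds roughly to calling discard_changes on the row object before using it.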
On Sep 6, 2012, at 12:34 PM, Ian Docherty wrote:
I know, if the game was organized around a job queue from the start then it would work, but I can't see how it can be retro-fitted.
I can make (and have made) a few minor changes, but I can't see how to avoid the problem entirely.
I have been considering that too, using $body->discard_changes to ensure fresh data is obtained from the database, but I have been reluctant to try it in case it has other side effects.
One of the big performance improvements was to not instantiate objects every time we need them but to 'cache' them (so that we don't, for example, read the 'body' dozens of times per HTTP request), so I suppose we are now suffering from that change.
I will try a few appropriate 'update' calls, each followed by 'discard_changes'. I don't think it will cause a problem and it might help.
I wouldn't dismiss retrofitting a job queue as impractical; it could be done.
I like TheSchwartz, but a roll-your-own solution may be better.
I have a code change I would like to try first.
Then a job queue for planet ticks and ship arrivals might be possible, which should catch most issues. I may try an experiment.
Yes, I looked at TheSchwartz again yesterday.
We now have a job queue for building upgrade/work-end, which seems to be working OK.
I think there may be a memory leak, however: it runs for several weeks and is then terminated.
I have just put in a modification so that it just runs for an hour, then exits, and a cron-job restarts a new process every hour. I will try this for a few weeks to see if it behaves.
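A minimal sketch of that "run for a fixed time, then exit" pattern, in Python. The in-memory deque stands in for the real job queue, and `run_worker`, the job names and the short runtime are all assumptions for illustration, not the actual worker script.

```python
import time
from collections import deque


def run_worker(queue, handle, max_runtime_seconds, idle_sleep=0.01):
    """Process jobs until the deadline passes, then return so the process can exit."""
    deadline = time.monotonic() + max_runtime_seconds
    processed = 0
    while time.monotonic() < deadline:
        if queue:
            handle(queue.popleft())
            processed += 1
        else:
            # Nothing to do: wait briefly instead of spinning.
            time.sleep(idle_sleep)
    return processed


# Hypothetical jobs; a real worker would pull these from the queue backend.
done = []
jobs = deque(["upgrade:1", "work-end:2", "upgrade:3"])

# A real worker would use 3600 seconds; 0.2 keeps the example quick.
count = run_worker(jobs, done.append, max_runtime_seconds=0.2)
```

Because every process is replaced after an hour, any slow leak is bounded by one run. The trade-off is that a job in progress at the deadline needs to finish (as it does here, since the deadline is only checked between jobs) or be safely retried.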
After modifying tick_planets.pl to not grab every planet object at the start (the way DBIx::Class handles the query can create excess objects before we need them), but only the IDs of the planets we're going to tick, and then loading each planet as we go, I think this may be mostly fixed. Instead of holding all the objects around for 5 minutes or however long the tick takes, each body is only held for as long as it takes to tick that body. This reduces the window of opportunity for this type of an error drastically to the point where it can be nearly ignorable.
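The shape of that change can be sketched as follows, using Python and sqlite3 rather than the real tick_planets.pl. The `body` table, its columns and the `tick_all` and `tick_one` names are assumptions for the example; only the "IDs first, full row later" structure is the point.

```python
import sqlite3


def tick_all(conn, tick_one):
    """Tick every body, loading each full row only just before it is used."""
    # Fetch only the primary keys up front; this list may live for minutes.
    ids = [row[0] for row in conn.execute("SELECT id FROM body ORDER BY id")]
    for body_id in ids:
        # Load the full row now, so it is never older than a single tick.
        row = conn.execute(
            "SELECT water_stored, water_hour FROM body WHERE id = ?", (body_id,)
        ).fetchone()
        if row is None:
            continue  # the body was deleted after the ID list was built
        stored, per_hour = row
        conn.execute(
            "UPDATE body SET water_stored = ? WHERE id = ?",
            (tick_one(stored, per_hour), body_id),
        )
        conn.commit()
    return len(ids)


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE body (id INTEGER PRIMARY KEY, water_stored INTEGER, water_hour INTEGER)"
)
conn.executemany("INSERT INTO body VALUES (?, ?, ?)", [(1, 100, 10), (2, 200, 20), (3, 0, 5)])
conn.commit()

ticked = tick_all(conn, lambda stored, per_hour: stored + per_hour)
totals = [r[0] for r in conn.execute("SELECT water_stored FROM body ORDER BY id")]
```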
A better fix, in my mind, is to override stuff in DBIx::Class so that some fields are not updated by setting directly, but by using SQL math. i.e., instead of SET bean_stored = ?, use SET bean_stored = bean_stored + ? and provide the expected delta. This gets complicated for foods/ores more than water/energy only because with water/energy we can cap it easily: SET water_stored = LEAST(water_stored + ?, water_capacity), but food and ore will get convoluted here. Not to mention general convolutionism when we try to get DBIx::Class to change how it operates for some fields in the first place.
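Here is a small runnable sketch of the delta-style update, using Python and sqlite3 instead of DBIx::Class. The table layout and the `add_beans` and `add_water` helpers are assumptions for the example. SQLite has no LEAST(), so its two-argument MIN() stands in for it here.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE body (id INTEGER PRIMARY KEY, bean_stored INTEGER, "
    "water_stored INTEGER, water_capacity INTEGER)"
)
conn.execute("INSERT INTO body VALUES (1, 100, 900, 1000)")
conn.commit()


def add_beans(conn, body_id, delta):
    # The database adds the delta to whatever is stored right now,
    # so no stale in-memory copy is ever written back.
    conn.execute(
        "UPDATE body SET bean_stored = bean_stored + ? WHERE id = ?",
        (delta, body_id),
    )


def add_water(conn, body_id, delta):
    # Same idea, capped at capacity inside the single statement.
    # SQLite's two-argument MIN() plays the role of LEAST().
    conn.execute(
        "UPDATE body SET water_stored = MIN(water_stored + ?, water_capacity) "
        "WHERE id = ?",
        (delta, body_id),
    )


add_beans(conn, 1, 500)  # e.g. a trade arriving
add_beans(conn, 1, 20)   # e.g. a planet tick; order no longer matters
add_water(conn, 1, 250)  # 900 + 250 would exceed the cap of 1000
conn.commit()

beans, water = conn.execute(
    "SELECT bean_stored, water_stored FROM body WHERE id = 1"
).fetchone()
```

Since each statement reads and writes the column in one step, two writers can both apply their deltas without either one erasing the other's change.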
On numerous occasions, people purchase a trade for resources and they arrive, are visible for a short while, then they disappear.
After much thought I am certain that this is due to the interaction between the back-end scripts (hourly tick planets) and the front end UI code.
For example:
The final back-end update will effectively overwrite the 'body' object, negating the update caused by the front end.
This is most likely to happen with people who are closely watching their ship arrivals in their browser, concerned perhaps about previously failed arrivals.
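One interleaving consistent with this description can be reproduced in a few lines of Python with sqlite3. This is an assumption about the sequence of events, not a trace from the game; the `body` table, the `ore_stored` column and the amounts are invented for the example.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE body (id INTEGER PRIMARY KEY, ore_stored INTEGER)")
conn.execute("INSERT INTO body VALUES (1, 100)")
conn.commit()


def read_ore(conn):
    return conn.execute("SELECT ore_stored FROM body WHERE id = 1").fetchone()[0]


def write_ore(conn, value):
    conn.execute("UPDATE body SET ore_stored = ? WHERE id = 1", (value,))
    conn.commit()


# 1. The back-end tick loads the body and keeps the copy while it works.
backend_copy = read_ore(conn)

# 2. Meanwhile the front end processes a ship arrival carrying 500 ore.
frontend_copy = read_ore(conn)
write_ore(conn, frontend_copy + 500)
visible_after_arrival = read_ore(conn)  # the player sees the traded ore

# 3. The back end finishes and writes its stale copy plus 10 ore of production.
write_ore(conn, backend_copy + 10)
final = read_ore(conn)  # the 500 traded ore has been overwritten
```

The traded resources are visible between steps 2 and 3 and then vanish, which matches the "arrive, are visible for a short while, then disappear" reports.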
The solution is much more difficult to find.
Transactions might work, but would be a very big change to the structure of the code base.
Making the resource updates 'atomic' might help, but it would be a big change to identify all resource calculations.
I sort of think that doing planet 'tick's in a job queue might work but I don't know how big a change this would entail.
So more thought is required, unless there are other suggestions?