Occasionally when we deploy new code, the routes will be messed up (i.e. the wrong route handler gets called). If I erase the cache file so it gets regenerated, the problem goes away.
My guess at what is happening: the new code is mv'd into place (which is atomic), and two requests then arrive close enough together that the first route-cache build has started but not finished when the second request kicks off its own build. The two builds interleave their writes, which corrupts the cache file.
I did capture a good chunk of the bad cache file and the corresponding good cache file. What I notice is that the bad cache file has missing entries, e.g. it goes from 5 to 7, skipping 6.
I can post them here if that would be useful (I'll need some time to scrub them for sensitive info, if there is any).
If my hypothesis is correct, I think the simple fix is to add the LOCK_EX flag to this file_put_contents.
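To make the proposed change concrete, here is a minimal sketch of what I mean. The function name writeRouteCacheLocked and the $cacheFile / $routes parameters are made up for illustration, and I'm assuming the cache is a PHP file that returns an array; the actual code may look different.

```php
<?php
// Hypothetical helper, not the library's real code: writes a route cache
// file while holding an exclusive lock so concurrent builds cannot interleave.
function writeRouteCacheLocked(string $cacheFile, array $routes): bool
{
    // Assumption: the cache is stored as a PHP file that returns an array.
    $payload = '<?php return ' . var_export($routes, true) . ';';

    // LOCK_EX makes file_put_contents() acquire an exclusive advisory lock
    // on the file before writing, so a second writer waits for the first.
    return file_put_contents($cacheFile, $payload, LOCK_EX) !== false;
}
```

One caveat: the lock is advisory, so this only serializes writers that also lock. A request that loads the cache file without locking could still see it mid-write.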
On the other hand, if my hypothesis is correct, I'm pretty amazed that there isn't an issue already for others...