Closed SuzanneSoy closed 7 years ago
This problem can occur at least in the situation when the package-change-handler
thread is busy (perhaps rerendering a large number of packages after an update). The package-deletion page is trying to ensure some kind of "read-your-writes" consistency by waiting for the deletion to definitely have propagated through the system before redirecting the user back to the main page. In the case I'm seeing, I think this leads to a long delay because the package-change-handler
thread is too busy with other things. I will see if something similar happens when the package-change-handler
thread is idle; if so, then my hypothesis is wrong and I will have to think harder about this.
If the hypothesis is correct, one mitigation might be for the package-change-handler
to prioritise incoming small requests over long-running background refreshes somehow. I'm not sure if that'd work well. The design of the program is complicated in that little area.
Nope, it happens even when package-change-handler
is otherwise idle. But it's related to what I thought. Here's a stack trace from deleting a package:
2016-12-21 01:28:29.856860500 *** DAEMON CRASHED: static-renderer ***
2016-12-21 01:28:29.856861500 file-size: file not found
2016-12-21 01:28:29.856861500 path: /var/lib/pkgserver/public_html/pkg-index-static/pkg/digicore
2016-12-21 01:28:29.856862500 context...:
2016-12-21 01:28:29.856862500 /home/pkgserver/racket/collects/racket/file.rkt:723:0: file->bytes119
2016-12-21 01:28:29.856862500 /home/pkgserver/racket-pkg-website/src/static.rkt:213:2: for-loop
2016-12-21 01:28:29.856863500 /home/pkgserver/racket/collects/racket/private/more-scheme.rkt:261:28
2016-12-21 01:28:29.857183500 /home/pkgserver/racket-pkg-website/src/static.rkt:234:0: static-renderer-aws-s3
2016-12-21 01:28:29.857184500 /home/pkgserver/racket-pkg-website/src/static.rkt:120:0: static-renderer-main
2016-12-21 01:28:29.857185500 /home/pkgserver/racket/collects/racket/private/more-scheme.rkt:261:28
2016-12-21 01:28:29.857437500
2016-12-21 01:28:29.857667500 *** DAEMON CRASHED: package-change-handler ***
2016-12-21 01:28:29.857667500 RPC exception:
2016-12-21 01:28:29.857667500 file-size: file not found
2016-12-21 01:28:29.857668500 path: /var/lib/pkgserver/public_html/pkg-index-static/pkg/digicore
2016-12-21 01:28:29.857668500 context...:
2016-12-21 01:28:29.857669500 /home/pkgserver/racket/collects/racket/file.rkt:723:0: file->bytes119
2016-12-21 01:28:29.857669500 /home/pkgserver/racket-pkg-website/src/static.rkt:213:2: for-loop
2016-12-21 01:28:29.857669500 /home/pkgserver/racket/collects/racket/private/more-scheme.rkt:261:28
2016-12-21 01:28:29.857953500 /home/pkgserver/racket-pkg-website/src/static.rkt:234:0: static-renderer-aws-s3
2016-12-21 01:28:29.857954500 /home/pkgserver/racket-pkg-website/src/static.rkt:120:0: static-renderer-main
2016-12-21 01:28:29.857954500 /home/pkgserver/racket/collects/racket/private/more-scheme.rkt:261:28
2016-12-21 01:28:29.857955500
2016-12-21 01:28:29.857955500 context...:
2016-12-21 01:28:29.857955500 /home/pkgserver/racket-pkg-website/src/rpc.rkt:31:0: rpc-call
2016-12-21 01:28:29.858243500 /home/pkgserver/racket-pkg-website/src/site.rkt:1531:21
2016-12-21 01:28:29.858244500 /home/pkgserver/racket/collects/racket/private/more-scheme.rkt:261:28
2016-12-21 01:28:29.858430500
(Saving a package when the package-change-handler
is busy is also problematic. So my original hypothesis stands, kind of. There are then two problems to fix: one re: trying to read deleted files, and one re: blocking urgent work behind a stack of less-urgent stuff.)
We have two problems:
package-change-handler
can get stuck behind low-priority tasksHere's an analysis of why deleting a package crashes in the UI server.
signal-static!
to, in the background,
static-path
copy of the pkg filesstatic-path
pkg files that no longer have master pkg filesstatic-path
directory is totally up-to-datestatic-path
for file names, and then iterates through
the list uploading to S3There's a TOCTTOU bug in that last step. In addition, the assumption in the penultimate step is invalid.
Potential fixes:
All of these require separate resolution of the TOCTTOU bug, which is independent of the bigger causal stuff.
Fix (1) is no good, because the inter-package interaction refresh might do rather a lot of work.
Fix (2) is no good, because it's a special-casing which works around just this one manifestation of the problem.
Fix (3) is no good, because it's a grody hack that will sometimes fail mysteriously.
Fix (4) is kind of disappointing but basically OK.
Fix (5) would be ideal but it's a long-term idea that I don't have time to pursue.
So I'll go with fix (4), probably.
To address the second, priority-confusion problem, I'll have to refactor package-change-handler
in the UI server. I haven't done the necessary planning yet.
Happily, a form of polling already exists, in that the package database is refetched every few minutes in its entirety. So I just need to fix the TOCTTOU problem to get deletion to stop faulting, I think.
With those commits, I think this is fixed. @jsmaniac, please feel free to reopen or file new bugs if you notice further problems, and thank you for the report!
When deleting a package, the deletion seems immediate, but the page takes forever to answer. I tried letting it finish, and got the following error:
This is not a big problem, as the deletion seems to work successfully nevertheless.