Closed janbaykara closed 2 years ago
This incident only applies to un-localised URLs.
Pushed commit 8dc93e61acfc08aa34b31ce94609d6994e35e662 to see if the .get('page')
method, without a fallback default, was the cause of this.
Pushed 162fdb60068a6c9877eba2af5d0b633e29273f6e to cut down on the "can't find favicon.ico" error logging.
Also pushed 26b99dac7ac1e92fffb05441a1739aa192decc9a which fixes a typo and 44dc68ada26e90b3c32cfa1e9d125ee64d1083ec to guard .get('request')
calls.
@janbaykara
Now the log is less noisy, any clue to why this is occuring?
Happy to look with a fresh set of eyes if you would like...
I suspect that actually, there is probably something that can be done in Wagtail admin to fix this. Judging by the error log, the site doesn't now what page is home?
Fix for this might be blocked by #188
Chucking in @chrisdevereux for a fresh pair of eyes on this, as discussed in sprint planning this morning.
Switching back to me as an even fresher pair is required!
I think what is going on here is that the homepage of the site doesn't know what page it is because production has had a bunch of migrations run on it since the outset, but our local environments, where we are not seeing this error, don't have this problem.
So ignoring the code for now, let's look at the configuration on production and see.
Page with ID 3 is the English homepage...what is the site as a whole set to?
Re-setting this didn't have any effect...trying something else.
Looking at the logs locally, this is what we get from Django.
As expected, the root redirects immediately to the English translation.
[28/Feb/2022 14:56:27] "GET / HTTP/1.1" 302 0
[28/Feb/2022 14:56:27] "GET /en-gb/ HTTP/1.1" 200 16455
This is what I am doing locally.
> http GET localhost:8000
HTTP/1.1 302 Found
Content-Length: 0
Content-Type: text/html; charset=utf-8
Date: Mon, 28 Feb 2022 15:07:33 GMT
Location: /en/
Referrer-Policy: same-origin
Server: WSGIServer/0.2 CPython/3.7.11
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
And with the live site:
> https GET atlas.smartforests.net/
HTTP/1.1 500 Internal Server Error
CF-Cache-Status: MISS
CF-RAY: 6e4a8f7ecee08898-LHR
Connection: keep-alive
Content-Type: text/html
Date: Mon, 28 Feb 2022 15:07:05 GMT
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Server: cloudflare
Transfer-Encoding: chunked
Vary: Accept-Encoding
cache-control: private
<html>
<head>
<title>Internal Server Error</title>
</head>
<body>
<h1><p>Internal Server Error</p></h1>
</body>
</html>
Compare to:
> https GET atlas.smartforests.net/en/
HTTP/1.1 200 OK
CF-Cache-Status: MISS
CF-RAY: 6e4a9301bd2406f9-LHR
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Mon, 28 Feb 2022 15:09:29 GMT
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Last-Modified: Mon, 28 Feb 2022 15:09:29 GMT
Server: cloudflare
Transfer-Encoding: chunked
cache-control: private
content-language: en
referrer-policy: same-origin
vary: Cookie, Accept-Encoding
x-content-type-options: nosniff
x-frame-options: DENY
So, as is relatively obvious at this juncture, our 302 is not being produced at a higher level in the stack.
As we are having problems with deploys, also suspect the database migrations are not internally consistently.
Currently trying to get Digital Ocean to let me in...
Is this related? https://github.com/wagtail/wagtail/issues/4934
They seem to be blaming the hosting environment here.
Perhaps this is some sort of configuration drift here.
https://github.com/wagtail/wagtail/issues/3751#issuecomment-322190832
This is what seems to be suggested above. The problem is some unmigrated thing.
Is it this middleware causing the issue?
[server] [2022-02-28 17:40:01] Traceback (most recent call last):
[server] [2022-02-28 17:40:01] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
[server] [2022-02-28 17:40:01] response = get_response(request)
[server] [2022-02-28 17:40:01] File "/app/logbooks/middleware/pages.py", line 11, in __call__
[server] [2022-02-28 17:40:01] return self.get_response(request)
This seems to be potentially where it crosses the boundary into code under our control. This strikes me as code which potentially effects a lot. But if request is bad all the way down, this is only a reaction to the problem, not the problem itself.
This is where the code was introduced.
https://github.com/planetarypraxis/smartforests/pull/119/files
If we were to turn this off, what happens?
Disabled this middleware for the moment. Deploying to see the effect.
https://github.com/planetarypraxis/smartforests/commit/216c111546968e15b484a516e2d94650cabdaedd
Remembering we can't see this problem at all locally.
No dice and naturally means we have regressed the menus this underpins.
Restored the middleware for the moment.
https://github.com/planetarypraxis/smartforests/commit/90c431bc6ddfc278a46ba7ccb5ef20e121c72934
The initial hypothesis that this was to do with https://github.com/planetarypraxis/smartforests/blob/main/commonknowledge/bootstrap/templatetags/ckbootstrap_tags.py and therefore adding some guarding around it was a solid shout. However, this doesn't seem to be the source.
Returning to the exception itself and making two observations. It is saying key error in page and then in request, which is why the above change was made. Somewhere in the stack, Django/Wagtail can't locate this.
This is why above I wondered if something strange had happened in the database. Meaning that Django/Wagtail can't find the page the homepage is, considering the redirection to the en-gb
version of the site and is troubled by this.
I'm tempted here to roll the whole site back to the last known good version, though in truth that isn't going to help us, apart from to zero in on the commit where the bug may have been introduced. A classic bisect to locate a bug.
It might be that this version is last known good https://github.com/planetarypraxis/smartforests/commit/556f26e6e40f9582c8a2b5d9889877195e9f8566. I can't see anything between then and now that is especially problematic and effects something like initial page load.
Between this and the fact this does work locally points to me to something in the data.
Making this as an observation in passing.
I'm going to set the site up locally from production data today. If nothing else this might give me a reproduction and allow me to have a faster feedback cycle in locating this bug. This should be relatively easy using wagtail-transfer.
Since wagtail-transfer
needs a URL at the root (/wagtail-transfer
by default) and all non-prefixed URLs appear to be failing on production, the reproduction of production data locally will need to proceed by another form of export.
This issue is the underlying issue causing problems with search. Search is a URL at the root, so is having the same issue.
In the browser console:
And also at https://atlas.smartforests.net/search/ - which should work.
Guess: there is some stale cache somewhere that is causing this problem. It is trying to get something from a cache that no longer exists.
So emptying all caches after this current deployment.
pp@server-78f774fd97-4kgq7:/app$ pipenv shell
Launching subshell in virtual environment...
app@server-78f774fd97-4kgq7:/app$ . /home/app/.local/share/virtualenvs/app-4PlAip0Q/bin/activate
(app) app@server-78f774fd97-4kgq7:/app$ python manage.py shell
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
Python 3.7.3 (default, Jan 22 2021, 20:04:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from django.core.cache import cache
>>> cache.clear()
Didn't work unfortunately!
Checking in on this.
Someone else has this error, from a few years ago: https://github.com/jazzband/wagtailmenus/issues/294#issuecomment-453641853
Noting that wagtailmenus has stopped being maintained. https://github.com/jazzband/wagtailmenus/issues/403#issuecomment-988757076
Or perhaps it is? https://github.com/jazzband/wagtailmenus/issues/409
Seems there is a 3.1 version coming out soon: https://github.com/jazzband/wagtailmenus/compare/v3.0..v3.1
In it, there seems to be a change that handles missing request
values. My hypothesis is that this might affect the operation of the wagtailmenus context provider, which seems to run even for the search endpoint, based on the logs just above.
Good news, I broke the build again! https://cloud.digitalocean.com/apps/f83fe91a-0167-4b68-a789-b8579816239a/deployments/df783228-f5a0-4664-be23-1441a0bb89f4?i=8bfd6e
Nice work @janbaykara!
This seems like a fruitful avenue for working this out. Do we want to chance an upgrade to this edge version and see if it fixes our problem? This explains why it doesn't happen locally. I for one don't have this set up.
Good news, I broke the build again! https://cloud.digitalocean.com/apps/f83fe91a-0167-4b68-a789-b8579816239a/deployments/df783228-f5a0-4664-be23-1441a0bb89f4?i=8bfd6e
Seems to me this is something to do with something changing in wagtail_localize
between deploys.
That's exactly what I did. Have pushed 4cdaf7709c9e58c0255ee58ec63a54593a7f59cf and f546927431dba4a5142152635bf46483929289a4 to see if these blow things up.
The error is something I've encountered locally just now:
[2022-03-02 18:02:25] ModuleNotFoundError: No module named 'wagtail_localize.locales'
Now looking at the changes more closely, it looks like it has also auto-bumped Wagtail to 2.16.1, but there are no breaking changes so I don't think that's the issue.
I'll try and rejig the Pipenv a bit and get this working locally.
Cracking stuff mate!
Noticing that wagtail-localize was downgraded for some reason
This is probably because 1.0.1 did not support Wagtail 2.16.1. Bumping it to 1.1rc1, which will probably be safe enough to try... ?
Hmm, looking at the 1.1 milestone, there are some blocking bugs which I don't feel comfortable adding in.
So going to roll back to Wagtail 2.15.X
Deploying commit ddd4ac415566b2e969354c6c771a2f0250c26f5d
Final crack at this which adds missing deps 844b40c2d10ae8d52918d75c31efa39d1fc9dac0
Sadly, this issue is still live.
Thanks for the efforts on this.
Great work, but doesn't seem to have identified the problem. The above log seems basically the same as when we began looking.
Let's try and break the traceback down and do "what I am seeing here".
[server] [2022-03-02 19:58:21] [2022-03-02 19:58:21 +0000] [11] [ERROR] Error handling request /
[server] [2022-03-02 19:58:21] Traceback (most recent call last):
[server] [2022-03-02 19:58:21] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner
[server] [2022-03-02 19:58:21] response = get_response(request)
[server] [2022-03-02 19:58:21] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/core/handlers/base.py", line 167, in _get_response
[server] [2022-03-02 19:58:21] callback, callback_args, callback_kwargs = self.resolve_request(request)
[server] [2022-03-02 19:58:21] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/core/handlers/base.py", line 290, in resolve_request
[server] [2022-03-02 19:58:21] resolver_match = resolver.resolve(request.path_info)
[server] [2022-03-02 19:58:21] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/urls/resolvers.py", line 589, in resolve
[server] [2022-03-02 19:58:21] raise Resolver404({'tried': tried, 'path': new_path})
This seems to be the router doing its thing. You pass request
into resolver_match = resolver.resolve(request.path_info)
. Presumably this resolve function does a thing to map the path (here /
, but some other stuff too - its a dictionary not a string) to a match to what Django should be dishing up.
Then we get a bunch of stuff that isn't finding the key page
in a dictionary.
[server] [2022-03-02 19:58:21] During handling of the above exception, another exception occurred:
[server] [2022-03-02 19:58:21]
[server] [2022-03-02 19:58:21] Traceback (most recent call last):
[server] [2022-03-02 19:58:21] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/template/base.py", line 829, in _resolve_lookup
[server] [2022-03-02 19:58:21] current = current[bit]
[server] [2022-03-02 19:58:21] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/template/context.py", line 83, in __getitem__
[server] [2022-03-02 19:58:21] raise KeyError(key)
[server] [2022-03-02 19:58:21] KeyError: 'page'
[server] [2022-03-02 19:58:21]
[server] [2022-03-02 19:58:21] During handling of the above exception, another exception occurred:
[server] [2022-03-02 19:58:21]
[server] [2022-03-02 19:58:21] Traceback (most recent call last):
[server] [2022-03-02 19:58:21] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/template/base.py", line 835, in _resolve_lookup
[server] [2022-03-02 19:58:21] if isinstance(current, BaseContext) and getattr(type(current), bit):
[server] [2022-03-02 19:58:21] AttributeError: type object 'Context' has no attribute 'page'
[server] [2022-03-02 19:58:21]
[server] [2022-03-02 19:58:21] During handling of the above exception, another exception occurred:
[server] [2022-03-02 19:58:21]
[server] [2022-03-02 19:58:21] Traceback (most recent call last):
[server] [2022-03-02 19:58:21] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/template/base.py", line 843, in _resolve_lookup
[server] [2022-03-02 19:58:21] current = current[int(bit)]
[server] [2022-03-02 19:58:21] ValueError: invalid literal for int() with base 10: 'page'
[server] [2022-03-02 19:58:21]
Finally we get a large cascade of errors that are the same one, looking up request
in some dictionary.
[server] [2022-03-02 19:58:21] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/src/wagtailmenus/wagtailmenus/models/menus.py", line 121, in _get_render_prepared_object
[server] [2022-03-02 19:58:21] instance.prepare_to_render(context['request'], ctx_vals, opt_vals)
[server] [2022-03-02 19:58:21] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/template/context.py", line 83, in __getitem__
[server] [2022-03-02 19:58:21] raise KeyError(key)
[server] [2022-03-02 19:58:21] KeyError: 'request'
The first non-Django code this seems to hit is wagtailmenus
which is why what you tried Jan is completely sensible.
As far as I can see, without going line by line (which is maybe needed at this point), this is the only third party code in the mix. None of our code seems to be involved at all.
I'd suggest three moves here, in order of difficulty.
wagtailmenus
entirely on production and see if we are good. The problem is then 100% in there. It's useful to be able to set these menus, but not at this level of oddness. This may cause problems in other areas but is probably isolated enough to ditch it.I'm thinking about why pages load fine when they are i18n-ified, and wonder whether there is something about the locale auto-redirection which is broken (e.g. ❌ https://atlas.smartforests.net/map/ vs ✅ https://atlas.smartforests.net/en/map/ ).
Some API endpoint load fine, despite not being i18n-ified (e.g. ✅ https://atlas.smartforests.net/api/v2/geo/?language_code=en?bbox=[-68.5393,-48.7759,66.1092,73.0875]&zoom=2 or ✅ https://atlas.smartforests.net/api/docs/swagger/ ). What is common to these is that they are wrapped in a 'bare' list()
and they all work:
By comparison, other endpoints do not work without a locale in their URL (e.g. ❌ https://atlas.smartforests.net/_metadata/324/ vs ✅ https://atlas.smartforests.net/en/_metadata/324/ ). What unites these URLs is that they are wrapped in i18n_patterns
:
Now I'm double checking whether there is something wrong with our i18n setup, by reading through the install docs for wagtail-localize.
Didn't find anything, but still wondering...
I tried deleting the specific menu items from main and footer, nothing changed.
Possible fix: https://stackoverflow.com/a/65065608/1053937
Suggestion: I'd suggest investigating exactly which template tags cause the 404.html template to fail. My guess is that when you're pulling in a base template with {% extends %}, your base template is assuming the presence of variables such as page which aren't available when producing a 404 response. In Wagtail, redirects are handled by a middleware which is activated when the normal page serving process returns a 404. If an error happens in the process of generating that 404 response, then the middleware won't identify it as a 404, and so redirects won't happen.
Reply: Thanks again! What was causing the error was this exact segment of base.html. I wrapped it around with {% if page %} {% endif %} and everything is working now. I think this should be in the docs, because this was quite confusing! Cheers.
I suppose this might be applicable to base.html
where we do this a bit:
Bit puzzling though because I think we've been using this for a while without a hitch. But it is possible we just didn't notice, I guess?
What might be new is using page
specifically, in header.html which did change recently in #159, and we know from the error logs that quite early on, the page can't be found, and the 404 error for the locale-less page template is hanging up before a redirect can be affected, as is mentioned in stackoverflow:
Seems like it might coincide with the bug appearing. So I will try and implement the recommendation from stackoverflow on this.
Pushed commit 173a084fa5c8495760695c5c5370a221e8358a0c
(It'd be really nice if these posts were automated - #101)
Well, something happened. I made it worse! Now even local-ised URLs like /fr/map
don't work.
I see, I typo'd the template. My bad, fixing.
It's bloody fixed, isn't it!
Error logs handling `/`
``` [server] [2022-02-09 08:27:12] [2022-02-09 08:27:12 +0000] [11] [ERROR] Error handling request / [server] [2022-02-09 08:27:12] Traceback (most recent call last): [server] [2022-02-09 08:27:12] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/core/handlers/exception.py", line 47, in inner [server] [2022-02-09 08:27:12] response = get_response(request) [server] [2022-02-09 08:27:12] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/core/handlers/base.py", line 167, in _get_response [server] [2022-02-09 08:27:12] callback, callback_args, callback_kwargs = self.resolve_request(request) [server] [2022-02-09 08:27:12] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/core/handlers/base.py", line 290, in resolve_request [server] [2022-02-09 08:27:12] resolver_match = resolver.resolve(request.path_info) [server] [2022-02-09 08:27:12] File "/home/app/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/django/urls/resolvers.py", line 589, in resolve [server] [2022-02-09 08:27:12] raise Resolver404({'tried': tried, 'path': new_path}) [server] [2022-02-09 08:27:12] django.urls.exceptions.Resolver404: {'tried': [[