internetarchive / fatcat-scholar

search interface for scholarly works
https://scholar.archive.org
Other
78 stars 14 forks source link

search results page occasionally dumps escaped HTML for part of the page #89

Open iisa opened 2 years ago

iisa commented 2 years ago

Repro steps:

Image of side-by-side results page w/ string HTML & its element via inspector: image

side-by-side inspector/view diff of sidebar w/ html string: image

bnewbold commented 2 years ago

I just encountered this bug myself via organic use.

My browser environment: Chromium on Debian Linux (x86_64), not incognito mode, language preference is English. The query I first encountered this on was https://scholar.archive.org/search?q=%22west+roxbury%22. I refreshed in a new tab once and got the same thing (escaped HTML). Refreshed again a minute later and got non-escaped text, so this seems to be an intermittent issue.

I did a view source and see similar things to what Isa posted above. I tried a wget to capture the raw HTML, and also did "save HTML", and in both cases the text was not escaped (aka, looked correct). Also when I go in devtools -> Sources and inspect the source code the HTML looks correct. In all these cases it may be that reloading the page means I am seeing the updated ("correct") HTML? This is confusing/confounding.

Three debugging thoughts so far:

To synthesize the later two, perhaps I did a git checkout on the server, but didn't update the FastAPI process, so some processes/threads have out of sync template or translation files?

The uptime of the main fatcat-scholar systemd service is: Active: active (running) since Sat 2021-12-11 17:46:44 UTC; 3 weeks 3 days ago

The currently deployed git commit is:

commit 6038bea6ad071db02110fdaeeb51b38fc19dbd54 (HEAD -> master, origin/master, origin/HEAD)
Author: Bryan Newbold <bnewbold@archive.org>
Date:   Fri Dec 10 14:26:25 2021 -0800

    semantic-ui css: fix typo in italic font path (causing 404 errors)

That doesn't seem to be it. I'm going to restart the process, and possibly the entire VM (failover to replica for a couple minutes) and try to reproduce.

bnewbold commented 2 years ago

I redeployed and no change. As of today this problem seems to be happening even more frequently. I was able to reproduce with wget, so problem is definitely server-side, not in browser processing of HTML.

bnewbold commented 2 years ago

After deploying some small translation updates this morning, I am no longer able to reproduce the problem. I suspect that the issue is related to our translation (i18n) infrastructure and jinja2 HTML template integration. We do some special hack to make this work with our async web framework (FastAPI) on a per-request basis, incorporating path prefixes and Accept-Language browser headers. I suspect that there might be some obscure threading/concurrency corner case happening which is breaking the jinja2 {% trans %} blocks, resulting in them not being returned as "safe" strings. An alternate hypothesis is a related issue with jinja2 macros.

Without being able to reliably reproduce this right now (in any of dev, qa, or prod environments), and the issue being intermittent even when it can be reproduced, I think unfortunately this is going to have to sit on the back-burner until it crops up again. :disappointed:

bnewbold commented 2 years ago

Oh no, it is back!

https://scholar.archive.org/search?q=Sidewalk+Measurements+from+Satellite+Images+Preliminary+Findings

Screenshot_2022-02-17_09-43-50

miku commented 2 years ago

The above example reproduces quite reliable (seeing raw html in about 50% of the cases while reloading in firefox 97.0, linux).

bnewbold commented 2 years ago

This issue has come and gone over the past couple months. Will prioritize getting a reproducible example of this bug and getting it fixed.

bnewbold commented 2 years ago

I pushed some changes to production just now which I hope have resolved this issue. There is a small impact on latency, which is unfortunate.


A few notes on the issue:

This debugging process took multiple entire days. I'm pretty inclined to ditch fastapi and refactor the web service to use Flask instead.