Open jimchamp opened 5 months ago
@jimchamp recommends P2 since we're hitting sentry errors of a similar form and getting this class of errors removed will give us more clarity. Ty!
Hi, I'm a college student interested in working on open-source projects. Could I be assigned this issue? Thanks.
@KareemDream, you've been assigned, thanks! I've updated the original message in this thread to include the function that needs to be updated. The solution may be as simple as removing the trailing /
character, if one exists, from the path.
Hi there, I am a classmate of @KareemDream who has been working with him to solve this issue!
In short, we have recently found that the underlying bug behind this issue seems to be larger-scoped than initially thought. We say this after some local testing and logging, and finding that the error message shows up before the call function has even been called.
To begin, we reproduce below a screenshot of the bug, from the logger's perspective:
As you can see no print statements are called before the bug occurs. However, we have added a print statement as soon as the call function is entered:
Due to this, we believe that the UnicodeEncodeError that we are getting stems from some upstream issue; however, the stack trace was not very informative and we are currently unable to determine where it comes from.
Because of the end of the spring semester at our university we unfortunately do not anticipate having time to finish out this issue. As a result we have 2 requests: 1) do our concerns seem valid/would it be reasonable to rescope this issue, and 2) would it be possible to reassign this issue altogether?
Thank you so much!
Thanks for looking into this! I'll have to do some more investigation into where this error is occurring.
Problem
Some URLs, when visited, cause the site to serve an unstyled "Internal Server Error" page. This is happening for author, edition, and work URLs that have a trailing
/
character following a readable slug that contains non-latin characters.Evidence / Screenshot
Relevant URL(s)
Working URL: https://openlibrary.org/books/OL4210801M/Das_schwerho%CC%88rige_Kind
Broken URL: https://openlibrary.org/books/OL4210801M/Das_schwerho%CC%88rige_Kind/
Reproducing the bug
Context
Related to this Sentry issue
Notes from this Issue's Lead
Proposal & constraints
We suspect that this error may have something to do with how our ReadableUrlProcessor is encoding the readable slug and redirecting.
A similar issue was solved by creating a
safe_seeother
function, which encodes a URL usingencode_url_path
before passing it toweb.seeother
.Maybe
encode_url_path
can be used by theReadableUrlProcessor
before any redirects?Related files
Code changes likely only needed in this method: https://github.com/internetarchive/openlibrary/blob/bee8ab4e369a500e1e31f73d3f06fb88e7c5896f/openlibrary/core/processors/readableurls.py#L40
Stakeholders
@cdrini