hypothesis / lms

LTI app for integrating with learning management systems
BSD 2-Clause "Simplified" License
46 stars 14 forks source link

Base moodle's file_exist method on the file content instead of headers #6841

Closed marcospri closed 2 weeks ago

marcospri commented 2 weeks ago

In moodle we can't make a API call to check if a file exists and we have to rely on the URL where file will be downloaded.

When a problem occurs we get a 200 response with a JSON document with the reason of the error.

We don't want to always download the document as it will actually download the full PDF file for success cases, the most common case.

Until now we been relying on the headers of a HEAD HTTP request, interpreting JSON responses as the file being missing.

We have found at least one school that doesn't include the content-type header on the response so he are switch the approach to inspect the first bytes of the response and check if we are getting a PDF back from Moodle.

Here's the headers from the file in the Moodle instance that doesn't include content-type:

 {'Date': 'Tue, 05 Nov 2024 14:42:55 GMT', 'Content-Length': '694351', 'Connection': 'keep-alive', 'Expires': 'Tue, 05 Nov 2024 16:43:00 GMT', 'Cache-Control': 'private, max-age=7205', 'Last-Modified': 'Fri, 01 Nov 2024 10:38:06 GMT', 'X-Frame-Options': 'SAMEORIGIN', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'x-trace': 'ip-172-31-65-239->amazon_172_31_65_222', 'Age': '0', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains', 'X-Cache': 'MISS'}

Testing

Open: localhost - Moodle File

localhost - Deleted Moodle file

Getting a file on the first and an error in the second.

marcospri commented 2 weeks ago

@robertknight @acelaya

Made some changes based on the suggestions:

robertknight commented 2 weeks ago

We check response.content instead of response.text. I don't think it would make a big difference but we are looking for bytes so that more explicit.

In the present, I agree. My comment was mostly about doing the "proper" thing to avoid nasty surprises in future because our code is improperly treating binary data as text and the handling of this changes (see eg. https://github.com/hypothesis/h/pull/8953)