MRuecklCC closed this issue 2 years ago
While the legacy endpoint (#148) generally replies with a 502 in case of errors, the current MetaDataManager implementation still responds with 500 errors when it cannot reach splash or lighthouse. Here, we need to distinguish between:
Because of the rework of the Content class in #149, communication with splash is now initiated from within the extractors, so the extractors also need to handle potential errors in the communication with splash/lighthouse.
The first quick fix will be to simply:
These exceptions will then propagate outwards and should result in the correct HTTP error codes (400, 502).
The downside of this is that the API-layer HTTPExceptions (fastapi) leak into the extractor/Content classes. It also means that the MetadataManager class has a hard time consolidating those errors, as it can only intercept the HTTPExceptions.
A much cleaner long-term solution would be to introduce custom exception classes such as:
This way we can keep the API layer exceptions out of the core classes and give the metadata manager a chance to consolidate exceptions of different extractors.
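A minimal sketch of how such a separation could look, using only the standard library. All names here (`ExtractionError`, `UpstreamUnavailableError`, `UnsupportedContentError`, `consolidate`) are hypothetical and not taken from the code base, and the precedence rule (400 before 502) is an assumption made for illustration only.

```python
from typing import Iterable


class ExtractionError(Exception):
    """Hypothetical base class for errors raised in extractors or Content."""
    status_code = 500


class UpstreamUnavailableError(ExtractionError):
    """splash or lighthouse could not be reached or replied with an error."""
    status_code = 502


class UnsupportedContentError(ExtractionError):
    """The requested URL cannot be analysed, e.g. it is not HTML."""
    status_code = 400


def consolidate(errors: Iterable[Exception]) -> int:
    """Reduce the errors of several extractors to one HTTP status code.

    Assumed policy: a 400 takes precedence over a 502, and anything that is
    not an ExtractionError is reported as a 500.
    """
    codes = {
        err.status_code if isinstance(err, ExtractionError) else 500
        for err in errors
    }
    if not codes:
        raise ValueError("consolidate() needs at least one error")
    if 500 in codes:
        return 500  # an unexpected error is never hidden behind a 4xx/502
    for code in (400, 502):
        if code in codes:
            return code
    return 500
```

With something like this, only the API layer would need to know about fastapi: it could catch `ExtractionError` (or the consolidated code) and raise `HTTPException(status_code=..., detail=...)` there, while the extractors and the Content class stay free of HTTP concerns.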
The Content class should either not call splash in the first place or somehow handle non-HTML content. Tests for that scenario also need to be added.
The Content class should check and handle 404 errors for the request chain in the HAR. If the content is not available, the Content class should not provide any HTML for the URL.
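A rough sketch of such a check, not the actual Content implementation. The function name `html_from_har` is hypothetical. The sketch assumes the HAR is available as a parsed dict in the standard HAR layout (`log.entries[].request.url`, `response.status`, `response.redirectURL`, `response.content.mimeType`, `response.content.text`), that redirect URLs are absolute, and that the body text is not base64-encoded.

```python
from typing import Optional


def html_from_har(har: dict, url: str, max_redirects: int = 10) -> Optional[str]:
    """Return the HTML body for `url`, or None if no usable HTML exists.

    Follows the redirect chain recorded in the HAR. Returns None if the chain
    ends in a 404 (or any other non-2xx status), if an entry is missing, or
    if the final response is not HTML.
    """
    # Index responses by request URL; a later entry for the same URL wins.
    responses = {
        entry["request"]["url"]: entry["response"]
        for entry in har["log"]["entries"]
    }

    current = url
    for _ in range(max_redirects + 1):
        response = responses.get(current)
        if response is None:
            return None  # the chain points to a request that was never made
        redirect = response.get("redirectURL")
        if 300 <= response["status"] < 400 and redirect:
            current = redirect
            continue
        break
    else:
        return None  # redirect limit exceeded

    if not 200 <= response["status"] < 300:
        return None  # e.g. 404: do not hand out an error page as content

    content = response.get("content", {})
    if "html" not in content.get("mimeType", ""):
        return None  # non-HTML content such as PDFs or images
    return content.get("text")
```

A caller could treat `None` as "no HTML available" and raise an appropriate error instead of passing an empty or misleading document on to the extractors.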