openzim / ted

Provide the best of TED.com for offline usage!
https://download.kiwix.org/zim/ted/
GNU General Public License v3.0

Enhance `extract_videos_in_search_results` to not fail on null `recordedOn` and log HTML content when parsing issue occurs #166

Closed benoit74 closed 4 months ago

benoit74 commented 4 months ago

Rationale

Fix #161. Close #163 (not really a fix, but should the issue reproduce we will have more logs).

Changes

codecov[bot] commented 4 months ago

Codecov Report

Attention: Patch coverage is 0% with 12 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (e5c6401) to head (7d0833a).

| Files | Patch % | Lines |
|---|---|---|
| src/ted2zim/scraper.py | 0.00% | 12 Missing :warning: |

Additional details and impacted files

```diff
@@          Coverage Diff          @@
##            main    #166   +/-   ##
=====================================
  Coverage   0.00%   0.00%
=====================================
  Files          7       7
  Lines        945     950    +5
  Branches     218     218
=====================================
- Misses       945     950    +5
```


benoit74 commented 4 months ago

In the past, we saw similar output when the server returned an unexpected response such as a 4xx/5xx: we end up parsing content that we expect to be JSON but that can be HTML, empty, or null.

It is not exactly the same here, because we call `req.raise_for_status()` (don't ask me why we named it `req` since it's a response 😉). So the status code is 2xx but the content is inappropriate ... weird.
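
For illustration, here is a minimal sketch of the kind of defensive parsing discussed above. It is not the actual code in `src/ted2zim/scraper.py`: the function name `parse_search_results`, the logger name, the `results` key and the output field names are all assumptions made for the example. Only `recordedOn` comes from this PR's title. The sketch logs a truncated copy of the body when a 2xx response turns out not to be JSON, and tolerates a null `recordedOn`.

```python
import json
import logging
from typing import Any

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("ted2zim.sketch")


def parse_search_results(status_code: int, body: str) -> list[dict[str, Any]]:
    """Parse a search response body, logging the raw content if it is unusable."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # A 2xx status does not guarantee a JSON body (it may be HTML or empty),
        # so keep a truncated copy of the content for later debugging.
        logger.error("HTTP %s but body is not JSON: %r", status_code, body[:500])
        raise

    if not isinstance(payload, dict):
        # Valid JSON, but not the object we expect (for example a bare null).
        logger.error("HTTP %s but unexpected JSON payload: %r", status_code, body[:500])
        raise ValueError("unexpected JSON payload")

    videos = []
    for hit in payload.get("results", []):  # "results" key is an assumption
        videos.append(
            {
                "id": hit.get("id"),
                # recordedOn may be null or missing: map both cases to None
                "recorded_on": hit.get("recordedOn") or None,
            }
        )
    return videos
```

Both failure paths raise a `ValueError` (`json.JSONDecodeError` is a subclass of it), so a caller can handle them uniformly while the log keeps the offending content.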