mediacloud / metadata-lib

How Media Cloud approaches extracting metadata from online news stories
Apache License 2.0

Fix title parsing failure (due to empty or whitespace title tag) #74

Closed rahulbot closed 10 months ago

rahulbot commented 10 months ago

This addresses the cases reported in #73, which involved empty or missing title tags. The net effect is more resilient title parsing. The change includes new test cases and a very small code tweak. It is probably worth integrating into story-indexer quickly to improve our data parsing. All tests pass locally.
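For readers without the diff at hand, here is a rough sketch of the general idea, not the actual change in this PR. It assumes a hypothetical helper `extract_title` built only on the standard library `html.parser`, and it assumes an `og:title` meta tag is an acceptable fallback; neither the names nor the fallback order are taken from metadata-lib. The point it illustrates is that an empty or whitespace-only `<title>` is treated the same as a missing one.

```python
from html.parser import HTMLParser


class _TitleCollector(HTMLParser):
    """Collects text inside <title> and the og:title meta content, if present."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []
        self.og_title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            if attr_map.get("property") == "og:title":
                self.og_title = attr_map.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def _clean(text):
    """Collapse whitespace; return None for missing, empty, or blank text."""
    if text is None:
        return None
    collapsed = " ".join(text.split())
    return collapsed or None


def extract_title(html):
    """Hypothetical helper: return a usable title, or None if there is none.

    A blank <title> falls through to og:title (an assumed fallback)
    instead of raising or returning an empty string.
    """
    parser = _TitleCollector()
    parser.feed(html)
    parser.close()
    return _clean("".join(parser.title_parts)) or _clean(parser.og_title)
```

Returning `None` rather than an empty string lets a caller such as an indexing pipeline distinguish "no title found" from a real title without extra checks.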

pgulley commented 10 months ago

lgtm!