lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License
455 stars 38 forks source link

Fix #240 Skip enclosures without href #242

Closed mirekdlugosz closed 3 years ago

mirekdlugosz commented 3 years ago

Fix for #240. Enclosures without href in RSS feeds are skipped, so feed can be handled by reader. See below for reproducer and test results.

I have not added enclosure with empty href to atom feed, because things are more involved there. Following XML:

<link rel="enclosure" href="" />

produces valid (if pointless) Enclosure:

Enclosure(href='http://example.com/full.atom', type='text/html', length=None)

I only skimmed through ATOM specification, and I think ATOM requires links to contain at least scheme. Meanwhile, feedparser considers them to be relative and resolves to base. I don't think it's worthwhile for us to work around this behavior, and I definitely wouldn't make feedparser's behavior change a requirement for this PR. But I'm willing to work on whatever you think is best here.

Reproducer:

$ python -m reader --db db.sqlite add --update 'https://associationforsoftwaretesting.org/feed/' -vvvv
2021-06-08T19:21:47   35750 WARNING  parse https://associationforsoftwaretesting.org/feed/: got NonXMLContentType('no Content-type specified')
2021-06-08T19:21:47   35750 DEBUG    update feed 'https://associationforsoftwaretesting.org/feed/': entry 'http://associationforsoftwaretesting.org/?guid=2790e8d666f8777ddf9d6ef43ec0752e': entry updated
2021-06-08T19:21:47   35750 DEBUG    update feed 'https://associationforsoftwaretesting.org/feed/': entry 'http://associationforsoftwaretesting.org/?guid=0e9ded36e84e0ee84862ce70d0ac5889': entry updated
2021-06-08T19:21:47   35750 DEBUG    update feed 'https://associationforsoftwaretesting.org/feed/': entry 'http://associationforsoftwaretesting.org/?guid=d6a0fd1d522618ff01bedcbd40f5c9d0': entry updated
2021-06-08T19:21:47   35750 DEBUG    update feed 'https://associationforsoftwaretesting.org/feed/': entry 'https://associationforsoftwaretesting.org/?p=73413': entry updated
2021-06-08T19:21:47   35750 DEBUG    update feed 'https://associationforsoftwaretesting.org/feed/': entry 'https://associationforsoftwaretesting.org/?p=73420': entry updated
2021-06-08T19:21:47   35750 DEBUG    update feed 'https://associationforsoftwaretesting.org/feed/': entry 'http://associationforsoftwaretesting.org/?guid=e4c04c0bdafa02246ced8de61bfed600': entry updated
2021-06-08T19:21:47   35750 DEBUG    update feed 'https://associationforsoftwaretesting.org/feed/': entry 'https://www.satisfice.com/?p=487354': entry updated
2021-06-08T19:21:47   35750 DEBUG    update feed 'https://associationforsoftwaretesting.org/feed/': entry 'http://associationforsoftwaretesting.org/?guid=baad494ed52ada19bc87a2c4eb0eb264': entry updated
2021-06-08T19:21:47   35750 DEBUG    update feed 'https://associationforsoftwaretesting.org/feed/': entry 'https://www.kenst.com/?p=3878': entry updated
2021-06-08T19:21:47   35750 DEBUG    update feed 'https://associationforsoftwaretesting.org/feed/': entry 'http://associationforsoftwaretesting.org/?guid=10999f1b125d0f226d0c6bd099edafc8': entry updated
2021-06-08T19:21:47   35750 DEBUG    update feed 'https://associationforsoftwaretesting.org/feed/': old updated None, new updated 2021-06-04 19:24:29
2021-06-08T19:21:47   35750 INFO     update feed 'https://associationforsoftwaretesting.org/feed/': feed has no last_updated, treating as updated
$ ./run.sh coverage-all
<snip>
TOTAL                                        8214    410   1731     68    94%

================================================== 1433 passed, 50 skipped, 3 xfailed, 5 warnings in 56.28s ===================================================
Name    Stmts   Miss Branch BrPart  Cover
-----------------------------------------
-----------------------------------------
TOTAL    2764      0    803      0   100%

21 files skipped due to complete coverage.
$ ./run.sh typing
Success: no issues found in 38 source files
lemon24 commented 3 years ago

Merged. Thank you!

The CI failure is likely due to a new Mypy version (so, not due to this).

I don't think it's worthwhile for us to work around this behavior [...]

Agree, I definitely don't want to change feedparser's behavior.