mmcdole / gofeed

Parse RSS, Atom and JSON feeds in Go
MIT License
2.56k stars, 208 forks

XML syntax error on line 34: illegal character code U+0008 #180

Status: Open. minhdanh opened this issue 3 years ago.

minhdanh commented 3 years ago

Expected behavior

The RSS feed is parsed correctly.

Actual behavior

gofeed cannot parse the RSS feed and fails with the following error:

XML syntax error on line 34: illegal character code U+0008

Steps to reproduce the behavior

Parse this feed: http://newsletter.grokking.org/?format=rss

Apparently there is a "strange" character on line 34: U+0008, the backspace control character, which XML 1.0 does not allow.

minhdanh commented 3 years ago

I can see there's a similar issue that was fixed a long time ago (https://github.com/mmcdole/gofeed/issues/25), but this still happens with the latest version of gofeed (v1.1.3).

purefun commented 3 years ago

I reproduced it.

I also ran into another illegal character, U+0004:

XML syntax error on line 1681: illegal character code U+0004

Feed: https://changelog.com/posts/feed

anzhihe commented 2 years ago

I also encountered this problem recently.

Versions: gofeed v1.1.3, Go 1.17

XML syntax error on line 211: illegal character code U+0008

https://chegva.com/feed/

mmcdole commented 1 year ago

I want to be able to parse feeds with illegal characters, so I've opened #206 to see how we should handle this.