Closed snarfed closed 10 years ago
This is a "known issue" with mf2py backcompat and WordPress.com blogs. They mark up the whole center part with entry-content... When that gets mapped to e-content, it includes way more than e-content is supposed to include.
Another example of this is when I tried to reply to my test wordpress.com blog https://kylewm.com/reply/2014/05/05/1
interesting parallel! the content and summary fields are coming from superfeedr here, but it sounds like wordpress.com's overly broad entry-content may be causing the same problem for both superfeedr and mf2py.
reopening due to #213. the current fix assumes categories and tags are at the bottom, but they can be at the top too. e.g. http://eflnotes.wordpress.com/2014/01/17/corpus-linguistics-community-news-2/ for user https://www.brid.gy/wordpress/eflnotes.wordpress.com
instead of filtering the content HTML, i'm going to just discard all webmentions for "self links" to the user's domain.
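a rough sketch of what discarding "self links" could look like, not the actual bridgy code. the function names (`normalize_domain`, `is_self_link`, `discard_self_links`) are hypothetical, and treating `www.example.com` and `example.com` as the same domain is an assumption of this sketch:

```python
from urllib.parse import urlparse


def normalize_domain(domain):
    """Lowercase a hostname and drop a leading 'www.' (an assumption of this sketch)."""
    domain = (domain or "").lower().rstrip(".")
    return domain[4:] if domain.startswith("www.") else domain


def is_self_link(target_url, user_domain):
    """Return True if target_url points at the user's own domain."""
    # urlparse(...).hostname is None for relative or malformed URLs;
    # normalize_domain turns that into "", which never counts as a self link.
    host = normalize_domain(urlparse(target_url).hostname)
    return bool(host) and host == normalize_domain(user_domain)


def discard_self_links(target_urls, user_domain):
    """Keep only webmention targets that are not on the user's own domain."""
    return [url for url in target_urls if not is_self_link(url, user_domain)]


# Hypothetical usage with URLs mentioned in this thread.
targets = [
    "http://eflnotes.wordpress.com/2014/01/17/corpus-linguistics-community-news-2/",
    "https://kylewm.com/reply/2014/05/05/1",
]
remaining = discard_self_links(targets, "eflnotes.wordpress.com")
```

with this approach the position of categories and tags in the markup no longer matters, since any link back to the user's own blog is dropped before a webmention is sent. the tradeoff is that a genuine reply to one of the user's own posts would be dropped too.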
seeing this again for https://www.brid.gy/wordpress/peterccook.com . reopening.
false alarm. his posts just tend to have a lot of links.
looks like either we're overly aggressive or superfeedr is including non-content (e.g. links to tags, prev/next, feeds, etc) in their content and summary fields, which we extract all links from. example for http://likeiwassayingblog.com/2014/06/27/im-making-a-list-of-things-i-need-to-do-before-julie-gets-home-its-extraordinary-how-many-items-contain-the-words-clean-and-cat/ (from https://www.brid.gy/wordpress/likeiwassayingblog.wordpress.com#blogposts ):