jywarren closed this issue 8 years ago.
It does do some scrubbing, but it should allow image tags (and clearly does, since other images are getting through). This is the sanitized summary as stored in the DB:
Here’s a Gosper curve cut into paper with a Silhouette Cameo desktop paper cutter. Thanks to Owen Maresh for showing me the Gosper curve, which is a space-filling curve formed with a single line, and therefore, here, with a single cut.<div class="crp_related"><h3>Related Posts:</h3><ul><li><a class="crp_title" href="http://unterbahn.com/2015/08/vectorizing-sketches-and-photos-with-your-smartphoneweb-browser/">Vectorizing sketches and photos with your smartphone/web…</a></li><li><a class="crp_title" href="http://unterbahn.com/2013/05/for-controlling-spacetime/">For controlling spacetime</a></li><li><a class="crp_title" href="http://unterbahn.com/2013/05/space-oddity-cover-music-video-recorded-on-the-iss/">Space Oddity cover, music video recorded on the ISS</a></li><li><a class="crp_title" href="http://unterbahn.com/2012/10/muddling-through-tax-revenue-numbers/">Muddling through tax revenue numbers</a></li><li><a class="crp_title" href="http://unterbahn.com/2014/03/balopaixao-is-gone/">BaloPaixão is gone</a></li></ul></div>
Docs about feedparser's sanitization:
https://pythonhosted.org/feedparser/html-sanitization.html
The above is consistent with the theory that it's using the description element; if you actually download the feed and look at the entry you'll see that it does include the rest of the content there, not just the text that you quoted.
I'm not totally sure what the right approach is here. I don't know if feedparser exposes a way to change this, and I'm not keen on mucking around in its plumbing unless I need to. Thoughts?
Well, do you specify the "description" element, or is that a Feedparser default? It does seem that it has methods for accessing common RSS elements; could we just grab "content:encoded" instead of "description"? Can you point me at the relevant code in IB? Thanks!
https://pythonhosted.org/feedparser/common-rss-elements.html#accessing-common-channel-elements
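To make the question concrete: in a typical WordPress RSS item, <description> carries a short excerpt and <content:encoded> carries the full HTML, including inline images. The sketch below uses only Python's standard library (not feedparser) and a hypothetical, minimal feed; item_bodies is an illustrative helper, not part of Iron Blogger.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the "content:" prefix in RSS 2.0 feeds (RSS content module).
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical, minimal RSS item in the shape WordPress typically emits:
# <description> is a plain excerpt, <content:encoded> is the full HTML post.
SAMPLE_FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Example post</title>
      <description>A short excerpt with no markup.</description>
      <content:encoded><![CDATA[<p>Full post.</p><img src="http://example.com/cut.jpg" alt="cut" />]]></content:encoded>
    </item>
  </channel>
</rss>"""


def item_bodies(feed_xml):
    """Return (description, content_encoded) for each <item>; missing elements become None."""
    root = ET.fromstring(feed_xml)
    bodies = []
    for item in root.iter("item"):
        description = item.findtext("description")
        # Namespaced elements are looked up by "{namespace-uri}localname".
        encoded = item.findtext(f"{{{CONTENT_NS}}}encoded")
        bodies.append((description, encoded))
    return bodies


if __name__ == "__main__":
    for description, encoded in item_bodies(SAMPLE_FEED):
        print("description has <img>:", "<img" in (description or ""))
        print("content:encoded has <img>:", "<img" in (encoded or ""))
```

Taking content:encoded instead would bring the images along, but it is the full post rather than a summary, which is the trade-off raised further down the thread.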
On Sat, Aug 22, 2015 at 4:29 PM, Ian Denhardt notifications@github.com wrote:
https://github.com/zenhack/iron-blogger2/blob/master/ironblogger/model.py#L242
...I'd forgotten how gross that bit was.
What we're actually trying to grab is "summary", but I think feedparser abstracts things a bit and falls back to other fields if it's not there. The library isn't a 1:1 mapping to Atom or RSS; it's intended to let the programmer not care about the differences.
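The normalization idea can be illustrated with a small standard-library sketch. It is not feedparser's implementation (its real fallback rules are more involved); entry_summary is a hypothetical helper that folds an Atom <summary> and an RSS <description> into one summary value, which is roughly how an image-free WordPress <description> ends up where a summary is expected.

```python
import xml.etree.ElementTree as ET

# Atom 1.0 namespace URI; Atom elements are namespaced, RSS 2.0 elements are not.
ATOM_NS = "http://www.w3.org/2005/Atom"


def entry_summary(entry):
    """Return one format-neutral "summary" for an Atom <entry> or RSS <item>.

    Illustration of the normalization idea only, not feedparser's code.
    """
    # Atom: the excerpt lives in a namespaced <summary> element.
    text = entry.findtext(f"{{{ATOM_NS}}}summary")
    if text is None:
        # RSS 2.0: the closest equivalent is the plain <description> element.
        text = entry.findtext("description")
    return text  # None when neither element is present


if __name__ == "__main__":
    atom = ET.fromstring(f'<entry xmlns="{ATOM_NS}"><summary>From Atom</summary></entry>')
    rss = ET.fromstring("<item><description>From RSS</description></item>")
    print(entry_summary(atom), "|", entry_summary(rss))
```

Calling code only ever asks for "the summary", so it never sees which element actually supplied it.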
Whatever we end up doing, I have two concerns:
Had a look; content:encoded is supposed to be the full post, so it fails criterion (2). I don't suppose there's a way to configure WordPress to put the right things in the description? It's doing some other weird things too, like putting related posts in there...
@jywarren, I don't see a clean way to fix this. Unless you have any ideas, I'd like to tag this as wontfix and close it. Let me know.
Just to summarize: it's looking for <summary> and gets <description>, which is actually a summary that WordPress prepares and which doesn't include images.
I guess we just leave it. It's too bad that the combined display won't show some of the really nice images, especially since a non-zero number of my posts have no content except images. Text over image content-type bias? :-)
Anyhow, I'll file a related idea I had that helps a little.
Alright, closing then.
I'm not sure whether it's because it scrubs some markup, or because it's reading <description> and not <content:encoded>. The latter includes inline images in a standard WordPress feed: