glennklockwood closed this issue 1 year ago
Let me do a quick manual debug to see what's going on - back in a bit!
Looks like we will also want some cleanup of the markdown file name - Blogger produces an interesting path!
blog/_posts/glennklockwood/2022-11-24-tag:blogger.com,1999:blog-4307061427721284246.post-2068110509046297403.md
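One way to get a cleaner file name is to build it from the post date and a slug of the title instead of the Blogger entry id. This is only a rough sketch: `post_filename` is a hypothetical helper (not necessarily what the syndicator does), and the title used below is an assumed example.

```python
import re


def post_filename(date, title):
    """Build 'YYYY-MM-DD-slug.md' from a date string and a post title.

    Hypothetical helper for illustration; the real generator may differ.
    """
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{date}-{slug}.md"


# Assumed example title; colons, commas and dots never reach the file name
print(post_filename("2022-11-24", "SC'22 Recap"))
```

The same slug can be reused for the permalink so the file name and URL stay in sync.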
okay does this reproduce the issue? E.g., the post looks ok at first, but then there is a big block of html?
I'm going to also do a nice little refactor to quickly show the author tag
don't worry working on your bug now!
okay got that fixed - new bug! Blogger (it looks like) prevents other sites from embedding its hosted image urls:
Need to think about a way around this.
This looks like a known (intentional) issue that Google knows about; see https://support.google.com/blogger/thread/133238986/image-url-from-blogger-googleusercontent-com-is-not-accepted-by-other-websites-if-i-want-to-insert-m?hl=en for the thread. I think for now I'm going to try to filter out these images, and perhaps with a later update we can find an efficient way to fetch and store them. I don't want to start with that because it will take up space very quickly.
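A minimal sketch of what that filtering could look like, assuming the blocked images are the ones served from `blogger.googleusercontent.com` (the host named in the support thread). `drop_blogger_images` is a hypothetical name, and a regex is used here only to keep the example dependency-free; a real HTML parser would be more robust.

```python
import re

# Assumption: images on this host cannot be embedded from other sites
BLOCKED_HOSTS = ("blogger.googleusercontent.com",)

# Matches a single <img ...> tag, self-closing or not
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def drop_blogger_images(html):
    """Remove <img> tags whose markup mentions a blocked host."""

    def replace(match):
        tag = match.group(0)
        # Drop the tag entirely if it points at a blocked host, else keep it
        return "" if any(host in tag for host in BLOCKED_HOSTS) else tag

    return IMG_TAG.sub(replace, html)


sample = (
    '<p>hi<img src="https://blogger.googleusercontent.com/img/a.png"/>'
    'there<img src="https://example.com/b.png"></p>'
)
print(drop_blogger_images(sample))
```

Images from other hosts are left alone, so only the ones that would render as broken get removed.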
okay I've pushed a fix to get you on the map! It includes (for the time being) removing these images that we aren't allowed to embed. If you want to discuss a different parsing strategy please open an issue! The PR also added the nice tags, and better handled the markdown file name and subsequent permalink URL.
Thanks for getting this fixed so quickly! Doesn't seem like many people use Blogger anymore so I appreciate you getting this to work.
It looks like some residual Python crept into the rendered output of https://hpc.social/blog/2022/sc-22-recap/:
Not the end of the world; just fyi.
haha no you are spot on, I caught that too (just pushed a fix!) I needed to stringify the soup instead of returning renderedContent.
should be less terrible now :laughing:
Blogger uses `<a name="more"></a>` to separate the part of blog posts that should be shown "above the fold" of the landing page. When I try to run my blog's RSS through the blog syndicator though, everything past the `<a name="more">` is no longer converted into markdown and is spit out as escaped HTML. I tried to visually inspect the RSS feed coming out of Blogger and its contents look the same above and below this `<a name="more">` divider, so I think something is going wrong upstream of generate_posts.py (like feedparser?). Any ideas? I couldn't find any obvious causes.
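As a possible workaround while the root cause is unknown, the divider could be stripped from the entry HTML before it is converted to markdown. This is a sketch under that assumption only: `strip_more_marker` is a hypothetical helper, and it is not confirmed that removing the anchor avoids the escaped output.

```python
import re

# Matches <a name="more"> with an optional immediate </a>, either quote style
MORE_MARKER = re.compile(
    r'<a\s+name=["\']more["\']\s*>\s*(?:</a>)?', re.IGNORECASE
)


def strip_more_marker(html):
    """Remove Blogger's "read more" anchor from a chunk of post HTML."""
    return MORE_MARKER.sub("", html)


print(strip_more_marker('<p>above</p><a name="more"></a><p>below</p>'))
```

Running the converter on the output of this and comparing against the unmodified entry would at least show whether the anchor itself is what trips the conversion.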