Open toolness opened 9 years ago
Here's an article discussing three strategies for the crawler problem:
- `<noscript>` to include content on the page
- `<meta name="fragment" content="!">` to make the crawler request the URL with `_escaped_fragment_=` and respond to that differently, somehow (Google-only, sort of)

All of these add some amount of duplication or overhead, although I guess that's sort of inherent in the task.
There's a lot more info on the `_escaped_fragment_=` technique here.
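For reference, a minimal sketch of what the server side of the `_escaped_fragment_=` technique could look like, assuming the server can see the full request URL. The names `escaped_fragment` and `choose_response` are hypothetical, and how the snapshot HTML actually gets produced is left out entirely.

```python
from urllib.parse import urlsplit, parse_qs


def escaped_fragment(url):
    """Return the fragment a crawler asked for, or None for a normal request."""
    query = urlsplit(url).query
    # keep_blank_values=True matters here: a bare "_escaped_fragment_="
    # (empty value) is still a crawler request and must not be dropped.
    params = parse_qs(query, keep_blank_values=True)
    values = params.get("_escaped_fragment_")
    return values[0] if values else None


def choose_response(url):
    """Decide which kind of response to send (hypothetical labels)."""
    # "snapshot" stands for pre-rendered HTML, "app" for the normal JS page.
    return "snapshot" if escaped_fragment(url) is not None else "app"
```

With this, `choose_response("http://example.com/blog/?_escaped_fragment_=")` picks the snapshot branch, while the same URL without the query parameter falls through to the regular JS-powered page.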
@toolness also mentioned that we could pipe through the old pages from Wordpress for bots or old browsers, although for bots that would mean thumbnail previews would show the old theme. Thumbnails are potentially tricky for `<noscript>` solutions too, as simply dumping in the content without proper styles will mean the thumbnails are also wrong.
I haven't found info on how Pocket works. Instapaper can read Open Graph Protocol. I assume anything done to assist crawlers will assist readers, but I'm not sure.
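If Open Graph turns out to be worth doing, the tags themselves are cheap to generate. A rough sketch, assuming we have a post's title, canonical URL, and an image URL on hand; `open_graph_tags` is a hypothetical helper, and the four properties are the ones the Open Graph Protocol lists as required. Whether Pocket or Readability read them is still an open question.

```python
from html import escape


def open_graph_tags(title, url, image, og_type="article"):
    """Build the four basic Open Graph <meta> tags for one page."""
    props = [
        ("og:title", title),
        ("og:type", og_type),
        ("og:url", url),
        ("og:image", image),
    ]
    # Escape attribute values so quotes or ampersands in a post title
    # cannot break out of the content="..." attribute.
    return "\n".join(
        '<meta property="{}" content="{}">'.format(name, escape(value, quote=True))
        for name, value in props
    )
```

The output would go in the `<head>` of each pre-rendered page, so it only helps on pages that are served as real HTML rather than built by JS.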
Maybe thumbnails aren't a big deal. I'm not sure if it's some setting I have set, but I don't actually see thumbnails of pages in my search results on Google...
Since post detail pages are now pre-rendered on the server side, I'm de-prioritizing this ticket from launch, as the search result pages don't seem like they'd be as important to get spidered/cached/etc.
Because everything is powered by JS, web crawlers and tools like Readability, Pocket, etc. won't be able to make sense of the site.