Open dw opened 10 years ago
Some of these may be missing from Algolia. One example I got a chance to look at:
https://news.ycombinator.com/item?id=90945
I converted the story's post time to a Unix epoch timestamp (there are tons of sites that will subtract days from a date and then convert the result to a Unix timestamp) and then ran the following API query to get posts by the author that were created before this time.
The result I get does not include the above story, which indicates that Algolia itself doesn't have it.
I've filed an issue at Algolia's hn-search repo: https://github.com/algolia/hn-search/issues/33
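For anyone who wants to repeat this kind of check, here is a rough sketch of the timestamp conversion and an author query. The exact query used above isn't shown, so this is only my guess at an equivalent: it assumes Algolia's `search_by_date` endpoint with an `author_<name>` tag and a `created_at_i` numeric filter, and the helper names `to_unix_epoch` and `build_author_query` are made up for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed endpoint of the Algolia HN Search API (not taken from the comment above).
ALGOLIA_SEARCH_BY_DATE = "https://hn.algolia.com/api/v1/search_by_date"


def to_unix_epoch(year, month, day, hour=0, minute=0, second=0):
    """Convert a UTC calendar date/time to a Unix timestamp in seconds."""
    dt = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(dt.timestamp())


def build_author_query(author, before_epoch):
    """Build a URL asking for stories by `author` created before `before_epoch`.

    Hypothetical helper: the tag and filter names are assumptions about
    the Algolia HN Search API, not the query used in this thread.
    """
    params = {
        "tags": f"story,author_{author}",
        "numericFilters": f"created_at_i<{before_epoch}",
    }
    return f"{ALGOLIA_SEARCH_BY_DATE}?{urlencode(params)}"


if __name__ == "__main__":
    # "example_user" and the date are placeholders, not values from this issue.
    cutoff = to_unix_epoch(2008, 1, 1)
    print(build_author_query("example_user", cutoff))
```

If the story in question is missing from the hits returned for that URL, that would point at the index rather than the crawler.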
Looks like these IDs are available via the new HN API: https://github.com/HackerNews/API
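A minimal sketch of checking a single ID against that API, in case it helps. It assumes the item endpoint at `hacker-news.firebaseio.com/v0/item/<id>.json`, that a nonexistent item comes back as JSON `null`, and that deleted or dead items carry `deleted` / `dead` flags. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed item endpoint of the official HN API.
HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{item_id}.json"


def item_url(item_id):
    """Return the item URL for a numeric HN ID."""
    return HN_ITEM_URL.format(item_id=item_id)


def classify_item(payload):
    """Classify a decoded item payload as missing, deleted, dead, or live."""
    if payload is None:
        return "missing"  # assumed: the API answers `null` for unknown IDs
    if payload.get("deleted"):
        return "deleted"
    if payload.get("dead"):
        return "dead"
    return "live"


def fetch_item(item_id, timeout=10):
    """Fetch one item over the network and decode its JSON body."""
    with urlopen(item_url(item_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # 90945 is the example story ID mentioned earlier in this thread.
    print(classify_item(fetch_item(90945)))
```

Running this over the IDs that are absent from the dump would separate the deleted posts from the ones that really should have been crawled.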
Hi there,
Per my Reddit comment at http://uk.reddit.com/r/datasets/comments/26xqgs/downloading_all_of_hacker_news_posts_and_comments/ , there are 641k IDs that don't appear anywhere.
It looks like either your crawler or Algolia doesn't have a complete data set.
I manually checked some of the missing IDs. Some lead to deleted posts, but the vast majority appear to lead to legitimate comments/links.
If it's a problem with your script, I guess that is easiest to fix. If it is a problem with Algolia, then I guess we're out of luck :(
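For reference, the gap check itself is simple. This is only a sketch of the idea, not the script I actually ran: it assumes item IDs are consecutive integers starting at 1, and `find_missing_ids` is a name made up here.

```python
def find_missing_ids(seen_ids, max_id):
    """Return every ID in 1..max_id that does not appear in `seen_ids`.

    Assumes IDs are assigned sequentially from 1 (an assumption, not
    something confirmed in this thread).
    """
    seen = set(seen_ids)
    return [i for i in range(1, max_id + 1) if i not in seen]


if __name__ == "__main__":
    # Toy data standing in for the IDs found in a downloaded dump.
    crawled = [1, 2, 4, 7]
    print(find_missing_ids(crawled, max_id=8))
```

The resulting list is what you would then spot-check by hand or against another source.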