maddyblue / goread

RSS reader in go on app engine; formerly goread.io
ISC License

Low count of stories #239

Closed daGrevis closed 11 years ago

daGrevis commented 11 years ago

I switched to goread.io a few weeks ago. Before that I was using Feedly. The funny thing is that goread.io says I have around 50 unread stories when I'm back at work after the weekend. With Feedly the count was around a few hundred. Is there some limit on your hosted goread?

maddyblue commented 11 years ago

No limit, just a bug.

maddyblue commented 11 years ago

Oh, could you please list specific stories or feeds that are not listed in goread but are listed in feedly? That would help track this down.

daGrevis commented 11 years ago

Hmm. I'll reset goread.io and feedly to zero and ping back with screenshots after a day.

Lockyc commented 11 years ago

Anecdotally, I have noticed a difference in unread item counts between Google Reader and goread.io, but I have not tested anything.

@mjibson The new interface update is amazing! Works so much better in Firefox now :D

daGrevis commented 11 years ago

The difference for me is 1:27 stories. Most of them are from HackerNews.

maddyblue commented 11 years ago

Hacker News is a known issue: they heavily throttle App Engine IP addresses, so we only get about 1 update per day. Not much I can do to fix that. Any others?

maddyblue commented 11 years ago

@Lockyc uh, what didn't work before? I wasn't aware of any problems.

daGrevis commented 11 years ago

Can you explain to me why exactly HN isn't working? I didn't quite follow.

maddyblue commented 11 years ago

App Engine runs from a known block of IP addresses. When a connection is made to HN, it can detect whether the source IP is within the App Engine block. If it is, the server drops the connection: no data is returned, and the App Engine fetch process times out after the set 60s.

Sometimes it works fine, though, so I suspect that HN runs a throttle service rather than an outright block. There are probably lots of other applications within App Engine also trying to connect to HN to scrape for stories or RSS. Since this is a big pool, HN's throttler might limit it to, say, 100 connections/hour or something. Thus, if 100 requests from the pool have already been made in the same hour, goread gets blocked because it's in the same pool.

Craigslist feeds have a similar problem: App Engine's fetcher always sets an App Engine user agent (this cannot be changed). When they see that user agent, the connection is dropped. (This is known because CL feeds don't work in the dev SDK, which has the same user agent setting. HN feeds do work in the dev SDK, suggesting it's an IP address block throttle.)
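The reasoning in the parenthetical can be written down as a small decision function. This is a sketch under the assumption that the dev SDK sends the same user agent as production but from an ordinary, non-App-Engine IP address; `diagnose` and the `blockKind` values are hypothetical names.

```go
package main

import "fmt"

// blockKind names the likely reason a feed fetch fails.
type blockKind string

const (
	noBlock        blockKind = "no block observed"
	userAgentBlock blockKind = "user-agent block"
	ipBlock        blockKind = "IP-range block or throttle"
)

// diagnose infers the likely kind of block from two observations:
// whether the feed can be fetched from the dev SDK (App Engine user
// agent, ordinary IP) and from production (App Engine user agent,
// App Engine IP).
func diagnose(worksInDevSDK, worksInProduction bool) blockKind {
	switch {
	case !worksInDevSDK:
		// Fails even from an ordinary IP, so the user agent is the
		// likely trigger.
		return userAgentBlock
	case !worksInProduction:
		// The user agent alone is accepted, so the source IP is the
		// likely trigger.
		return ipBlock
	default:
		return noBlock
	}
}

func main() {
	fmt.Println("Craigslist:", diagnose(false, false)) // user-agent block
	fmt.Println("HN:", diagnose(true, false))          // IP-range block or throttle
}
```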

daGrevis commented 11 years ago

In other words, HN can't give feeds to all readers because of performance problems?

Lockyc commented 11 years ago

@mjibson the interface in Firefox used to get really slow really quickly. It seems to be much better now, although it still gets a little slow. As for my anecdotal evidence, I haven't had a chance to do any testing, but I find that if I don't read any articles for a couple of days, the unread count never really goes over about 500 items (usually around 400). If I left my Google Reader account for a couple of days, I could potentially be over 1000 unread items. I don't have any high-volume feeds like Hacker News or the like. I'm not sure how I could test this other than subscribing to another feed service with my subscriptions and comparing.

maddyblue commented 11 years ago

Yes, I agree. Feed counts seem like they should be higher. I'm going to start comparing with Feedly and see what goread is missing out on. It's difficult, too, because update frequencies are different, so something could simply not have been updated yet rather than missed.

Lockyc commented 11 years ago

Yeah, I can imagine it would be tricky to track down what's happening. Sorry I don't have better information for you right now. There is also the outside possibility that Feedly isn't working correctly either :P