pipes-digital / pipes

Repository for Pipes
https://pipes.digital
GNU Affero General Public License v3.0

Errors with CraigsList RSS feeds #43

Closed sashaikevich closed 6 years ago

sashaikevich commented 6 years ago

Really excited that Pipes is back!

I used Yahoo's Pipes to search craigslist, but pipes.digital is having some trouble with craigslist's RSS feeds: I see "No Result" at the Combine block's output (even though the Feed block's output has content), and "Internal Server Error" for the pipe output (https://www.pipes.digital/feed/aNg2Aa9J).

I think I also saw a message somewhere while experimenting with the blocks, saying that it couldn't parse the source.

Is there anything you can do to help me fix it?

onli commented 6 years ago

Hi. The good news is that the Feed block produces proper output, which means we should be able to find a good fix here.

Why exactly are you using the Combine block here? With just one feed connected, it should do nothing. Regardless, this is certainly a bug and I'll look into it: the block should simply return the feed input.

Even without the Combine block, the Sort block still does not work for me with the craigslist feed, presumably because the date is only given as dc:date. But I will debug this properly and report back.
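A minimal sketch of the dc:date problem, assuming an RSS 1.0 (RDF) feed that uses the Dublin Core namespace: a sort that only looks at `pubDate` has nothing to order such items by, so the sort key has to fall back to `dc:date`. The helpers `item_date` and `sort_items` and the sample feed are hypothetical illustrations written for this sketch; they are not the Pipes Sort block or feedparser's API.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Namespaces of an RSS 1.0 (RDF) feed; Dublin Core supplies dc:date.
NS = {
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Sentinel that sorts after every real date when sorting newest first.
UNDATED = datetime.min.replace(tzinfo=timezone.utc)


def item_date(item):
    """Hypothetical helper: use pubDate if present, otherwise fall back to dc:date."""
    pub = item.findtext("pubDate") or item.findtext("rss:pubDate", namespaces=NS)
    if pub:
        dt = parsedate_to_datetime(pub.strip())  # RFC 822 date, as in RSS 2.0
    else:
        dc_date = item.findtext("dc:date", namespaces=NS)
        if not dc_date:
            return None
        # W3C-DTF/ISO 8601; fromisoformat() accepts a trailing "Z" only from Python 3.11
        dt = datetime.fromisoformat(dc_date.strip().replace("Z", "+00:00"))
    if dt.tzinfo is None:  # treat naive timestamps as UTC so all dates compare
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def sort_items(root):
    """Hypothetical helper: return the feed's items, newest first, undated last."""
    items = root.findall("rss:item", NS)
    return sorted(items, key=lambda it: item_date(it) or UNDATED, reverse=True)


# Made-up sample: the dates exist only as dc:date.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item><title>undated</title></item>
  <item><title>older</title><dc:date>2018-07-04T09:00:00-07:00</dc:date></item>
  <item><title>newer</title><dc:date>2018-07-05T17:49:00-07:00</dc:date></item>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE)
print([it.findtext("rss:title", namespaces=NS) for it in sort_items(root)])
# ['newer', 'older', 'undated']
```

Items without any usable date are pushed to the end instead of raising an error; that is a design choice of this sketch, not necessarily what Pipes does.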

sashaikevich commented 6 years ago

Sweet! And thanks for the prompt reply. I'll have about 70 feed sources, which I would combine, get rid of duplicates, filter for negative keywords, and output as a single RSS of business leads. (So, if there's a monthly fee to use the service as I described, please don't be shy about letting me know).

How frequently does your system fetch? (i.e., what's the feed update interval?)

Thanks!!
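The workflow described above (combine about 70 feeds, drop duplicates, filter out negative keywords) could be sketched roughly like this. It assumes each feed has already been parsed into a list of dicts with `link`, `title`, and `description` keys; `merge_feeds`, the use of the link as the duplicate key, and the case-insensitive substring match are all assumptions for illustration, not how the Pipes blocks work internally.

```python
def merge_feeds(feeds, negative_keywords):
    """Hypothetical helper: combine feeds, drop duplicates, filter out negative keywords.

    feeds: iterable of feeds, each a list of item dicts with "link", "title",
    and "description" keys (an assumed, already-parsed representation).
    """
    blocked = [kw.lower() for kw in negative_keywords]
    seen = set()
    merged = []
    for feed in feeds:                      # Combine: walk every source in order
        for item in feed:
            # Dedupe on the link; fall back to the title for items without one.
            key = item.get("link") or item.get("title", "")
            if key in seen:                 # first occurrence wins
                continue
            seen.add(key)
            text = f"{item.get('title', '')} {item.get('description', '')}".lower()
            if any(kw in text for kw in blocked):  # Filter: drop negative keywords
                continue
            merged.append(item)
    return merged


# Made-up example data
feeds = [
    [{"link": "a", "title": "Need a web designer", "description": ""},
     {"link": "b", "title": "Unpaid internship", "description": ""}],
    [{"link": "a", "title": "Need a web designer", "description": ""},
     {"link": "c", "title": "Logo work", "description": "Paid gig"}],
]
print([item["link"] for item in merge_feeds(feeds, ["unpaid"])])
# ['a', 'c']
```

Keying duplicates on the link is the simplest choice; a listing cross-posted to several cities would likely have different links, so deduplicating on a normalized title might catch more in practice.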

onli commented 6 years ago

The craigslist feed is quite strange, strange enough that the feedparser library could not handle it. I patched that library, and now your pipe is working :)

Does it contain all the information you need? Because the craigslist feed puts most of its information in an extra namespace, some data will be missing. If something useful is missing, please tell me and I will patch it in.
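Since most of the feed's data sits in extra XML namespaces, a parser that only knows the plain RSS elements will silently drop it. One way to keep everything is to flatten each item's children, namespaced or not, into a dictionary keyed by `prefix:name`. This is a hypothetical sketch using Python's `xml.etree.ElementTree`; the `PREFIXES` table and the `qualified_name` and `flatten_item` helpers are assumptions, not part of feedparser or Pipes.

```python
import xml.etree.ElementTree as ET

# Assumed prefix table; unknown namespaces keep their full URI in the key.
PREFIXES = {
    "http://purl.org/rss/1.0/": "",
    "http://purl.org/dc/elements/1.1/": "dc",
    "http://purl.org/dc/terms/": "dcterms",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
}


def qualified_name(tag):
    """Turn ElementTree's '{uri}local' tag into 'prefix:local' (or just 'local')."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        prefix = PREFIXES.get(uri, uri)
        return f"{prefix}:{local}" if prefix else local
    return tag


def flatten_item(item):
    """Hypothetical helper: map every child of an <item> to its text.

    Repeated elements (e.g. several dc:subject) are collected into a list.
    """
    fields = {}
    for child in item:
        name = qualified_name(child.tag)
        value = (child.text or "").strip()
        if name not in fields:
            fields[name] = value
        elif isinstance(fields[name], list):
            fields[name].append(value)
        else:
            fields[name] = [fields[name], value]
    return fields


# Made-up sample item
item = ET.fromstring(
    '<item xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<title>Need a logo</title><dc:date>2018-07-05T17:49:00-07:00</dc:date>"
    "<dc:subject>design</dc:subject><dc:subject>gigs</dc:subject></item>"
)
print(flatten_item(item))
# {'title': 'Need a logo', 'dc:date': '2018-07-05T17:49:00-07:00', 'dc:subject': ['design', 'gigs']}
```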

> How frequently does your system fetch? (i.e., what's the feed update interval?)

It's a layered system, with a cache layer regulating the update interval. Each downloaded feed is cached for 10 minutes. In addition, a single domain is queried at most once per second, which means that with 70 feeds the pipe will take at least 70 seconds to download craigslist's feeds. The pipe's own output feed is again cached for 10 minutes. That means your update interval is ~12 minutes if the pipe's feed is queried regularly.
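The layering described above can be sketched as a small cache with a per-domain throttle. This is a single-threaded illustration under stated assumptions: the class `ThrottledFeedCache`, the `fetch` callback, the injectable `clock`/`sleep` functions (which make it testable without real waiting), and keying the throttle on the URL's host name are all hypothetical; only the 10-minute cache lifetime and the one-request-per-second limit come from the comment.

```python
import time
from urllib.parse import urlparse


class ThrottledFeedCache:
    """Hypothetical sketch: TTL cache per URL plus a per-domain request throttle."""

    def __init__(self, fetch, ttl=600.0, min_interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch                  # callable(url) -> feed body
        self.ttl = ttl                      # seconds a downloaded feed stays cached (10 min)
        self.min_interval = min_interval    # minimum seconds between requests to one host
        self.clock = clock
        self.sleep = sleep
        self.cache = {}                     # url -> (fetched_at, body)
        self.last_request = {}              # host -> time of the last request

    def get(self, url):
        now = self.clock()
        cached = self.cache.get(url)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]                # still fresh: no network request

        host = urlparse(url).netloc
        last = self.last_request.get(host)
        if last is not None:
            wait = self.min_interval - (now - last)
            if wait > 0:
                self.sleep(wait)            # throttle: wait until the interval has passed

        fetched_at = self.clock()
        self.last_request[host] = fetched_at
        body = self.fetch(url)
        self.cache[url] = (fetched_at, body)
        return body
```

With 70 feeds on one host, the first request goes out immediately and each of the remaining 69 waits about a second, so the throttle alone accounts for roughly 69 seconds and download time makes up the rest of the "at least 70 seconds". The pipe's own 10-minute output cache would sit in front of this as a second TTL layer.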

If that is too slow, let's talk by mail (support@pipes.digital) to find a solution for you.

> (So, if there's a monthly fee to use the service as I described, please don't be shy about letting me know).

Thanks :) The free plan gives you three pipes, and in theory those would work for that, but if the pipes get that heavy, subscribing to one of the paid plans would be greatly appreciated.

sashaikevich commented 6 years ago

Thank you for fixing it so quickly. I believe it works fine now.

But yesterday I found an RSS reader with great filtering tools, which I'll be testing out.

onli commented 6 years ago

Okay. Since the craigslist feed works now I will close this issue, but feel free to reopen or to open a new issue if there are still problems with the feed.

sashaikevich commented 6 years ago

Sounds good. Thanks for your help!

On Thu, Jul 5, 2018 at 5:49 PM, onli notifications@github.com wrote:

> Okay. Since the craigslist feed works now I will close this issue, but feel free to reopen or to open a new issue if there are still problems with the feed.
