sergiotapia / magnetissimo

Web application that indexes all popular torrent sites, and saves it to the local database.

MIT License

3k stars, 190 forks

add xbit api crawler #98

Closed (skwerlman closed 6 years ago)

skwerlman commented 6 years ago

The xBit crawler is based on the EZTV one, but uses the JSON API instead of an RSS feed because of encoding errors in the feed.

I had to copy and modify the fix_size function from the NyaaSi crawler to get the sizes in a usable format, so it seems like there's a need for a proper general solution that can be used everywhere, instead of reimplementing it as needed. I didn't write it here because it's out of scope (and probably deserving of its own library).
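A general size helper of the kind described above could look roughly like this. It is a minimal sketch in Python for illustration only (the project itself is written in Elixir, and this is not the fix_size function from the PR). The name `parse_size`, the accepted unit suffixes, and the choice to treat KB/MB/GB as powers of 1024 are all assumptions; the actual format returned by the xBit API is not shown in this thread.

```python
import re

# Multipliers for the unit suffixes this sketch understands.
# Assumption: "KB"/"MB"/"GB" are treated as binary units (powers of 1024).
_UNITS = {
    "B": 1,
    "KB": 1024, "KIB": 1024,
    "MB": 1024**2, "MIB": 1024**2,
    "GB": 1024**3, "GIB": 1024**3,
    "TB": 1024**4, "TIB": 1024**4,
}

# A decimal number, optional whitespace, then an alphabetic unit suffix.
_SIZE_RE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*([A-Za-z]+)\s*$")


def parse_size(text):
    """Convert a human-readable size such as '1.5 GiB' to a byte count."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    try:
        factor = _UNITS[unit.upper()]
    except KeyError:
        raise ValueError(f"unknown unit in size: {text!r}") from None
    return round(float(number) * factor)
```

Raising on unknown input, rather than returning 0, keeps a crawler from silently storing wrong sizes; whether that is the right trade-off for every crawler is a judgment call.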

There's a bug in the xBit API which adds an empty torrent at the end of the dht_results list, which necessitated a workaround, explaining the odd-looking case in process.
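The idea behind that workaround can be sketched as follows, in Python for illustration (the actual crawler is Elixir and handles this with a pattern-match case in process). Only the `dht_results` key comes from this thread; the field names `NAME` and `MAGNET` in the usage example, and the assumption that the empty torrent shows up as an empty object or an object with only blank values, are hypothetical.

```python
def is_blank(entry):
    """True for a missing/empty entry or one whose values are all blank."""
    return not entry or all(value in ("", None) for value in entry.values())


def usable_results(payload):
    """Return the entries of payload['dht_results'] that carry any data."""
    return [
        entry
        for entry in payload.get("dht_results", [])
        if not is_blank(entry)
    ]
```

For example, with hypothetical field names:

```python
payload = {
    "dht_results": [
        {"NAME": "ubuntu.iso", "MAGNET": "magnet:?xt=urn:btih:abc"},
        {},  # the stray empty torrent at the end of the list
    ]
}
torrents = usable_results(payload)  # only the first entry remains
```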

There's no way to limit how many results we get back from the API (despite the documentation disagreeing), so we always scrape 1000 torrents at a time.

fix #86

skwerlman commented 6 years ago

Discovered another issue just now: the xBit API fails to correctly generate JSON, leaving things like \ unescaped. Whenever something like this makes it into their API, it prevents Poison from parsing the data for any of the torrents.

I could work around this on our end by re-escaping strings; lemme know if it's worth the extra complexity.
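For reference, the re-escaping idea might look something like this. It is a rough sketch in Python rather than the project's Elixir/Poison code, and the function names are made up for illustration. It doubles any backslash that does not start a valid JSON escape sequence, then retries the parse. It is a heuristic: a stray backslash that happens to sit in front of `n`, `t`, `"` and so on still looks like a legitimate escape and cannot be detected this way.

```python
import json
import re

# A backslash, optionally followed by the rest of a valid JSON escape:
# one of " \ / b f n r t, or u plus four hex digits.
_ESCAPE_RE = re.compile(r'\\(["\\/bfnrt]|u[0-9a-fA-F]{4})?')


def escape_stray_backslashes(raw):
    """Double every backslash that does not begin a valid JSON escape."""
    def fix(match):
        # Group 1 is set only when a valid escape follows the backslash.
        return match.group(0) if match.group(1) else "\\\\"
    return _ESCAPE_RE.sub(fix, raw)


def parse_lenient(raw):
    """Parse JSON, retrying once with stray backslashes re-escaped."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return json.loads(escape_stray_backslashes(raw))
```

Trying the strict parse first means well-formed responses are never touched; the repair only runs when the strict parse has already failed.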

tchoutri commented 6 years ago

Thank you very much :)

"lemme know if it's worth the extra complexity"

If you could give it a try it would be charming!

scriptzteam commented 6 years ago

"There's no way to limit how many results we get back from the API (despite the documentation disagreeing), so we always scrape 1000 torrents at a time."

Waaaat? https://xbit.pw/api?limit=10&search=ubuntu

skwerlman commented 6 years ago

That method requires us to search for something, which we don't want to do here.

Basically, what we'd want to be able to do is https://xbit.pw/api?limit=75, which currently doesn't work.
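To make the two request shapes concrete, here is a small sketch in Python (illustration only, not the Elixir crawler code). The endpoint and the `limit` and `search` parameters are the ones shown in the URLs above; the helper name `build_url` is hypothetical, and the sketch only builds the URL without making any request.

```python
from urllib.parse import urlencode

API_URL = "https://xbit.pw/api"


def build_url(limit=None, search=None):
    """Build an API URL, adding only the parameters that were given."""
    params = {}
    if limit is not None:
        params["limit"] = limit
    if search:
        params["search"] = search
    return f"{API_URL}?{urlencode(params)}" if params else API_URL


latest = build_url(limit=75)                     # what the crawler wants
searched = build_url(limit=10, search="ubuntu")  # the search-based form
```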

scriptzteam commented 6 years ago

Thx fixed :)

https://github.com/scriptzteam/xBiT-Torrents-Magnets-Indexer/issues/3

skwerlman commented 6 years ago

thanks for the fix!

now that i look at the stats in that repo, i see you guys index ~3000 torrents/hour. would you be okay with us checking the api a bit more frequently? (probably every 15-20min instead of every 30min)

scriptzteam commented 6 years ago

Sure you can :) The "~3000 torrents/hour" is kinda hard to pin down, since sometimes we can get 50,000 magnets per day and another day it's only 2,500 magnets, so you never know :D

tchoutri commented 6 years ago

@scriptzteam Thank you very much for your work :) We're not getting much feedback from torrent websites, so be assured that if we can improve things in the future, you can always reach us :)

tchoutri commented 6 years ago

@skwerlman great!