Closed · skwerlman closed this issue 6 years ago
Discovered another issue just now: the xBit API fails to correctly generate JSON, leaving things like `\` unescaped. Whenever something like this makes it into their API, it prevents Poison from parsing the data for any of the torrents.
I could work around this on our end by re-escaping strings; lemme know if it's worth the extra complexity.
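A minimal sketch of what that re-escaping could look like, assuming the only invalid sequences are lone backslashes that don't start a legal JSON escape (module and function names here are hypothetical):

```elixir
defmodule XbitJSON do
  # Hypothetical helper: double any backslash that isn't starting a valid
  # JSON escape, then hand the result to Poison. Assumes lone backslashes
  # are the only kind of bad escaping the API emits.
  def decode(body) do
    body
    |> reescape()
    |> Poison.decode()
  end

  defp reescape(body) do
    # Match a backslash NOT followed by a legal JSON escape character
    # and replace it with an escaped (doubled) backslash.
    Regex.replace(~r/\\(?!["\\\/bfnrtu])/, body, fn _match -> "\\\\" end)
  end
end
```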
Thank you very much :)
> lemme know if it's worth the extra complexity
If you could give it a try, that would be great!
"There's no way to limit how many results we get back from the API (despite the documentation disagreeing), so we always scrape 1000 torrents at a time."
That method requires us to search for something, which we don't want to do here. Basically, what we'd want to be able to do is call https://xbit.pw/api?limit=75, which currently doesn't work.
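Until that parameter works server-side, one stopgap is to truncate client-side after fetching. A rough sketch, assuming HTTPoison as the HTTP client (the function name and error handling are illustrative, not the crawler's actual code):

```elixir
# Hypothetical: send ?limit= anyway, but enforce the cap locally since
# the server currently ignores the parameter.
def fetch_limited(count) do
  with {:ok, %HTTPoison.Response{status_code: 200, body: body}} <-
         HTTPoison.get("https://xbit.pw/api?limit=#{count}"),
       {:ok, %{"dht_results" => results}} <- Poison.decode(body) do
    {:ok, Enum.take(results, count)}
  end
end
```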
Thanks for the fix!

Now that I look at the stats in that repo, I see you guys index ~3000 torrents/hour. Would you be okay with us checking the API a bit more frequently? (Probably every 15-20 minutes instead of every 30.)
Sure you can :) The "~3000 torrents/hour" figure is hard to pin down, since some days we get 50,000 magnets and other days it's only 2,500, so you never know :D
@scriptzteam Thank you very much for your work :) We're not getting much feedback from torrent websites, so be assured that if we can improve things in the future, you can always reach us :)
@skwerlman great!
The xBit crawler is based on the EZTV one, but uses the JSON API instead of an RSS feed because of encoding errors in the feed.
I had to copy and modify the `fix_size` function from the NyaaSi crawler to get the sizes into a usable format, so it seems like there's a need for a proper, general solution to be used everywhere instead of reimplementing it as needed. I didn't write it here because it's out of scope (and probably deserving of its own library); a rough sketch of what such a helper might look like is at the end of this description.

There's a bug in the xBit API which adds an empty torrent at the end of the `dht_results` list, which necessitated a workaround; this explains the odd-looking case in `process` (sketched below).
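A minimal sketch of that workaround, assuming the empty entry decodes to an empty map (`process_torrent/1` is an illustrative name, not the crawler's actual API):

```elixir
# Hypothetical: drop the empty trailing torrent the API appends to
# dht_results before handing each entry on for processing.
def process(results) when is_list(results) do
  results
  |> Enum.reject(&(&1 == %{}))
  |> Enum.each(&process_torrent/1)
end
```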
There's no way to limit how many results we get back from the API (despite the documentation disagreeing), so we always scrape 1000 torrents at a time.
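As for the general size helper mentioned above, here's the kind of thing that could be extracted into its own library. This is a sketch under assumptions only: the unit table, parsing rules, and return shape are guesses, not what the existing `fix_size` does.

```elixir
defmodule SizeFix do
  # Hypothetical general helper: normalize a human-readable size
  # ("700.5 MiB", "1.2 GB", "12345") into an integer byte count.
  @units %{
    "B" => 1,
    "KB" => 1_000, "MB" => 1_000_000, "GB" => 1_000_000_000,
    "KIB" => 1_024, "MIB" => 1_048_576, "GIB" => 1_073_741_824
  }

  def to_bytes(size) when is_integer(size), do: size

  def to_bytes(size) when is_binary(size) do
    case Float.parse(String.trim(size)) do
      {num, rest} ->
        # Unknown or missing units fall back to plain bytes.
        unit = rest |> String.trim() |> String.upcase()
        round(num * Map.get(@units, unit, 1))

      :error ->
        # Unparseable input; a real library should surface an error here.
        0
    end
  end
end
```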
fix #86