sergiotapia / magnetissimo

Web application that indexes popular torrent sites and saves the results to a local database.
MIT License
2.99k stars 187 forks

[ADD] x[BiT] as a provider (have rss and api) #86

Closed scriptzteam closed 6 years ago

scriptzteam commented 7 years ago

https://xbit.pw/readme

https://xbit.pw/?search=ubuntu

Base-RSS:
https://xbit.pw/rss

Base-RSS search:
https://xbit.pw/rss?search=ubuntu

Base-RSS search with a limited number of results:
https://xbit.pw/rss?search=ubuntu&limit=5

Basic magnet view:
https://xbit.pw/?id=1

Basic magnet files view:
https://xbit.pw/?files=1

Recently discovered files:
https://xbit.pw/files

Stats page:
https://xbit.pw/stats

---------------------------------------------------------------

JSON API with search and limited output (limit can be at most 100):
https://xbit.pw/api?search=ubuntu&limit=3

JSON API, latest 1000 entries:
https://xbit.pw/api
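
The two JSON endpoints above differ only in their query string, so a crawler can build both from one helper. A minimal sketch in Python (Magnetissimo itself is Elixir; the `api_url` helper and the clamping are my own illustration, only the base URL and the 100-entry cap come from this thread):

```python
from urllib.parse import urlencode

BASE = "https://xbit.pw"  # xBit base URL from this thread

def api_url(search=None, limit=None):
    """Build an xBit JSON API URL. With no arguments this is the
    latest-1000 endpoint; otherwise search/limit params are appended."""
    params = {}
    if search is not None:
        params["search"] = search
    if limit is not None:
        # the thread notes the API caps limit at 100
        params["limit"] = min(int(limit), 100)
    query = urlencode(params)
    return f"{BASE}/api" + (f"?{query}" if query else "")
```

For example, `api_url(search="ubuntu", limit=3)` reproduces the sample URL above.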

---------------------------------------------------------------

:)
skwerlman commented 7 years ago

There are unfortunately a couple of issues with the site that'll make it hard to get right:

  1. No trackers are included in any of the magnet links. (Not the biggest problem, but a usability issue for people who can't use DHT.)
  2. There's an empty entry at the end of the dht_results table in API responses.
  3. The site's DHT scraper doesn't handle some encodings correctly, resulting in mojibake names. (like this: [tvN] 삼시세끼 바다목장편.E05.170901.720p-NEXT.mp4 (id=106189))
  4. Magnet links are not included cleanly in the RSS feed, meaning I can't reuse the RSS scraping behavior from nyaa.si
  5. Where magnets are included (in the RSS), they look like this:
    <description>
    <![CDATA[
    <b>ID:</b> 106760<br /><b>MAGNET:</b> magnet:?xt=urn:btih:7eb8b1cc32c2078afb04e3c6aa842f9a7288afdb&dn=%D0%9A%D0%BE%D0%BC%D0%BF%D0%BE%D0%BD%D0%B5%D0%BD%D1%82%D1%8B_%D0%B8_%D1%82%D0%B5%D1%85%D0%BD%D0%BE%D0%BB%D0%BE%D0%B3%D0%B8%D0%B8-2014<br /><b>NAME:</b> Компоненты_и_технологии-2014<br /><b>SIZE:</b> 542.95MB<br /><b>DISCOVERED:</b> 2017-09-03 18:30:38
    ]]>
    </description>

That said, I'm gonna try and get this site implemented.
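
For reference, the labeled fields in that CDATA blob follow a regular `<b>LABEL:</b> value<br />` pattern, so they can be pulled out with one regex. A rough Python sketch (Magnetissimo is Elixir; `parse_description` is a hypothetical helper, and the sample string is abridged from the feed excerpt above):

```python
import re

# Sample <description> payload, taken from the RSS excerpt above.
DESC = ("<b>ID:</b> 106760<br />"
        "<b>MAGNET:</b> magnet:?xt=urn:btih:"
        "7eb8b1cc32c2078afb04e3c6aa842f9a7288afdb"
        "&dn=%D0%9A%D0%BE%D0%BC%D0%BF%D0%BE%D0%BD%D0%B5%D0%BD%D1%82%D1%8B"
        "_%D0%B8_%D1%82%D0%B5%D1%85%D0%BD%D0%BE%D0%BB%D0%BE%D0%B3%D0%B8%D0%B8-2014<br />"
        "<b>NAME:</b> Компоненты_и_технологии-2014<br />"
        "<b>SIZE:</b> 542.95MB<br />"
        "<b>DISCOVERED:</b> 2017-09-03 18:30:38")

def parse_description(desc):
    """Collect every '<b>LABEL:</b> value' pair. A value runs until the
    next tag, so the full magnet URI (including &dn=...) is captured."""
    return {m.group(1): m.group(2).strip()
            for m in re.finditer(r"<b>([A-Z]+):</b>\s*([^<]+)", desc)}
```

This yields a dict keyed by `ID`, `MAGNET`, `NAME`, `SIZE`, and `DISCOVERED`, which is enough to build a torrent record without a proper RSS item parser.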

scriptzteam commented 7 years ago

I don't see the characters the way you see them :)

Check https://xbit.pw/?id=106196; the output name is [tvN] 삼시세끼 바다목장편.E05.170901.720p-NEXT.mp4

All you need to do is URL-decode the name part :)

%5BtvN%5D+%EC%82%BC%EC%8B%9C%EC%84%B8%EB%81%BC+%EB%B0%94%EB%8B%A4%EB%AA%A9%EC%9E%A5%ED%8E%B8 --> https://urldecode.org/?text=%255BtvN%255D%2B%25EC%2582%25BC%25EC%258B%259C%25EC%2584%25B8%25EB%2581%25BC%2B%25EB%25B0%2594%25EB%258B%25A4%25EB%25AA%25A9%25EC%259E%25A5%25ED%258E%25B8&mode=decode

And also the ID you were talking about: https://xbit.pw/?id=106189 http://urldecode.org/?text=%25D0%259F%25D0%25BE%25D0%25BF%25D1%2581%25D0%25BE%25D0%25B2%25D1%258B%25D0%25B9%2B%25D1%2580%25D0%25B0%25D0%25B9.%2B%25D0%25A1%25D1%2583%25D0%25BF%25D0%25B5%25D1%2580%25D1%2581%25D0%25B1%25D0%25BE%25D1%2580%25D0%25BD%25D0%25B8%25D0%25BA%2B%25D0%25BE%25D1%2582%2B%25D0%25A0%25D1%2583%25D1%2581%25D1%2581%25D0%25BA%25D0%25BE%25D0%25B3%25D0%25BE%2B%25D1%2580%25D0%25B0%25D0%25B4%25D0%25B8%25D0%25BE%2B%25282016%2529&mode=decode --> Попсовый рай. Суперсборник от Русского радио (2016)

:)
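
In Python terms, the decoding step suggested above is a one-liner: `urllib.parse.unquote_plus` handles both the percent-escapes and the `+`-as-space convention (a sketch only; the encoded string is the id=106196 example quoted above):

```python
from urllib.parse import unquote_plus

# Encoded name part from the id=106196 example above.
encoded = ("%5BtvN%5D+%EC%82%BC%EC%8B%9C%EC%84%B8%EB%81%BC+"
           "%EB%B0%94%EB%8B%A4%EB%AA%A9%EC%9E%A5%ED%8E%B8")

# Decodes to the readable "[tvN] 삼시세끼 ..." title.
decoded = unquote_plus(encoded)
print(decoded)
```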

skwerlman commented 7 years ago

I saw the issue with the names appearing in the JSON API, so maybe the bug is in how that output is generated.

The ID was a copy-paste error, my bad.

tchoutri commented 6 years ago

@skwerlman Can we consider starting work on the xBit crawler soon? Or is it bound to be a massive headache? (That wouldn't bother me, but it would lower its priority ;)

skwerlman commented 6 years ago

It doesn't look too hard, although the RSS feed is malformed, and neither the feed nor the JSON API contains seeder/leecher counts.

I can either set those to 0, or give up the RSS/API speed improvement and scrape each torrent's detail page. Which is preferable?

tchoutri commented 6 years ago

Ideally we should send them an email and kindly ask them to add the relevant details to their API… I don't like it, but let's not make waves yet and just use the API with its flaws :/

skwerlman commented 6 years ago

Investigating further, it doesn't look like the detail pages have seeder/leecher counts either, so I guess we'll have to default to 0 either way.
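
Defaulting to 0 (and skipping the empty trailing entry noted in issue 2 above) could be sketched like this in Python. Everything here is illustrative: the field names (`INFOHASH`/`NAME`/`SIZE`) are assumptions about the xBit payload, and the output keys are not Magnetissimo's actual Elixir schema:

```python
def normalize_entry(entry):
    """Map one raw API result to a crawler-side torrent shape.
    The input field names are assumed, not confirmed by the API docs."""
    return {
        "name": entry.get("NAME", ""),
        "magnet": "magnet:?xt=urn:btih:" + entry.get("INFOHASH", ""),
        "size": entry.get("SIZE", ""),
        # neither the RSS feed, the JSON API, nor the detail pages
        # expose these, so both counts default to 0
        "seeders": 0,
        "leechers": 0,
    }

def parse_api_response(data):
    """Drop the empty trailing entry xBit leaves in dht_results,
    then normalize whatever remains."""
    return [normalize_entry(e) for e in data.get("dht_results", []) if e]
```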

tchoutri commented 6 years ago

@skwerlman yeah and I don't feel like crawling the DHT for the infohashes :p Do you think you can spend some time doing this crawler? You can base its architecture on the EZTV crawler :)

skwerlman commented 6 years ago

Haha, that's exactly what I'm doing right now.

tchoutri commented 6 years ago

Fantastic :)