isohuntto / openbay-db-dump

GNU General Public License v3.0

All in one SQL release - 23558517 torrents #12

Open · nicoboss opened 9 years ago

nicoboss commented 9 years ago

http://www.nicobosshard.ch/Documents/all_in_one_SQL_DB_Dump_V1_January_2015.7z.torrent
http://torcache.net/torrent/9E221A6193C09E49A2EC8758A9A8BCE368DD05FD.torrent
https://drive.google.com/file/d/0B13QNh6ZKU3TX1V3Qjc3LWNKNjA/view?usp=sharing
http://www.nicobosshard.ch/Documents/all_in_one_SQL_DB_Dump_V1_January_2015.7z
https://mega.co.nz/#!A0NyEKQA!h1i2dZIy_-VuCyKKfJdpcPoA28FzWx_rYOImGMHvZlI

Today I released my first all-in-one DB dump. It contains 23558517 torrents from OpenBay (2 databases), Kickass, TorrentProject.se and BitSnoop. I'm very happy to share this DB, on which I've spent a lot of my holiday and free time. I will continue to improve this database to help save all those amazing torrent files and to keep the internet a place of liberty!

How I made this DB: First I downloaded over 50 GB of data and wrote some C++ programs to convert those HTML, CSV and TXT files into importable SQL or CSV files. After that I put them into the correct schema and removed corrupted or duplicate rows. Every time I had a working DB, I released it to this project. After 5 DBs (3 of them from me), I decided to make the first all-in-one dump. For that I copied all 5 DBs into one very big DB. I put the DBs that contain the most additional information at the top, because I used ALTER IGNORE TABLE one.torrents ADD UNIQUE (hash). The IGNORE keyword means that, for each hash, every duplicate that comes after the first one is discarded. So this DB contains only the highest-quality version of each torrent.

I’ve used the following order:

  1. OpenBay default dump
  2. Kickass dump V2
  3. OpenBay CSV dump
  4. TorrentProject.se dump
  5. BitSnoop backup dump V1
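To illustrate why this order matters, here is a minimal Python sketch of the same keep-the-first-duplicate idea. It is not the author's actual process: it uses an in-memory SQLite database, where INSERT OR IGNORE into a table with a UNIQUE hash column stands in for MySQL's ALTER IGNORE TABLE ... ADD UNIQUE (hash) (SQLite has no ALTER IGNORE, and MySQL removed the IGNORE clause of ALTER TABLE in 5.7.4). The source names follow the list above; the rows and the short placeholder hashes are made up.

```python
import sqlite3

# Hypothetical stand-ins for the five dumps, in the priority order listed
# above. Each row is (hash, name, description); real hashes are 40 hex chars,
# these short ones are placeholders. Richer dumps come first.
SOURCES = [
    ("Openbay default dump", [("AAA", "Ubuntu ISO", "full description")]),
    ("Kickass dump V2", [("AAA", "ubuntu", ""), ("BBB", "Debian ISO", "desc")]),
    ("Openbay CSV dump", [("BBB", "debian", ""), ("CCC", "Arch ISO", "")]),
    ("TorrentProject.se dump", [("CCC", "arch", "")]),
    ("BitSnoop backup dump V1", [("DDD", "Fedora ISO", "")]),
]


def merge_first_wins(sources):
    """Merge rows so that, for each hash, only the first row inserted survives."""
    conn = sqlite3.connect(":memory:")
    # The UNIQUE constraint on hash plays the role of MySQL's ADD UNIQUE (hash).
    conn.execute(
        "CREATE TABLE torrents ("
        " id INTEGER PRIMARY KEY,"
        " hash TEXT NOT NULL UNIQUE,"
        " name TEXT,"
        " description TEXT,"
        " source TEXT)"
    )
    for source_name, rows in sources:  # insertion order = priority order
        # INSERT OR IGNORE silently skips rows whose hash already exists.
        conn.executemany(
            "INSERT OR IGNORE INTO torrents (hash, name, description, source)"
            " VALUES (?, ?, ?, ?)",
            [(h, n, d, source_name) for h, n, d in rows],
        )
    result = conn.execute(
        "SELECT hash, name, source FROM torrents ORDER BY hash"
    ).fetchall()
    conn.close()
    return result


for row in merge_first_wins(SOURCES):
    print(row)
```

Each hash keeps the row from the highest-priority dump that contains it, so the lower-quality "ubuntu" and "debian" rows are dropped in favour of the earlier, richer ones.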

If you have any questions, find mistakes, or have wishes or anything like that, you can always write in this or any of my other issues. I try to answer as soon as possible. I'll probably update this DB every time I get a large amount of new, better or more recent data.

ssdr commented 9 years ago

sounds pretty good~ I'll have a try~

ghost commented 9 years ago

torcache seems to be down. If you want some seeding help, just upload the .torrent file elsewhere. Good job indeed!!!

Any suggestions for making search and everything else faster? I mean server setup, etc. I've got an i7 920 with 16 GB of RAM. Thanks!

pmwoodward3 commented 9 years ago

Would it be possible to get a .txt version of this so it works with Bitcannon?

nicoboss commented 9 years ago

@ekoice 16 GB of RAM is enough. My server has 32 GB but uses only 2.5 GB (1.5 GB for Sphinx and 1 GB for SQL), with a search speed of about 1 s per request on the all-in-one DB. Have you already tried what I wrote about speeding up SQL in https://github.com/isohuntto/openbay-db-dump/issues/10? The most important things are to enlarge the MySQL read buffer and to use the MyISAM storage engine! In the all-in-one DB, MyISAM, utf8_general_ci and the unique hash index are already preconfigured. The bottleneck isn't Sphinx on our side. I'm sure something in your SQL configuration is wrong, because the default configuration wasn't made for such a big DB. You shouldn't be getting 7 s per request on a much smaller DB than mine.

@pmwoodward3 No problem for me to do this, but what schema do you need? If I only use hashes and names, you'll lose a lot of important data. I can give you something like this in *.txt format:

id|name|description|category_id|size|hash|files_count|created_at|torrent_status|visible_status|downloads_count|scrape_date|seeders|leechers|tags|updated_at

The order and which data you get is your decision.
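A minimal Python sketch of such a pipe-separated export, assuming each row has already been fetched from the torrents table as a dict keyed by column name. The helper names (clean_field, to_pipe_line) and the rule of replacing | and line breaks inside fields with spaces are hypothetical choices, not part of any existing tool, and whether BitCannon accepts this exact layout is not confirmed here.

```python
# Column list taken from the comment above; callers may reorder or drop fields.
DEFAULT_COLUMNS = (
    "id", "name", "description", "category_id", "size", "hash",
    "files_count", "created_at", "torrent_status", "visible_status",
    "downloads_count", "scrape_date", "seeders", "leechers", "tags",
    "updated_at",
)


def clean_field(value):
    """Turn a value into text that cannot break the one-row-per-line format."""
    if value is None:
        return ""
    text = str(value)
    # Assumption: delimiters and line breaks inside a field are replaced by
    # spaces rather than escaped, since no escaping rule was specified.
    for bad in ("|", "\r", "\n"):
        text = text.replace(bad, " ")
    return text


def to_pipe_line(row, columns=DEFAULT_COLUMNS):
    """Format one torrent (a dict keyed by column name) as a pipe-separated line."""
    return "|".join(clean_field(row.get(col)) for col in columns)


# Hypothetical example row; the hash is a placeholder, not a real torrent.
example = {
    "id": 1,
    "name": "Some|Torrent",
    "hash": "ABCDEF0123456789ABCDEF0123456789ABCDEF01",
    "size": 1024,
    "seeders": 5,
}
print(to_pipe_line(example, ["hash", "name", "size", "seeders"]))
```

Passing a shorter or reordered column list is how "the order and which data you get" would be chosen; missing values simply become empty fields.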

csmarauder commented 9 years ago

Finally had some free time tonight: got it imported and rebuilt the Sphinx index, and everything looks good. This is still a useless app without the ability to easily add new torrents or manage anything, but it's fun to play with. Thanks for merging all these dumps!

By the way:

mysql> select count(*) from torrents;
+----------+
| count(*) |
+----------+
| 23558517 | <-- number of entries
+----------+
1 row in set (0.10 sec)

ghost commented 2 years ago

Hope that was you at 9x.xxx.xxx.xx3 (Deluge 1.3.15). If not, here is the link: https://mega.nz/file/EzZ3FKgL#qk_eo0DNqf9z5vdm-RLYJ3Q6im8ogAhr_ooDRdjop3c