isohuntto / openbay-db-dump

GNU General Public License v3.0

TorrentProject.se DB dump for the Open Bay Project #10

Open nicoboss opened 9 years ago

nicoboss commented 9 years ago

http://torcache.net/torrent/EACEF60FE77A771ACBC28E6D65A593BDB800EA28.torrent
https://mega.co.nz/#!J5sm3YAS!_QId1T0SCZlGMkMGNNVa9vDmOtVgiZn4fmi350fFH_o
https://drive.google.com/file/d/0B13QNh6ZKU3TaGcwcWxqU0V5blk/view?usp=sharing
https://www.dropbox.com/s/urhx0lkn2bo1fzr/TorrentProject_DB_Dump_January_2015_V1.7z?dl=0
http://www.nicobosshard.ch/Documents/TorrentProject_DB_Dump_January_2015_V1.7z

Today I released a new DB for the Open Bay Project! This one is from the famous http://torrentproject.se/ site and was made using the official dailydump. It contains 5685235 hashes. It's a wonderful DB, and the best thing is that it's very easy to update. This dump contains slightly more torrents than the one from kickass but less additional information. The update program will also work for other sites that offer an API export function, like https://kickass.so/, but don't forget that HTML dumps contain much more additional information. Thanks @ekoice for the idea to use the API and for emphasizing the importance of a simple update process. Now we have 4 DB dumps.

How to import this DB:

  1. Download and extract the DB
  2. Import the 2 SQL files with LOAD DATA INFILE or with a graphical interface like phpMyAdmin.

How to update this DB in 2 Steps:

  1. Download and extract, or clone, https://github.com/nicoboss/KickassCopy, go into the UpdateTorrentProject folder, run update.bat (Windows x64 only), and wait a few minutes. You can also use the UpdateTorrentProject copy included in the torrent file; it might be outdated, but it should still work.
  2. Copy dailydump.csv to your /mysql/data/…/ folder, then customize and execute the following SQL script. The schema for torrentproject.dailydump is also included in the torrent file, and db.torrent is your final DB.

LOAD DATA INFILE 'dailydump.csv' INTO TABLE torrentproject.dailydump FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';

INSERT IGNORE INTO db.torrent (name, hash, tags) SELECT name, hash, tags FROM torrentproject.dailydump;

TRUNCATE TABLE torrentproject.dailydump;

UPDATE db.torrent SET description='' WHERE description IS NULL;
UPDATE db.torrent SET category_id=2 WHERE tags LIKE 'Applications%';
UPDATE db.torrent SET category_id=3 WHERE tags LIKE 'Games%' OR tags LIKE 'Mobile%';
UPDATE db.torrent SET category_id=4 WHERE tags LIKE 'Adult%';
UPDATE db.torrent SET category_id=5 WHERE tags LIKE 'Video%';
UPDATE db.torrent SET category_id=6 WHERE tags LIKE 'Audio%';
UPDATE db.torrent SET category_id=7 WHERE tags LIKE 'Images%';
UPDATE db.torrent SET category_id=7 WHERE tags='';
UPDATE db.torrent SET category_id=8 WHERE tags='Video Tv';
UPDATE db.torrent SET category_id=9 WHERE tags LIKE 'Ebooks%';
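The category rules in the UPDATE statements can be checked outside the database before running them on millions of rows. The following is only a rough Python sketch of the same mapping, not part of the original script: the function name `category_for_tags` and the `default` parameter are made up for illustration, and it assumes a case-insensitive collation (such as utf8_general_ci), which it approximates by lowercasing. As in the SQL, later rules override earlier ones, so 'Video Tv' ends up as 8 and not 5.

```python
from typing import Optional

# (category_id, prefixes) in the same order as the UPDATE statements.
# Each prefix stands for a SQL pattern like LIKE 'Prefix%'.
PREFIX_RULES = [
    (2, ("applications",)),
    (3, ("games", "mobile")),
    (4, ("adult",)),
    (5, ("video",)),
    (6, ("audio",)),
    (7, ("images",)),
]


def category_for_tags(tags: str, default: Optional[int] = None) -> Optional[int]:
    """Return the category_id the UPDATE sequence would leave on a row.

    `default` is a hypothetical stand-in for whatever category_id the row
    had before the script ran; rows matching no rule keep it.
    """
    t = tags.lower()  # assumption: the column uses a case-insensitive collation
    category = default
    for cat, prefixes in PREFIX_RULES:
        if t.startswith(prefixes):
            category = cat
    if t == "":
        category = 7  # empty tags share the category of 'Images%'
    if t == "video tv":
        category = 8  # runs after 'Video%', so it overrides category 5
    if t.startswith("ebooks"):
        category = 9
    return category
```

For example, `category_for_tags("Video Movies")` gives 5, while `category_for_tags("Video Tv")` gives 8 because the exact match is applied after the prefix rule.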

ghost commented 9 years ago

This DB is good; too bad it doesn't have torrent age, seeders, leechers, or file count... Good work indeed...

csmarauder commented 9 years ago

Any chance we can get an SQL file out of this? Also, I have been considering taking all the dumps and making one big dump file out of them. Does anybody think this is a good idea?

nicoboss commented 9 years ago

I'll release one big all-in-one dump today. The reason I've waited so long is that I never got a reply from the bitsnoop team about why their daily dump download link doesn't work. Now I've found a backup from 26 February 2014 that contains 16949868 torrents (names and hashes). Not the best solution for the bitsnoop dump, but good enough for my first all-in-one release.

ghost commented 9 years ago

@nicoboss does that big backup have working leechers/seeders?

nicoboss commented 9 years ago

No, sorry, only names and hashes. I know it may be useless for you, but as soon as the official bitsnoop API works again I will make a new version with seeders, leechers and categories. If they don't fix it by next week, I won't have any other choice than to make an HTML dump of bitsnoop or torrentz.eu. But neither is an easy dump: bitsnoop doesn't have the hashes on the search pages, and torrentz.eu always gives me an IP ban after 1000 downloads (1000*48=48000 links/IP/day). And the stupid thing is that it is nearly impossible to change the IP address every 10 min except by using the Tor network, but that doesn't support wget or HTTrack. I'll release the full dump tomorrow because the duplicate check of 39766825 entries will take very long. I lost 4h because I used InnoDB instead of MyISAM. I'll never use InnoDB again for big databases!

InnoDB:
• Time to copy a 2.5GB DB: 3h
• Time to load the last row of 7'000'000: 90s
• Copying a big DB often adds random empty rows at the end!!!

MyISAM:
• Time to copy a 2.5GB DB: 60s
• Time to load the last row of 7'000'000: 10s
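For anyone who wants to try the duplicate check on a small sample before loading everything into MySQL, here is a rough Python sketch of the idea. It is not the procedure used for the release: the function name `merge_dumps` and the `(name, hash)` tuple layout are assumptions for illustration. It keeps the first row seen for each hash, similar to what INSERT IGNORE does when the hash column has a unique key, and it uppercases hashes so that the same torrent written in different case counts as one.

```python
from typing import Iterable, List, Set, Tuple

Row = Tuple[str, str]  # (name, hash); a hypothetical layout for illustration


def merge_dumps(*dumps: Iterable[Row]) -> List[Row]:
    """Merge several dumps, keeping the first row seen for each hash."""
    seen: Set[str] = set()
    merged: List[Row] = []
    for dump in dumps:
        for name, info_hash in dump:
            key = info_hash.strip().upper()  # normalise hex case and whitespace
            if key in seen:
                continue  # duplicate hash: the earlier row wins
            seen.add(key)
            merged.append((name, key))
    return merged
```

A set of 40-character hashes for tens of millions of rows needs several GB of RAM in Python, so this is only practical for samples; for the full data a unique index in the database remains the realistic option.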

TPBT-OFFICIAL commented 9 years ago

@nicoboss interesting... Also, about InnoDB and MyISAM: does that explain why tpbt.org (my website) takes so much time to load every once in a while?

nicoboss commented 9 years ago

Yes, probably. There are three things you can do to improve speed:

  1. Expand all buffers. That'll use more memory but the speed will improve a lot.
  2. Use MyISAM (first copy the schema, then change the DB engine of the copied schema, and finally copy all data into it. Don't convert the InnoDB tables to MyISAM with the convert function!)
  3. Use utf8_general_ci
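Step 2 can be done per table with three standard MySQL statements. The Python helper below only builds those statements as text so they can be reviewed and pasted into a client. It is a sketch under assumptions: the helper name `myisam_copy_statements` and the table names in the usage note are made up, and it handles one table at a time, whereas "copy the schema" above may well refer to copying a whole database with a GUI tool.

```python
import re
from typing import List

# Accept only plain identifiers, optionally qualified as db.table.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)?$")


def myisam_copy_statements(source: str, target: str) -> List[str]:
    """Build the SQL for: copy structure, switch engine, copy data."""
    for name in (source, target):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected table name: {name!r}")
    return [
        f"CREATE TABLE {target} LIKE {source};",          # copy the empty structure
        f"ALTER TABLE {target} ENGINE=MyISAM;",           # switch engine while empty
        f"INSERT INTO {target} SELECT * FROM {source};",  # then copy all rows
    ]
```

For example, `myisam_copy_statements("db.torrent", "db.torrent_myisam")` returns the three statements for a hypothetical copy named db.torrent_myisam. Changing the engine while the new table is still empty is what separates this from converting the filled InnoDB table in place.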

Both storage engines have their advantages and disadvantages, but because Open Bay doesn't need the special InnoDB features, I think MyISAM is the right choice. But if you tune InnoDB you'll also get a lot more speed. I don't know why InnoDB is so slow on my server (XAMPP on Windows). Normally the difference shouldn't be so big. The best thing is to try both with different configurations.

I'm using the following settings in my my.ini config file to get more speed by using more RAM (500MB to 1GB):

[mysqld]
key_buffer = 16M
max_allowed_packet = 10M
sort_buffer_size = 20M
net_buffer_length = 64K
read_buffer_size = 20M
read_rnd_buffer_size = 20M
myisam_sort_buffer_size = 80M

innodb_buffer_pool_size = 512M
innodb_additional_mem_pool_size = 16M
innodb_log_file_size = 16M
innodb_log_buffer_size = 8M

[mysqldump]
quick
max_allowed_packet = 16M

[isamchk]
key_buffer = 40M
sort_buffer_size = 40M
read_buffer = 20M
write_buffer = 20M

[myisamchk]
key_buffer = 40M
sort_buffer_size = 40M
read_buffer = 20M
write_buffer = 20M
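To compare such settings between servers it can help to read the sizes as plain byte counts. The following Python sketch is only an illustration, not a tool from this thread: the names `size_to_bytes`, `load_sizes` and `SAMPLE` are made up, and the sample text is a shortened copy of the settings above. It relies on MySQL reading the K, M and G suffixes as powers of 1024, and it skips value-less options such as `quick`.

```python
import configparser
from typing import Dict, Tuple

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_bytes(value: str) -> int:
    """Convert a MySQL-style size such as '512M' or '64K' to bytes."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


# Shortened copy of the settings above, used only as sample input.
SAMPLE = """
[mysqld]
key_buffer = 16M
net_buffer_length = 64K
innodb_buffer_pool_size = 512M

[mysqldump]
quick
max_allowed_packet = 16M
"""


def load_sizes(text: str) -> Dict[Tuple[str, str], int]:
    """Return {(section, option): bytes} for every option that has a value."""
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(text)
    sizes = {}
    for section in parser.sections():
        for key, value in parser[section].items():
            if value is not None:  # skip flags like 'quick'
                sizes[(section, key)] = size_to_bytes(value)
    return sizes
```

Calling `load_sizes(SAMPLE)` maps `("mysqld", "innodb_buffer_pool_size")` to 536870912 bytes. The sketch assumes every option with a value is a size, which holds for the settings listed here but not for a my.ini in general.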