isohuntto / openbay-db-dump

GNU General Public License v3.0

Kickass DB Dump for the Open Bay Project #7

Open nicoboss opened 9 years ago

nicoboss commented 9 years ago

magnet:?xt=urn:btih:ABC02B4AD7CF538BA8EC3B548BDA42DCD124D209
http://torcache.net/torrent/2770C21E9A3214AE03DF8514FCCDD36B66B87532.torrent
https://mega.co.nz/#!VolxVBKL!2Hhd9GunnYvtzvitT87TF7NiXFnecXnd7q3fs_MlMwQ

A nearly complete database dump of the site https://kickass.so/ for the Open Bay Project (https://github.com/isohuntto/openbay). The schema is in the first line of the CSV file. I'll probably update this DB every few months and also release other torrent DB dumps and the raw data of this dump ASAP. I'll also make one big DB with Kickass + PirateBay without duplicates. Please don't forget to seed! I've worked over 100 hours to make this DB dump. If you are interested in how I got this data, look at my GitHub repository https://github.com/nicoboss/KickassCopy. I'm always open to new ideas.

ghost commented 9 years ago

Can you please provide the exact method used to import this CSV into MySQL? Thank you!

nicoboss commented 9 years ago

I hope that the following steps will help you:

  1. Run the Open Bay setup and use your own DB
  2. Create a new Database
  3. Create a new table with my schema: name varchar(1023), size_bytes bigint(20), hash varchar(40), files_count int(11), author varchar(255), year smallint(6), category varchar(255), subcategory varchar(255), seeders int(11), leechers int(11)
  4. Upload your file to your /mysql/data/DB_Name/
  5. Go to the SQL prompt and enter the following statement: LOAD DATA INFILE 'Kickass_DB_Dump.csv' INTO TABLE DB_Name.Tablename FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'
  6. Wait 5 to 15 minutes depending on your server speed.
  7. Now go to your new table and bring it into the correct format, or edit the source code of the Open Bay project to accept this DB schema. You have to delete some columns and create new empty ones. In the end you need the schema that is in /src/protected/data/schema.mysql.sql. If you did not skip the first row, which contains the schema, in the import statement, you have to delete that row now.

Column mapping (Open Bay column = CSV column or value):

name=name
description=\N
category_id=do this as you like; there is no 100% correct solution. Use UPDATE DB_Name.Tablename SET category_id=5 WHERE tags='movies'; or something like it.
size=size_bytes
hash=hash
files_count=files_count
created_at=year converted to a date (I don't know how to do this)
torrent_status=\N
visible_status=\N
downloads_count=0
scrape_date=\N
seeders=seeders

The other columns are useless at this time for the Open Bay project but maybe you can use this in the future.

  8. Overwrite the Open Bay DB with your DB and hope that everything works fine. For me, not everything works perfectly. That's the reason why I am not publishing my version at this time.

I'll probably release a working SQL file in the next days or weeks if nobody else does this for me :D. Please publish your version if you get it to work perfectly. Sorry, I'm more the C++ type and am not very good with MySQL and PHP.
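For the open "year converted to date" step in the mapping above, one possible approach is sketched below in Python. It rests on an assumption that is not in the thread: since only the year is known, 1 January at midnight of that year is used. The function name `year_to_created_at` is made up for illustration. In MySQL itself, an expression such as MAKEDATE(year, 1) might serve the same purpose.

```python
from datetime import datetime
from typing import Optional


def year_to_created_at(year_field: str) -> Optional[str]:
    """Turn the CSV 'year' field into a MySQL DATETIME string.

    Assumption: only the year is known, so 1 January 00:00:00 is used.
    Returns None (to be stored as NULL) for empty or non-numeric values
    such as '\\N', and for years outside the MySQL DATETIME range.
    """
    try:
        year = int(year_field.strip())
    except ValueError:
        return None
    if not 1000 <= year <= 9999:
        return None
    return datetime(year, 1, 1).strftime("%Y-%m-%d %H:%M:%S")


created_at = year_to_created_at("2014")   # '2014-01-01 00:00:00'
missing = year_to_created_at("\\N")        # None
```

Whether 1 January is an acceptable stand-in depends on how the site sorts and displays torrents, so treat the choice as a placeholder.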

ghost commented 9 years ago

all i get is:

1366 - Incorrect integer value: 'size_bytes' for column 'size_bytes' at row 1

when executing: LOAD DATA INFILE 'Kickass_DB_Dump.csv' INTO TABLE ekoice.torrents FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'

nicoboss commented 9 years ago

OK, your problem is that your MySQL server refuses to import the schema row (the first line of the CSV file) as data, because the text 'size_bytes' is not an integer. So you have to skip the schema row in the import statement. The following statement may help:

LOAD DATA INFILE 'Kickass_DB_Dump.csv' INTO TABLE ekoice.torrents FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES
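To illustrate why the first line has to be skipped, here is a small Python sketch that reads the same pipe-delimited format with the standard `csv` module. It is not part of the original import procedure. The data row is invented; only the header line follows the schema given earlier in this thread.

```python
import csv
import io

# Invented sample in the dump's format: header line first, then one data row.
SAMPLE = (
    "name|size_bytes|hash|files_count|author|year|category|subcategory|seeders|leechers\n"
    '"Some | name"|1024|0123456789ABCDEF0123456789ABCDEF01234567|3|someone|2014|Movies|HD|10|2\n'
)


def read_rows(text, skip_header=True):
    """Parse '|'-separated rows whose fields may be enclosed by '"'."""
    reader = csv.reader(io.StringIO(text), delimiter="|", quotechar='"')
    if skip_header:
        next(reader, None)  # plays the role of IGNORE 1 LINES
    return list(reader)


rows = read_rows(SAMPLE)
size_bytes = int(rows[0][1])  # a real data row converts cleanly

header_first = read_rows(SAMPLE, skip_header=False)
try:
    int(header_first[0][1])  # the literal text 'size_bytes'
    header_is_integer = True
except ValueError:
    header_is_integer = False  # the situation behind error 1366
```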

nicoboss commented 9 years ago

I've decided to upload a preconverted SQL file ASAP.

If you can't wait, the following statements convert the category to category_id on your own. Set your maximum script execution time to unlimited (http://www.deuxcode.com/articles/091/how-to-prevent-script-timeout-in-phpmyadmin) because this will take over 30 minutes. An alternative is to execute every statement in the script on its own.

UPDATE kickass.open_bay SET tags=category;
UPDATE kickass.open_bay SET description='' WHERE description IS NULL;
UPDATE kickass.open_bay SET category_id=1 WHERE tags='Anime';
UPDATE kickass.open_bay SET category_id=2 WHERE tags='Applications';
UPDATE kickass.open_bay SET category_id=3 WHERE tags='Games';
UPDATE kickass.open_bay SET category_id=4 WHERE tags='XXX';
UPDATE kickass.open_bay SET category_id=5 WHERE tags='Movies';
UPDATE kickass.open_bay SET category_id=7 WHERE tags='Other';
UPDATE kickass.open_bay SET category_id=6 WHERE tags='Music';
UPDATE kickass.open_bay SET category_id=8 WHERE tags='TV';
UPDATE kickass.open_bay SET category_id=9 WHERE tags='Books';
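The same category mapping could also be applied before the import, while the CSV is being converted. The Python sketch below copies the names and ids from the UPDATE statements above. The helper name `category_id_for`, the case-insensitive lookup (meant to mimic MySQL's usual case-insensitive string comparison) and the `None` fallback for unknown categories are assumptions for illustration.

```python
# Category names and ids copied from the UPDATE statements above.
CATEGORY_IDS = {
    "Anime": 1,
    "Applications": 2,
    "Games": 3,
    "XXX": 4,
    "Movies": 5,
    "Music": 6,
    "Other": 7,
    "TV": 8,
    "Books": 9,
}

# Lower-cased keys so that 'movies' and 'Movies' are treated alike.
_IDS_BY_LOWER_NAME = {name.lower(): cid for name, cid in CATEGORY_IDS.items()}


def category_id_for(category, default=None):
    """Return the Open Bay category_id for a kickass category name.

    Unknown categories yield `default` (None here), which is an assumption;
    the thread does not say what to do with them.
    """
    return _IDS_BY_LOWER_NAME.get(category.strip().lower(), default)


movies_id = category_id_for("movies")    # 5
unknown_id = category_id_for("Podcasts")  # None
```

Doing the lookup during conversion avoids nine full-table UPDATE passes, at the cost of one more step in the converter.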

ghost commented 9 years ago

Thank you nicoboss, I can't wait to test your DB dump :)

nicoboss commented 9 years ago

http://torcache.net/torrent/39E10106F2F9EBFC70747949B72D2C7853F89073.torrent
https://mega.co.nz/#!A0F1RaIJ!wLhkES-a1ny8sT1Bdax4ZA44tH6KBD-lvJVRb1s7kM4

Here is the converted SQL file. :D That was another 7 hours of hard work, but now my DB is much easier to use. And I don't work that long for a useless DB!

The reason the SQL DB has fewer rows than the CSV one is that, due to a bug in my program, there are some duplicates in the CSV file, and I have now removed them with "ALTER TABLE open_bay ADD id INT( 11 ) NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;".

How to use: First you have to run the setup of the Open Bay Project and use your own DB. Now import my SQL file into your MySQL server and add it to, or replace, the automatically created DB. Then rename the wrong size_bytes column to size.

Known Issue:

My next step on this project is to release a DB that contains every magnet link from the Open Bay DB and my DB without any duplicates. After that I'll maybe dump some other torrent DB or update this one.

This DB contains over 5 000 000 torrent links! Please seed and distribute my project as much as you can to keep the internet a place of liberty!


ghost commented 9 years ago

thanks, gonna try now. (seeding with 1gbit). will let you know!

nicoboss commented 9 years ago

http://torcache.net/torrent/ABC02B4AD7CF538BA8EC3B548BDA42DCD124D209.torrent
https://drive.google.com/file/d/0B13QNh6ZKU3TaVZmNkNqY1BQbjg/view?usp=sharing
https://mega.co.nz/#!glNwWKxT!pRhnX_kUqZD-rQQcEoTFKwhlwMlOSAetGjGCR-8bkiQ

And here is the second version of the SQL database. I've corrected both mistakes in the DB. Now everything should work perfectly without doing anything to the DB. :D And please use the torrent link if you can. I only have 10 GB of bandwidth on mega.co.nz. Please seed. @ekoice I like that you are so interested in my DB and that you don't give up and try everything to get it to work. And I don't know how you get so much upload speed. I only have 5 MiB/s upload and 50 MiB/s download speed. I'm looking forward to your feedback.

How to use: First you have to run the setup of the Open Bay Project and use your own DB and Sphinx server. Now import my SQL file into your MySQL server and add it, or replace the automatically created DB with this version. To do this, first delete the automatically created torrents DB, then click on my DB, go to Operations and move or copy it to DB_name.torrents. This will take 5 to 10 minutes.

ghost commented 9 years ago

the speed is cuz of my seedbox, anyway will update you soon with the results dear :+1: good job !

The only thing that bothers me so much is HOW to get a daily dump, for example from kickass, and transform it into a MySQL dump that works with the actual DB, so it would be easier to update it daily with new content. I've got a lot of free time, so would you please be so kind as to let me know how you do it? With a tutorial or ...? Thank you in advance!

nicoboss commented 9 years ago

On kickass you are in luck with daily updates, but on other torrent sites it is nearly impossible. The cool thing about kickass is that if you have one full dump, and you have one, then updating isn't very difficult, and it would also be possible to automate this process.

Manual way:

  1. Go to my GitHub page (https://github.com/nicoboss/KickassCopy) and download everything you need. Every folder is one C++ project that contains one program. You will need the KickassHTML2MySQL folder and the kickass_new.txt file in the KickassLinkCreator folder.
  2. Download and install HTTrack Website Copier and open it.
  3. Create a new project and use kickass_new.txt as the link list. You don't have to change anything in the options for an update dump, but if you plan a full dump, then disable the cache for updates, indexing, "try to find all URLs", URL hacks, Java file analysis and other useless options that slow down the download. You should also change the maximum number of connections and the maximum transfer rate, but if you set them too high, your IP and browser ID get banned from the kickass server. In that case, hope that you have a dynamic IP address and wait until it changes (restarting the router or waiting one day may help), and change the browser ID in the options. The chance of a ban for a 1-minute update process on such a big site is close to 0, though. I got banned when I downloaded with 40 connections and 50000 KiB/s for 2 hours.
  4. Start the download and wait until it is finished; this will take 1 to 5 minutes.
  5. Open the downloaded directory and copy the usearch folder into the kickass subfolder of my KickassHTML2MySQL project.
  6. Now open Code::Blocks or another C++ IDE, open the main.cpp file and run my program. You do not need to build my project or do any other useless things. I do not know how well you understand C++, but the program was written in intermediate C++, so if you know some C++ basics you will maybe understand what it does, but that's not so important.
  7. An update dump is approximately 40 MB and contains 10 000 torrent links. A full kickass dump is over 50 GB of raw data and contains over 12 000 000 links, of which over 7 000 000 are duplicates. My program needs around 2 seconds for 1000 links, so it will have your update CSV file ready in about 20 s.
  8. Now you have to import that CSV file into your MySQL server and manually do every stupid conversion into the Open Bay DB schema. I think I have written enough about this boring step.
  9. Add the DB to the old DB. Ask Google if you don't know how to do this.
  10. In the last step you have to check your whole DB for duplicates. Do this with "ALTER TABLE open_bay ADD id INT( 11 ) NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;". Now you are done.
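The duplicate check in the last step could also be done on the CSV rows before they reach MySQL. The Python sketch below keeps the first row for each info hash. It assumes the hash is the third field, as in the CSV schema from this thread, and that hashes differing only in letter case refer to the same torrent. The function name `drop_duplicate_hashes` is made up for illustration.

```python
def drop_duplicate_hashes(rows, hash_index=2):
    """Return the rows with the first occurrence of every info hash.

    `rows` is a list of field lists; `hash_index` is the position of the
    hash field (2 in the CSV schema name|size_bytes|hash|...).
    """
    seen = set()
    unique = []
    for row in rows:
        key = row[hash_index].upper()  # assumption: compare case-insensitively
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique


sample = [
    ["first", "10", "AAAA"],
    ["same torrent again", "10", "aaaa"],
    ["second", "20", "BBBB"],
]
deduplicated = drop_duplicate_hashes(sample)
```

On the MySQL side, the UNIQUE KEY on hash in the original Open Bay schema, combined with an import that ignores duplicate rows, would have a similar effect.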

And now my idea for an automatic way: do the HTTrack step in the same C++ program as the conversion from HTML to CSV by using the HTTrack command-line arguments (https://www.httrack.com/html/fcguide.html). Sorry, wget, curl and other website tools won't work; you only get an encrypted file. The kickass team did good work protecting their site. For the boring CSV to Open Bay DB step, it would be possible to write a PHP script or change my program so that the CSV output file is already in the correct format. If I have time, I will maybe try to automate this update step. For now I make a manual HTTrack dump every 2 days and save it for later, when I will do the other steps all at once. But first I'll upload an all-in-one DB that contains every torrent of both databases.

ghost commented 9 years ago

Thank you for the explanation, it would take me days to apply what you wrote here!!! :+1: Wouldn't it be easier to use this: http://kickass.so/api/ ??? Let me know :)

nicoboss commented 9 years ago

Unfortunately the official kickass API is very bad and its schema is more chaotic than the CSV output of my program. But the most important reason not to use it is that the only useful API data are name, hash and category. That is too little to make a useful DB. File size, number of files, seeders, leechers, upload date, author and subcategory aren't in the API file. And for me it would take approximately the same time to get a fully automatic DB updating program. For either possibility I would need 7 to 18 hours, but the one with HTTrack is much easier and gives more information. I will probably make such a program after uploading the all-in-one DB. I'm very interested in this.

Here is an example of a torrent link downloaded with the kickass API. As you can see, a lot of information is missing:
6A685B76D053015EEB2B59CAFF5D45183D8034DA|Hegre-Art14 12 16 Rosie Eye Of The Tiger 1080P MP4-GUSH|XXX|https://kickass.so/hegre-art14-12-16-rosie-eye-of-the-tiger-1080p-mp4-gush-t10013112.html|http://torcache.net/torrent/6A685B76D053015EEB2B59CAFF5D45183D8034DA.torrent
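The gap between the two formats can be spelled out with a short Python comparison. The CSV field names come from the schema earlier in this thread. The labels for the five API fields are my own reading of the example line above (hash, name, category, info URL, download URL), not official names.

```python
# Fields of the CSV dump, as listed in the schema earlier in the thread.
CSV_FIELDS = [
    "name", "size_bytes", "hash", "files_count", "author",
    "year", "category", "subcategory", "seeders", "leechers",
]

# Labels chosen for illustration, based on the five '|'-separated values
# in the API example line.
API_FIELDS = ["hash", "name", "category", "info_url", "download_url"]

# CSV fields for which the API line has no counterpart.
missing_from_api = [field for field in CSV_FIELDS if field not in API_FIELDS]
```

The resulting list holds size_bytes, files_count, author, year, subcategory, seeders and leechers, which lines up with the missing information named in the comment.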

viniciushsantana commented 9 years ago

Great work, congrats.

TheChiefCoC commented 9 years ago

to help @nicoboss Direct Download Link for Kickass_SQL_DB_Dump_December_2014_V2.7z : https://ifilescloud.com/7f4711839b7e8d4f

ghost commented 9 years ago

A real direct link is here: http://goo.gl/ffCehv No restrictions. Thanks nicoboss.

ghost commented 9 years ago

Hey @nicoboss, I've tested your method to get an updated dump from kickass, and I think I got one, but the problem is that I'm not able to transform it from CSV to SQL. Could you please help me out? Here is the direct link to the CSV dump: http://goo.gl/mY8al4 If you could help, or at least explain how I (we) could transform it into SQL, that would be super great!

I'm asking for your help because I've tried to import it myself with: LOAD DATA INFILE 'Kickass_DB.csv' INTO TABLE ekoice.torrents FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES; but I get: ERROR 1062 (23000): Duplicate entry '2014' for key 'hash'

I'm using the original MySQL schema:
CREATE TABLE IF NOT EXISTS torrents (
  id int(11) unsigned NOT NULL AUTO_INCREMENT,
  name varchar(255) DEFAULT NULL,
  description text,
  category_id tinyint(4) DEFAULT NULL,
  size bigint(20) unsigned DEFAULT NULL,
  hash varchar(40) NOT NULL,
  files_count int(11) DEFAULT '0',
  created_at datetime DEFAULT NULL,
  torrent_status smallint(2) DEFAULT '0',
  visible_status smallint(2) DEFAULT '0',
  downloads_count mediumint(8) unsigned NOT NULL DEFAULT '0' COMMENT 'umax = 16777215',
  scrape_date datetime DEFAULT NULL,
  seeders mediumint(8) unsigned NOT NULL DEFAULT '0',
  leechers mediumint(8) unsigned NOT NULL DEFAULT '0',
  tags varchar(500) DEFAULT NULL,
  updated_at datetime DEFAULT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY hash (hash),
  KEY created_at (created_at),
  KEY size (size),
  KEY seeders (seeders),
  KEY category_id_torrent_status_visible_status (category_id,torrent_status,visible_status)
);

Let me know please. All the best.

EDIT: Download Link fixed.

nicoboss commented 9 years ago

Yes, I know that error. In normal cases, put the IGNORE keyword before INTO TABLE if you get this: LOAD DATA INFILE 'Kickass_DB.csv' IGNORE INTO TABLE ekoice.torrents FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES;

But the reason why you have duplicates is that your table has an id column and the CSV schema does not (name|size[Bytes]|hash|files_count|author|year|category|subcategory|seeders|leechers), so you have to remove the id column for the import process or, much better, use:

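LOAD DATA INFILE 'Kickass_DB.csv' INTO TABLE ekoice.torrents FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES (name,size,hash,files_count,@author,@year,@category,@subcategory,seeders,leechers) SET tags=@category;

The CSV fields that have no column of the same name in your table (author, year, category, subcategory) are read into user variables here; only the category is kept, in tags.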
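Why the import without a column list ends in "Duplicate entry '2014' for key 'hash'" can be seen by pairing the table columns with the CSV fields by position, which is presumably what MySQL does when no column list is given. This Python sketch uses the column order of the original schema quoted above and the CSV schema from this thread.

```python
# Column order of the original Open Bay torrents table (from the schema above).
TABLE_COLUMNS = [
    "id", "name", "description", "category_id", "size", "hash",
    "files_count", "created_at", "torrent_status", "visible_status",
    "downloads_count", "scrape_date", "seeders", "leechers", "tags",
    "updated_at",
]

# Field order of the CSV dump.
CSV_FIELDS = [
    "name", "size_bytes", "hash", "files_count", "author",
    "year", "category", "subcategory", "seeders", "leechers",
]

# Without a column list, the n-th CSV field goes into the n-th table column.
positional = dict(zip(TABLE_COLUMNS, CSV_FIELDS))
field_stored_in_hash = positional["hash"]  # 'year'
```

Because the year ends up in the hash column, which has a UNIQUE key, the second torrent from 2014 already collides with the first. Naming the target columns explicitly removes the shift; CSV fields without a matching table column can be read into MySQL user variables (for example @author) so that they are skipped.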

PS: Today I released the TorrentProject.se DB dump for the Open Bay Project, which is based on the official dailydump API. You can also use this on the kickass site with very few changes. Maybe I'll release a kickass version of this program. It's easier to use, but you'll get less information.

KatStaff commented 9 years ago

We have updated our daily dumps to include:
torrent_info_hash|torrent_name|torrent_category|torrent_info_url|torrent_download_url|size|category_id|files_count|seeders|leechers|upload_date
Update will include these changes shortly. Please use our db api: https://kickass.to/api/
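A line in this new dump format could be turned into a row for the Open Bay torrents table roughly as sketched below in Python. Several things here are assumptions: that fields are separated by a plain '|' with no quoting, that size, files_count, seeders and leechers are integers, and that the kickass category_id does not necessarily match Open Bay's ids (so the category name is kept as a tag instead). The upload_date is passed through untouched because its format is not stated. The sample line is invented, and both function names are made up for illustration.

```python
DUMP_FIELDS = [
    "torrent_info_hash", "torrent_name", "torrent_category",
    "torrent_info_url", "torrent_download_url", "size", "category_id",
    "files_count", "seeders", "leechers", "upload_date",
]
INT_FIELDS = ("size", "files_count", "seeders", "leechers")  # assumed numeric


def parse_dump_line(line):
    """Split one dump line into a dict keyed by the announced field names."""
    values = line.rstrip("\r\n").split("|")
    if len(values) != len(DUMP_FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(DUMP_FIELDS), len(values)))
    record = dict(zip(DUMP_FIELDS, values))
    for field in INT_FIELDS:
        record[field] = int(record[field])
    return record


def to_open_bay_row(record):
    """Map a parsed dump record onto Open Bay column names."""
    return {
        "name": record["torrent_name"],
        "hash": record["torrent_info_hash"],
        "size": record["size"],
        "files_count": record["files_count"],
        "seeders": record["seeders"],
        "leechers": record["leechers"],
        "tags": record["torrent_category"],
    }


# Invented sample line; the upload_date value is a placeholder.
SAMPLE_LINE = (
    "0123456789ABCDEF0123456789ABCDEF01234567|Example name|Movies|"
    "https://example.invalid/info|https://example.invalid/file.torrent|"
    "1024|5|3|10|2|PLACEHOLDER_DATE\n"
)
row = to_open_bay_row(parse_dump_line(SAMPLE_LINE))
```

If torrent names in the real dump can contain '|', the plain split would miscount the fields, which is why the length check raises instead of guessing.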

nicoboss commented 9 years ago

Thank you very much for improving your full and daily dumps. I love you for doing that. I never thought that you would upload all this information for free. You probably have the best daily dumps of any torrent site. It's very cool. This will save us a lot of time and bandwidth, and it's much easier than our previous way. Their full dump contains 6'451'965 torrents. That is many more than in my dump. In addition, we can use the hourly dump for updating our DB. It's much more resource-efficient than the RSS way and also works on home PCs that aren't on 24/7. You only get one update per hour, but I think that's more than enough. I will soon upload a SQL file in Openbay format of the latest full dump, along with a description of how to convert it on your own. Please do not use the old way any longer, because it needs over 100 times more bandwidth and much more time than the new one. BTW, Kickass Torrents is my favorite torrent site. A big thank you to the Kickass team for hosting such a good site.

sacko12 commented 7 years ago

Could you please give me the kickass script or another torrent script? bacar.sacko@yahoo.fr

cryptid11 commented 7 years ago

Is there any way to get the last official kickass dump?

Hiseeston commented 7 years ago

I added all my dumps to https://github.com/Hiseeston/TorrentDumps. The latest full API dump is from 2015-06-30, but I also have an OpenBay dump that was made only a few hours before their site went down, and it can easily be converted without losing any important information. I'll add more dumps of other torrent sites soon.

cryptid11 commented 7 years ago

The KAT DB is pretty old, just 2 months newer than https://web.archive.org/web/20150416071329/http://kickass.to/api - I'm sure there is a newer version out there!!!