makkoncept / yts.am_scraper_

Python script to scrape Yify movies torrents info. Also see https://github.com/makkoncept/yts_torrents

14 stars, 8 forks

YTS IMDB LINK #1

Open melbasuonyy opened 6 years ago

melbasuonyy commented 6 years ago

Hi, it's a great tool and it helped a lot. Could you add the IMDb URL? That would make it easy to compare the data if you have a huge library.

Thanks

makkoncept commented 6 years ago

Hey @melbasuonyy, thank you for appreciating the project. Can you elaborate on the feature? Do you want the IMDb link to be one of the fields that is scraped and added to the CSV?

melbasuonyy commented 6 years ago

Exactly. It will be easy to track the movies, as IMDb has its own unique number for each movie (tt*****).
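
To illustrate why that identifier is useful as a key, here is a minimal sketch that pulls it out of an IMDb link. The function name and the sample URL are hypothetical and not taken from the script, and the pattern assumes IDs are "tt" followed by at least seven digits.

```python
import re
from typing import Optional

# Assumption: IMDb title IDs are "tt" followed by seven or more digits.
IMDB_ID_PATTERN = re.compile(r"tt\d{7,}")


def extract_imdb_id(url: str) -> Optional[str]:
    """Return the IMDb title ID found in `url`, or None if there is none."""
    match = IMDB_ID_PATTERN.search(url)
    return match.group(0) if match else None


# Sample link used only for illustration.
print(extract_imdb_id("https://www.imdb.com/title/tt0111161/"))
```

Storing the bare ID alongside (or instead of) the full URL would make it simpler to match rows against an existing library, since the ID does not change if the link format does.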

makkoncept commented 6 years ago

@melbasuonyy I have added the feature. You can run the script and have a look.

melbasuonyy commented 6 years ago

I tried it, but it is not fetching links for movies whose names include characters like ( & , : ). It would be great if you could fix it.

makkoncept commented 6 years ago

@melbasuonyy It is working on my machine! Some columns are left blank because there is no link on the site in the first place.

[Image: test_run]

Troubleshooting

Have you cloned the updated version of the script?
Maybe the site is blocked where you are. I use a proxy to get around this.
You can test-run the script on a few pages first. In the above example I ran it on the first page only. You can do this by changing the range in for page in range(1, 357): in the code.

If the problem is still not solved, let me know.
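
As a sketch of that last troubleshooting step: Python's range excludes its stop value, so range(1, 357) visits pages 1 through 356. The helper below is hypothetical (it is not part of the script) and only makes the inclusive page bounds explicit.

```python
def pages_to_scrape(first_page: int, last_page: int) -> range:
    """Return a range covering first_page..last_page, both inclusive."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    # range() excludes its stop value, hence the + 1.
    return range(first_page, last_page + 1)


# Test run on the first page only.
for page in pages_to_scrape(1, 1):
    print(f"would scrape page {page}")
```

With this helper, pages_to_scrape(1, 356) is equivalent to the range(1, 357) used in the script.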

melbasuonyy commented 6 years ago

Yes, it's the same on my machine; the links are grabbed for the same movies. I think movies with special characters are showing as blank (check rows 3 / 15 / 21), but it's manageable. Thanks for your work.

I have one more suggestion you can look into, if it helps: you could also include the torrent download links in the file with each movie. That would make it easy for people who want to download files in batches.

makkoncept commented 6 years ago

Yeah, there are many edge cases, and I think finding them and writing a condition for each one would make the code messy. Also, after a month I came to know that yts has an API :p.

I thought about it previously, but as you can see the links are not uniform on the site. I think adding only 720p and 1080p rows (as 3D is not available for the majority of films) would be good, and if any torrent is not available, leave the cell blank.
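
A minimal sketch of that idea, assuming the scraped links for one movie are available as a mapping from quality label to URL. The function names, the sample movie, and the example.invalid link are all hypothetical; the script's actual data layout is not shown in this thread.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO, Tuple

# Only these qualities get a cell; 3D is skipped as proposed.
QUALITIES = ["720p", "1080p"]


def torrent_cells(links_by_quality: Dict[str, str]) -> List[str]:
    """Return one cell per quality, blank when no torrent is available."""
    return [links_by_quality.get(quality, "") for quality in QUALITIES]


def write_rows(movies: Iterable[Tuple[str, Dict[str, str]]], out: TextIO) -> None:
    """Write a header plus one CSV row per (name, links) pair."""
    writer = csv.writer(out)
    writer.writerow(["name"] + QUALITIES)
    for name, links in movies:
        writer.writerow([name] + torrent_cells(links))


# Hypothetical sample: a movie that only has a 720p torrent.
buffer = io.StringIO()
write_rows([("Example Movie", {"720p": "https://example.invalid/a.torrent"})], buffer)
print(buffer.getvalue())
```

Using a fixed list of qualities keeps every row the same width, so a missing torrent shows up as an empty cell rather than shifting the remaining values.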

makkoncept commented 6 years ago

@melbasuonyy, I have made some enhancements to the script and it can now handle special characters in the movie name. I have not come across an exception, but one may exist if a movie has a very bizarre name. Let me know if you find one.

[Image: final]

Comparing this with the above output, movie names with special characters are also scraped here.
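
The thread does not show how the script handles these names. One possible approach, assuming the movie page address is derived from the title, is to normalize the title into a slug. The function below is a hypothetical illustration of that technique, not the script's actual code, and the site's real slug rules may differ.

```python
import re


def title_to_slug(title: str) -> str:
    """Lowercase the title and collapse each run of other characters into one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop hyphens left over from leading or trailing punctuation.
    return slug.strip("-")


for title in ["Fast & Furious", "Me, Myself & Irene", "Mission: Impossible"]:
    print(title, "->", title_to_slug(title))
```

This sketch treats every non-ASCII letter as a separator too, so accented titles would still need separate handling, which fits the remark above that very unusual names may remain an exception.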

melbasuonyy commented 6 years ago

I have run it on the full site and only 95 movies were not scraped, which is great. Overall it scraped all the links except 2%.

Thanks a lot

makkoncept commented 6 years ago

@melbasuonyy You can scrape the whole website by updating the range in for page in range(1, 357): to for page in range(1, 383): in the script, as new torrents have been added and the total number of pages is now 382.

[Image: page]

Welcome :)