xbenjii / torrentexpander

Automatically exported from code.google.com/p/torrentexpander

Awesome imdb integration! #20

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
It is possible to grab film information from IMDb!
How? Let me explain...

IMDB WEB SCRAPER (Open Source)
http://lab.abhinayrathore.com/imdb/imdbWebService.php?m=Titanic&o=xml

EXAMPLE CODE
# Grab the IMDb information table
# $FILM is 'FILMNAME (YEAR)'
# wget -O (capital O) saves the page; lowercase -o would only write wget's log file
wget -O imdb.xml "http://lab.abhinayrathore.com/imdb/imdbWebService.php?m=$FILM&o=xml"

# Grab the film title (awesome for PCH!)
# Many languages are available!
# Popcorn Hour users will LOVE this: the NMT jukebox requires the English title only!
# With this scraper you can search using e.g. a French/Italian/Spanish film
# name and get the English film name required by the Jukebox.

TITLE=$(xpath -q -e '//ALSO_KNOWN_AS' imdb.xml | grep Italy | sed 's/<[^>]*>//g' | sed 's/=.*//g' | sed 's/&amp;#x27;/'"'"'/g')

# Grab the film year
YEAR=$(xpath -q -e '//YEAR' imdb.xml | sed 's/<[^>]*>//g')

# Grab the film poster (awesome!)
# Several poster sizes are available!
POSTER=$(xpath -q -e '//POSTER_SMALL' imdb.xml | sed 's/<[^>]*>//g')
wget "$POSTER"

# Grab the rating (awesome!)
RATING=$(xpath -q -e '//RATING' imdb.xml | sed 's/<[^>]*>//g')

# Grab the IMDb title ID (e.g. tt0120338)
# Useful for Popcorn Hour!
IMDB=$(xpath -q -e '//TITLE_ID' imdb.xml | sed 's/<[^>]*>//g')

Original issue reported on code.google.com by login...@gmail.com on 14 Nov 2011 at 12:54

GoogleCodeExporter commented 8 years ago
THE FIRST IDEA: build .nfo files for the Popcorn Hour NMT jukebox.

Why?
The NMT jukebox requires the English film title to get movie data from the internet.
If you live outside the USA, you won't be able to use the Jukebox until you
manually rename your films with their English titles.

How to solve the problem?
The best way is to place an .nfo file next to the film that contains the
correct IMDb URL. This way, no renaming is required. More info at:
http://www.networkedmediatank.com/showthread.php?tid=46095

Example:
News Movie (2008).mkv // Italian title of 'The Onion Movie'
News Movie (2008).nfo // contains the correct IMDb URL: http://www.imdb.com/title/tt0392878/

How to do this?
Simple!

# $FILM is 'News Movie (2008)'
wget -O imdb.xml "http://lab.abhinayrathore.com/imdb/imdbWebService.php?m=$FILM&o=xml&callback=%3F&submit=Call"

IMDB_URL=$(xpath -q -e '//IMDB_URL' imdb.xml | sed 's/<[^>]*>//g')

echo "$IMDB_URL" > "$FILM.nfo"
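The last step above, writing the IMDb URL into an .nfo file with the same base name as the video, can be sketched as a small reusable function. `write_nfo` is a hypothetical name (not part of torrentexpander); it assumes the IMDb URL has already been looked up and that the video file name has an extension.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of torrentexpander): write an .nfo file next to
# a video file, using the same base name, containing only the IMDb URL.
# Assumes the video file name has an extension.
write_nfo() {
  local video="$1" imdb_url="$2"
  # Strip the last extension: "News Movie (2008).mkv" -> "News Movie (2008)"
  local nfo="${video%.*}.nfo"
  # Overwrite rather than append, so re-running does not duplicate the URL
  printf '%s\n' "$imdb_url" > "$nfo"
  # Print the path of the .nfo file that was written
  printf '%s\n' "$nfo"
}
```

Usage: `write_nfo "/movies/News Movie (2008).mkv" "http://www.imdb.com/title/tt0392878/"` would create `/movies/News Movie (2008).nfo`.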

Original comment by login...@gmail.com on 15 Nov 2011 at 12:53

GoogleCodeExporter commented 8 years ago
Quick & (absolutely) dirty implementation.
The nfo destination folder needs to be set at line 610.

Original comment by login...@gmail.com on 15 Nov 2011 at 8:14

Attachments:

GoogleCodeExporter commented 8 years ago
Thanks for your input, and sorry for not getting back to you earlier.
I'll definitely look into it next weekend.
I'll need to look into wget and make sure a timeout can be configured.
I'll also need to check that wget is installed and get its path.
Another thing I'll have to look into is the
http://labaia.hellospace.net/imdbWebService.php website. It's amazing what it
is able to spit out.
I'll also need to make sure the rest of the script won't choke on those .nfo
files.
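A minimal sketch of the first two points above (use whichever downloader is installed, with a configurable timeout). `detect_fetcher`, `fetch_url` and the `TIMEOUT` variable are hypothetical names, not torrentexpander's; the flags are wget's `-T` and curl's `--max-time`. This version prefers wget and falls back to curl.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not torrentexpander's actual code).
# TIMEOUT is an assumed setting, in seconds.
TIMEOUT="${TIMEOUT:-15}"

# Print the name of the first available downloader, or fail if there is none.
detect_fetcher() {
  if command -v wget >/dev/null 2>&1; then
    echo wget
  elif command -v curl >/dev/null 2>&1; then
    echo curl
  else
    return 1
  fi
}

# Download a URL to stdout, with a timeout.
fetch_url() {
  local url="$1" fetcher
  fetcher=$(detect_fetcher) || { echo "Neither wget nor curl found" >&2; return 1; }
  case "$fetcher" in
    wget) wget -q -T "$TIMEOUT" -O - "$url" ;;
    curl) curl -s -f --max-time "$TIMEOUT" "$url" ;;
  esac
}
```

The two timeouts are not exactly equivalent: wget's `-T` limits each network phase (DNS lookup, connect, read), while curl's `--max-time` caps the whole transfer.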

Thanks for your help

Original comment by addicted...@gmail.com on 15 Nov 2011 at 10:46

GoogleCodeExporter commented 8 years ago
Scraper file no. 1

Original comment by login...@gmail.com on 16 Nov 2011 at 5:10

Attachments:

GoogleCodeExporter commented 8 years ago
Scraper file no. 2

Original comment by login...@gmail.com on 16 Nov 2011 at 5:10

Attachments:

GoogleCodeExporter commented 8 years ago
New version of the modded scripts.
Added poster download and some minor updates.
I don't know if the PCH supports xpath. I doubt it.

Original comment by login...@gmail.com on 16 Nov 2011 at 7:49

Attachments:

GoogleCodeExporter commented 8 years ago
Thanks.
I browsed through the whole script, and here's what I plan on doing in terms of
IMDB integration.
Let me know what you think of it.
I won't start coding until I'm sure I haven't forgotten anything... Also, I only
have time to work on it during weekends...

- add the imdb options to the script parameters
- add these new parameters to the settings.ini file
- Detect the wget path
- Save the movie title / series title in a new variable (X), and the file name after it 
has been renamed in another variable (Y):
    -> If there are multiple video files with a movie pattern / series pattern in the surrounding folder, retain the name of the folder in its IMDb perfect-match format (remove season.* from the end of the name if it's a series pack)
    -> If there is only one video file with a movie pattern, retain the name of the file without its extension in its perfect-match format
- Make sure http://labaia.hellospace.net/imdbWebService.php is up and running
- Add the lines you kindly supplied in a new subroutine right before the 
"Convert DTS track..." routine. If the lookup fails, the script has to be able to 
recover.
    Also:
    -> The title stored in variable X will be used to fetch the NFO and JPG
    -> Add nfo and jpg extensions to the movies_extensions_rev variable
    -> If it's a single-file movie and NFO/JPG files were downloaded (count files), name those NFO and JPG files variable_Y.extension and:
        - put all these files (movie included) in a new folder named variable_Y
        - rewrite $log_files with those files and the new folder
    -> If it's a multi-file movie / series pack, use variable X to fetch the NFO / JPG and store / name them after the files included in the surrounding folder (there will then be movie_part_1.nfo, movie_part_1.jpg, movie_part_2.nfo, movie_part_2.jpg + movie_part_1.avi and movie_part_2.avi in this folder... maybe many more for a series pack) and list all that in the $log_files file.
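For the series-pack case in the plan ("remove season.* from the end of the name"), a minimal sketch. `strip_season` is a hypothetical name, and the separators it removes before "Season" (dot, space, underscore, dash) are an assumption about how pack folders are named. It should only be applied to series packs, since a movie title such as "Four Seasons" would also be cut.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop a trailing "Season..." part (case-insensitive)
# from a series-pack folder name, together with the separators before it.
strip_season() {
  printf '%s\n' "$1" | sed 's/[. _-]*[Ss][Ee][Aa][Ss][Oo][Nn].*$//'
}
```

For example, `strip_season 'Dexter.Season.1.720p'` prints `Dexter`, and a name without "season" is returned unchanged.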

And of course credit you in the script to thank you for your help :-)

Thanks again for your interest in torrentexpander

Original comment by addicted...@gmail.com on 16 Nov 2011 at 10:30

GoogleCodeExporter commented 8 years ago
1) Yes, add the imdb options and settings.
I suggest "produce_imdb_nfo", "download film poster" and "poster_format". The 
"poster_format" variable can be 'normal', 'large', 'small' or 'full'.
2) Of course.
3) OK, but what about xpath? Are you able to replace its function? ;-)
4) An example is required. (??)
5) Of course.
6) I think the most important thing is to "let users choose what to do".

If a user wants to rename files using 'type_3' rules but also wants to produce 
the .nfo and .jpg of the movie, he should be able to do it, even though 'type_1' 
rules are what's needed to get correct imdb information. 

The imdb implementation must be separate from the renaming script, but the film 
name must match the .nfo & .jpg filenames. Users may decide not to rename files 
but still want to use imdb.

Original comment by login...@gmail.com on 17 Nov 2011 at 6:05

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
This is my concept map.
Please have a look. I'm not sure everything is correct and matches torrentexpander's 
features. 
Feel free to edit the concept map; we could use it for documentation.
Thanks for programming the best automatic renaming tool in the world.

Original comment by login...@gmail.com on 18 Nov 2011 at 1:11

Attachments:

GoogleCodeExporter commented 8 years ago
Hi
Thank you for all the time you spent improving torrentexpander.
I finally took time to give your imdb integration routine a try.
wget is not always installed by default (for example on Mac OS X), so I tried 
curl instead.
Depending on which one is installed, I'll automatically switch to the right one.
I also improved some lines by storing the XML in a variable, and dropped the xpath 
dependency.

I'll spend time on the imdb integration and your concept map this weekend.
Thanks for your help.

PS: I'm no programmer and I only wrote my first lines of code when I 
started torrentexpander not so long ago, so it's nice to know you like it.

Take a look at the rewrite:

    # IMDB integration
        nfo_file="$title_clean_ter_other_pat.nfo"
        poster="$title_clean_ter_other_pat.jpg"
        xml_cont="$(curl -s "http://labaia.hellospace.net/imdbWebService.php?m=$title_clean_ter_other_pat&o=xml")"
        # Quote "$xml_cont" so the shell doesn't word-split or glob-expand the XML
        imdb_url=$(echo "$xml_cont" | grep -Eo "<IMDB_URL>.*</IMDB_URL>" | sed -e 's;<IMDB_URL>\(.*\)</IMDB_URL>;\1;')
        poster_url=$(echo "$xml_cont" | grep -Eo "<POSTER>.*</POSTER>" | sed -e 's;<POSTER>\(.*\)</POSTER>;\1;')
        if [ "$imdb_url" != "" ]; then
            step_number=$(( $step_number + 1 ))
            echo "Step $step_number : Building .nfo"
            echo "$imdb_url" > "$destination_folder/$nfo_file"
        fi
        if [ "$poster_url" != "" ]; then
            step_number=$(( $step_number + 1 ))
            echo "Step $step_number : Downloading Poster"
            curl -s -o "$destination_folder/$poster" "$poster_url"
            # wget -q -O "$destination_folder/$poster" "$poster_url"
        fi
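The rewrite above extracts each tag with an `-o` grep plus `sed`. A slightly more defensive variant, offered as a sketch: `xml_field` is a hypothetical helper that matches only `<TAG>text</TAG>` with no `<` inside the text, so a greedy `.*` cannot run across several tags on the same line, and it returns only the first match. It still assumes each tag's opening and closing parts sit on one line, leaves entities such as `&amp;` encoded, and is no substitute for a real XML parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the text of the first <TAG>...</TAG> found in $2.
# Assumes opening and closing tags are on the same line and the text contains no "<".
xml_field() {
  local tag="$1"
  printf '%s\n' "$2" \
    | grep -o "<$tag>[^<]*</$tag>" \
    | head -n 1 \
    | sed "s;<$tag>\(.*\)</$tag>;\1;"
}
```

With it, the two extraction lines would become `imdb_url=$(xml_field IMDB_URL "$xml_cont")` and `poster_url=$(xml_field POSTER "$xml_cont")`. A missing tag yields an empty string, so the existing `!= ""` checks still work.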

Original comment by addicted...@gmail.com on 18 Nov 2011 at 11:08

GoogleCodeExporter commented 8 years ago
Excellent!

I'm studying a way to get fanart images using this:
http://api.themoviedb.org/2.1/methods/Movie.getImages

To prepare for this, I suggest grabbing TITLE_ID:
title_id=$(echo "$xml_cont" | grep -Eo "<TITLE_ID>.*</TITLE_ID>" | sed -e 's;<TITLE_ID>\(.*\)</TITLE_ID>;\1;')
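Before the TITLE_ID is passed on to another service, it may be worth checking that it looks like an IMDb title ID such as tt0120338 or tt0268126 ("tt" followed by digits), so that an empty or garbled lookup result is not sent along. A minimal sketch; `is_imdb_id` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 is "tt" followed by one or more digits.
is_imdb_id() {
  case "$1" in
    tt|tt*[!0-9]*) return 1 ;;  # nothing after "tt", or a non-digit somewhere
    tt[0-9]*)      return 0 ;;
    *)             return 1 ;;
  esac
}
```

Usage: `if is_imdb_id "$title_id"; then ... fi` around the fanart lookup. A `case` pattern is used instead of `grep` so that a value containing a newline cannot match on its first line only.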

Original comment by login...@gmail.com on 19 Nov 2011 at 1:18

GoogleCodeExporter commented 8 years ago
Very simple!

fanart="$title_clean_ter_other_pat.fanart.jpg"
wget/curl "http://api.themoviedb.org/2.1/Movie.getImages/en/xml/57983e31fb435df4df77afb854740ea9/$title_id"

then grab the URL of a random backdrop image in size $fanart_size // the user chooses, depending on the TV

wget -q -O "$destination_folder/$fanart" "$fanart_url"
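The "grab the URL of a random backdrop image in size $fanart_size" step could look like the sketch below. It assumes, without confirmation from this thread, that the TMDB 2.1 XML lists each image size as an `<image ... url="..." size="..."/>` element on its own line, nested inside `<backdrop>` elements; `pick_backdrop` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one random backdrop URL of the requested size.
# ASSUMPTION: one <image .../> element per line, nested in <backdrop> elements,
# with url="..." and size="..." attributes.
pick_backdrop() {
  local size="$1" xml="$2" urls=() url
  while IFS= read -r url; do
    [ -n "$url" ] && urls+=("$url")
  done < <(printf '%s\n' "$xml" | awk -v size="$size" '
    /<backdrop/ { inside = 1 }
    inside && /<image/ && index($0, "size=\"" size "\"") {
      # Print the value of the url="..." attribute
      if (match($0, /url="[^"]*"/)) print substr($0, RSTART + 5, RLENGTH - 6)
    }
    /<\/backdrop>/ { inside = 0 }')
  # Fail if no backdrop of that size was found
  [ "${#urls[@]}" -gt 0 ] || return 1
  printf '%s\n' "${urls[RANDOM % ${#urls[@]}]}"
}
```

Usage: `fanart_url=$(pick_backdrop "$fanart_size" "$tmdb_xml")`, where `tmdb_xml` is a placeholder for the downloaded Movie.getImages response. The `inside` flag keeps poster images of the same size from being picked.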

Original comment by login...@gmail.com on 19 Nov 2011 at 1:32

GoogleCodeExporter commented 8 years ago
Hi Loginbug,
I just created a Torrentexpander 101 wiki page to help you understand the basic 
structure of torrentexpander:
http://code.google.com/p/torrentexpander/wiki/Torrentexpander_in_depth?ts=1321742965&updated=Torrentexpander_in_depth
Your idea of maintaining a concept map is great, but given the length of the 
script, we'll need to use modeling software.
Torrentexpander is only 800 lines long, but it is already fairly complex. I only 
started this project a few months ago and I am already losing track of which 
line does what and why.
Right now, I'm reviewing the whole script in order to refresh my memory and be 
more efficient while adding the imdb functionality.

Original comment by addicted...@gmail.com on 19 Nov 2011 at 10:57

GoogleCodeExporter commented 8 years ago
Thanks, I will read it.
Anyway, I found a bug in the curl command of your imdb script. This is the correct 
way:

# curl dislikes spaces; replace spaces with +
title_clean_ter_other_pat_nospace=$(echo "$title_clean_ter_other_pat" | sed 's/ /+/g')
xml_cont="$(curl -i "http://labaia.hellospace.net/imdbWebService.php?m=$title_clean_ter_other_pat_nospace&o=xml")"
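Replacing spaces with `+` fixes the space problem, but titles can also contain characters such as `&`, `#` or `'` that have a special meaning in a URL query string. A sketch of full percent-encoding in pure bash: `urlencode` is a hypothetical helper that encodes every byte outside the unreserved set (letters, digits, `.`, `~`, `_`, `-`). It encodes spaces as `%20` rather than `+`. Since the service is a .php page, both forms should decode to a space, because PHP decodes both in query strings.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a string for use in a URL query string.
urlencode() {
  # LC_ALL=C makes the loop walk bytes, so UTF-8 titles are encoded byte by byte
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      # printf "'X" yields the numeric code of character X
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}
```

Usage: `m=$(urlencode "$title_clean_ter_other_pat")`, then pass `?m=$m&o=xml` in the lookup URL.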

Original comment by login...@gmail.com on 20 Nov 2011 at 1:27

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
The latest working version of the modded script.
NFO + POSTER + FANART available.
I prefer to use the grep command instead of egrep, and xml files instead of variables.

Original comment by login...@gmail.com on 20 Nov 2011 at 7:12

Attachments:

GoogleCodeExporter commented 8 years ago
Check out SVN release r81.
IMDB is now integrated.
I still have an issue with curl not setting the MIME type for images.
Also, I commented out the fanart lines because I haven't had enough time to make them 
work.

You need to enable this at the beginning of the script or in your settings.ini 
file:
imdb_poster="yes"
imdb_poster_format="normal"
imdb_nfo="yes"
imdb_fanart="yes"
imdb_fanart_format="w1280"
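One way to turn `imdb_poster_format` into the tag read from the web service, as a sketch. `POSTER` and `POSTER_SMALL` appear earlier in this thread; `POSTER_LARGE` and `POSTER_FULL` for 'large' and 'full' are assumptions about the service's XML, and `poster_tag` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate the imdb_poster_format setting into the XML tag
# to read. POSTER_LARGE and POSTER_FULL are ASSUMED tag names.
poster_tag() {
  case "$1" in
    normal) echo POSTER ;;
    small)  echo POSTER_SMALL ;;
    large)  echo POSTER_LARGE ;;
    full)   echo POSTER_FULL ;;
    *)      echo "Unknown imdb_poster_format: $1" >&2; return 1 ;;
  esac
}
```

Usage with a fallback for a mistyped setting: `tag=$(poster_tag "$imdb_poster_format") || tag=POSTER`.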

Original comment by addicted...@gmail.com on 20 Nov 2011 at 11:05

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Good, but the imdb plugin should work even if 'clean_filename' is set to no.

Original comment by login...@gmail.com on 22 Nov 2011 at 12:20

GoogleCodeExporter commented 8 years ago
I have noticed that the script is not able to rename files (...CD1.avi & ...CD2.avi) 
inside a folder (the folder itself is renamed correctly).

Original comment by login...@gmail.com on 22 Nov 2011 at 12:55

GoogleCodeExporter commented 8 years ago
Regarding comment 20: SVN release r83 doesn't require clean_filename to be 
turned on for the IMDB routine to work.
Regarding comment 21: long ago, I decided not to rename files if several files 
are found in a torrent.

There are too many patterns (CD1/CD2, moviea/movieb, movie1/movie2, 
moviepart1/moviepart2, and so on).
Also, what happens if the torrent contains TV episodes or subtitles (especially 
idx/sub)...
Renaming files from a multi-file torrent would be really likely to fuck up, 
trust me on that ;-)

Once I'm done adding fanart and making sure no nfo/jpg is generated for non-movie 
files (srt, idx, sub subtitles), I'll ask you to test it thoroughly and 
confirm that it works fine. For now everything seems OK.

Thanks again

Original comment by addicted...@gmail.com on 22 Nov 2011 at 10:43

GoogleCodeExporter commented 8 years ago
Thanks.
Yes, I trust you.

Original comment by login...@gmail.com on 23 Nov 2011 at 10:56

GoogleCodeExporter commented 8 years ago
If the destination directory already exists, the program stops with: 'destination 
folder is not empty'.
I think the program should continue, putting the files inside it (only if the 
filenames are NOT the same).

Example
Suppose that you have downloaded two versions of the same film:

1) First version 
It's a folder named /Avatar.2009.Xvid-MYDAD/
--> Avatar.2009.Xvid.CD1-MYDAD.avi
--> Avatar.2009.Xvid.CD2-MYDAD.avi

2) Second version 
It's a folder named /Avatar.2009.Xvid-MYMUM/
--> Avatar.2009.Xvid.CD1-MYMUM.avi
--> Avatar.2009.Xvid.CD2-MYMUM.avi

After running torrentexpander on both with the 'type_1' schema, I should get:
Folder: Avatar (2009)
--> Avatar.2009.Xvid.CD1-MYDAD.avi
--> Avatar.2009.Xvid.CD1-MYDAD.nfo
--> Avatar.2009.Xvid.CD1-MYDAD.jpg
--> Avatar.2009.Xvid.CD2-MYDAD.avi
--> Avatar.2009.Xvid.CD2-MYDAD.nfo
--> Avatar.2009.Xvid.CD2-MYDAD.jpg
--> Avatar.2009.Xvid.CD1-MYMUM.avi
--> Avatar.2009.Xvid.CD1-MYMUM.nfo
--> Avatar.2009.Xvid.CD1-MYMUM.jpg
--> Avatar.2009.Xvid.CD2-MYMUM.avi
--> Avatar.2009.Xvid.CD2-MYMUM.nfo
--> Avatar.2009.Xvid.CD2-MYMUM.jpg

I think this is a good approach: it's tidy.

If the destination file already exists, damn! Is it possible to rename only the folder?
If /Avatar (2009)/ exists, then the new folder could be /Avatar (2009) [1]/
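A sketch of the fallback suggested above: if /Avatar (2009)/ already exists, use /Avatar (2009) [1]/, then [2], and so on. `unique_folder` is a hypothetical helper. Checking and then creating the folder is not atomic, so two runs at the same moment could still pick the same name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a folder path that does not exist yet, appending
# " [1]", " [2]", ... to the requested name if needed.
unique_folder() {
  local base="$1" candidate="$1" n=1
  while [ -e "$candidate" ]; do
    candidate="$base [$n]"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}
```

Usage: `dest=$(unique_folder "$destination_folder/Avatar (2009)") && mkdir "$dest"`.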

Original comment by login...@gmail.com on 23 Nov 2011 at 11:40

GoogleCodeExporter commented 8 years ago
Some code needs to be added to avoid creating empty files:

if [ "$imdb_poster" = "yes" ] && [ "$poster_url" != "" ]; then
    "$wget_curl" -q "$poster_url" -O "$temp_folder_without_slash/temp_poster"
fi
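Even with a non-empty `$poster_url`, a download can still leave an empty file behind (for example if the server returns nothing). A sketch of "download to a temporary file, keep it only if it is non-empty"; `keep_if_nonempty` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move $1 to $2 only if $1 exists and is non-empty;
# otherwise delete $1 so no empty poster/fanart file is left behind.
keep_if_nonempty() {
  local tmp="$1" dest="$2"
  if [ -s "$tmp" ]; then
    mv -f "$tmp" "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage after the wget call: `keep_if_nonempty "$temp_folder_without_slash/temp_poster" "$destination_folder/$poster"`.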

I suggest you use xml files instead of xml variables, for debugging reasons.
It would be very nice if torrentexpander had a --debug option (e.g. debug 
mode would keep the imdb.xml and themoviedb.xml files).
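A sketch of the suggested --debug behaviour: the script keeps working from variables, and the downloaded XML is written to disk only in debug mode. The `--debug` flag handling, `DEBUG`, `debug_dir` and `debug_save` are hypothetical; torrentexpander's real argument parsing may differ.

```bash
#!/usr/bin/env bash
# Hypothetical: DEBUG is turned on by a --debug argument; debug_dir is assumed.
DEBUG="no"
debug_dir="${TMPDIR:-/tmp}"

for arg in "$@"; do
  if [ "$arg" = "--debug" ]; then
    DEBUG="yes"
  fi
done

# Save a copy of downloaded content (e.g. imdb.xml, themoviedb.xml) only in
# debug mode; otherwise do nothing.
debug_save() {
  local name="$1" content="$2"
  if [ "$DEBUG" = "yes" ]; then
    printf '%s\n' "$content" > "$debug_dir/$name"
  fi
}
```

Usage: right after `xml_cont="$(curl ...)"`, call `debug_save imdb.xml "$xml_cont"`.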

Original comment by login...@gmail.com on 23 Nov 2011 at 6:13

GoogleCodeExporter commented 8 years ago
Better IMDB/TMDB plugin for torrentexpander
+ Does not create empty files
+ Debug support (I really need it)
- Only wget for the moment

Original comment by login...@gmail.com on 25 Nov 2011 at 10:58

Attachments:

GoogleCodeExporter commented 8 years ago
Thank you for this.
I made some minor changes to your code and included it in the latest SVN.
I'm sticking with variables instead of xml files, but added some more 
information to the debug log.
Also, I improved the rename routine so that determining the IMDB title works 
faster.

I couldn't get fanart to work. The XML looks like this:
1 3 true false en Adaptation. Adaptation. The Orchid Thief movie 2757 tt0268126 
http://www.themoviedb.org/movie/2757 Charlie Kaufman (Cage) writes the way he 
lives, with great difficulty. His twin brother Donald (also Cage) lives the way 
he writes, with foolish abandon. Susan (Streep) writes about life, but can't 
live it. John's (Cooper) life is a book, waiting to be adapted. One story. Four 
lives. A million ways it can end. 19 8.0 R 2002-06-12 114 1228 2011-11-26 
14:57:49 UTC

This is what it is supposed to look like:
http://api.themoviedb.org/2.1/methods/Movie.imdbLookup

I'm 

Original comment by addicted...@gmail.com on 26 Nov 2011 at 7:54

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
The line that's already in the script should work and doesn't rely on new 
commands like tr.
The problem is that none of the XML downloaded from TMDB contains any 
backdrop... while the movies in question obviously have backdrops.

The XML looks like this:
1 3 true false en Adaptation. Adaptation. The Orchid Thief movie 2757 tt0268126 
http://www.themoviedb.org/movie/2757 Charlie Kaufman (Cage) writes the way he 
lives, with great difficulty. His twin brother Donald (also Cage) lives the way 
he writes, with foolish abandon. Susan (Streep) writes about life, but can't 
live it. John's (Cooper) life is a book, waiting to be adapted. One story. Four 
lives. A million ways it can end. 19 8.0 R 2002-06-12 114 1228 2011-11-26 
14:57:49 UTC

On the website, there are about 10 backdrops:
http://www.themoviedb.org/movie/2757-adaptation

Original comment by addicted...@gmail.com on 27 Nov 2011 at 2:38

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
This is my XML file, downloaded with the sample script.

Original comment by login...@gmail.com on 27 Nov 2011 at 3:02

Attachments:

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
This sample script works well for me!

Original comment by login...@gmail.com on 27 Nov 2011 at 3:25

Attachments:

GoogleCodeExporter commented 8 years ago
Thank you.
Fanart now works in the latest SVN build.
The TMDB servers must have fucked up yesterday evening.

Original comment by addicted...@gmail.com on 27 Nov 2011 at 6:26

GoogleCodeExporter commented 8 years ago
Switched it to enhancement

Original comment by addicted...@gmail.com on 27 Nov 2011 at 6:27

GoogleCodeExporter commented 8 years ago

Original comment by login...@gmail.com on 4 Dec 2011 at 10:47