xbenjii / torrentexpander

Automatically exported from code.google.com/p/torrentexpander

Missing Tag #36

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
As you know, I can see user searches on the scraper. I will use this topic to report
missing tags:

Extended

Original issue reported on code.google.com by login...@gmail.com on 12 Dec 2011 at 10:59

GoogleCodeExporter commented 8 years ago
Extended has been added
Keep 'em coming :-)

Can I get access to the searches made on the parser? I'd be very interested in knowing how
many searches are done on a daily basis.

BTW, I gave you access to Google Analytics for this project. As you may notice,
traffic has increased significantly in the past few days :-)
https://www.google.com/analytics/web/

Original comment by addicted...@gmail.com on 12 Dec 2011 at 10:40

GoogleCodeExporter commented 8 years ago
Ehm...
Currently, apart from you and me, nobody is using our parser.
Unfortunately, only the latest CSI version works well; older versions had a bug that
led users to reinstall Transmission without torrentexpander.

I could automatically send you the logs by email.

Original comment by login...@gmail.com on 13 Dec 2011 at 10:59

GoogleCodeExporter commented 8 years ago
The least I can say is that this is disappointing.
Let's keep up the good work and hope people will start using it and enjoy it.

I'll be interested in the logs when the parser starts being used by many 
people. It will really help improve the script.

Original comment by addicted...@gmail.com on 13 Dec 2011 at 8:32

GoogleCodeExporter commented 8 years ago
Users are not a problem: see Issue 37.

Original comment by login...@gmail.com on 14 Dec 2011 at 5:16

GoogleCodeExporter commented 8 years ago
Renaming is not always right:
Mai.Dire.Grande.Fratello.12.E07.iTALiAN.PDTV.XviD-EXiT.avi
Mai Dire Grande Fratello 12 E07_.avi

Original comment by login...@gmail.com on 14 Dec 2011 at 8:15

GoogleCodeExporter commented 8 years ago
Another case, found on scraper log:
Cavendish+Documentary+From+Sky+Sports_

Original comment by login...@gmail.com on 14 Dec 2011 at 8:16

GoogleCodeExporter commented 8 years ago
Missing tags: 'Theatrical Cut'

Original comment by login...@gmail.com on 15 Dec 2011 at 1:09

GoogleCodeExporter commented 8 years ago
The latest build solves all of this
I'm so glad I'm using variables and regexp conversion to manage patterns :-)
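The comment above says torrentexpander keeps its patterns in variables and converts them to regular expressions, but the actual code is not shown in this thread. As an illustration only, here is a minimal POSIX sh sketch of that idea: a tag list kept in one variable (a few tags taken from this thread) is escaped and joined into a single extended regular expression, which is then matched case-insensitively against a file name. TAGS, tags_to_ere and has_tag are hypothetical names, not torrentexpander's.

```shell
#!/bin/sh
# Hypothetical tag list, one tag per line (not torrentexpander's real list).
TAGS='Extended
Theatrical Cut
AC-3
Subs.'

# tags_to_ere: join a newline-separated tag list into one ERE alternation,
# backslash-escaping ERE metacharacters so that "Subs." only matches a literal dot.
tags_to_ere() {
  printf '%s\n' "$1" | sed -e '/^$/d' -e 's/[][\.^$*+?(){}|]/\\&/g' | paste -s -d '|' -
}

# has_tag: succeed if file name $1 contains any tag from list $2 (case-insensitive).
has_tag() {
  printf '%s\n' "$1" | grep -Eiq "$(tags_to_ere "$2")"
}

if has_tag "Movie.2011.EXTENDED.avi" "$TAGS"; then
  echo "tag found"
fi
```

The escaping step matters for tags such as "Subs." from the lists in this thread: without it, the dot would match any character, so a name like "MovieSubsX" would count as tagged.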

Original comment by addicted...@gmail.com on 16 Dec 2011 at 10:08

GoogleCodeExporter commented 8 years ago
Sorry if I confused you.
When a user searches for "Cavendish Documentary From Sky Sports_", the log looks like
Cavendish+Documentary+From+Sky+Sports_

The file name itself does not contain plus signs.
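The plus signs come from the query string: the scraper's requests appear to use form-style URL encoding, where a space is sent as + and characters such as ( and ) as %28 and %29, which is also what the sed commands later in this thread undo. A minimal sketch of that decoding; decode_query is a hypothetical name, and only the encodings seen in these logs are handled:

```shell
#!/bin/sh
# decode_query: undo the URL encoding seen in the scraper logs.
# Only +, %20, %28 and %29 are handled; any other %XX sequence is left as-is.
decode_query() {
  sed -e 's/+/ /g' -e 's/%20/ /g' -e 's/%28/(/g' -e 's/%29/)/g'
}

printf '%s\n' 'Cavendish+Documentary+From+Sky+Sports_' | decode_query
# prints: Cavendish Documentary From Sky Sports_
```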

Original comment by login...@gmail.com on 17 Dec 2011 at 12:16

GoogleCodeExporter commented 8 years ago
Thanks for the update. Switched to medium priority. I think the top priority now is
Issue 13, which is useful for implementing Issue 30 in CSI.

Original comment by login...@gmail.com on 17 Dec 2011 at 12:31

GoogleCodeExporter commented 8 years ago
What do you think is the best approach?
Treating "documentary" as a pattern that should be removed may be a bad
idea: many movies and documentaries include the word "documentary" in their title.

Original comment by addicted...@gmail.com on 17 Dec 2011 at 9:24

GoogleCodeExporter commented 8 years ago
I agree: "documentary" is not a pattern.

Original comment by login...@gmail.com on 17 Dec 2011 at 11:23

GoogleCodeExporter commented 8 years ago
Now there are some active users. Missing tags found in the logs:

Dvdscreener
Spanish
X264
Divx
Dvdriptorrents (can we split it?)
Hdtvrip
Mvo (???)

Wrong renaming:
2004+The+Swan+Princess

Also, please tell me which line I have to add to crontab to send you the logs.
This is the main command:
cat /var/log/apache2/access.log | grep imdb

Original comment by login...@gmail.com on 18 Dec 2011 at 4:16

GoogleCodeExporter commented 8 years ago
Hi
I've never tried it and I can't test it right now, but once ssmtp is configured on
your server, this line should do the trick:
0 0 * * * echo -ne "$(cat /var/log/apache2/access.log | grep imdb)" | mail -v -s "Latest Torrentexpander Queries" mymailadress@gmail.com

Of course, mymailadress@gmail.com has to be replaced by my real mail address :-D
I'm really glad new people started using torrentexpander.

Original comment by addicted...@gmail.com on 18 Dec 2011 at 5:05

GoogleCodeExporter commented 8 years ago
Here is a better command.
We don't need the full logs to improve tag recognition.
0 0 * * * echo -ne "$(cat /var/log/apache2/access.log.1 | grep imdb | sed 's;^.*/imdbWebService\.php?m\=\(.*\)\&o\=xml.*$;\1;g' | sed 's;%28;(;g' | sed 's;%29;);g' | sed 's;\+; ;g')" | mail -v -s "Latest Torrentexpander Queries" mymailadress@gmail.com

Thanks

Original comment by addicted...@gmail.com on 20 Dec 2011 at 10:19

GoogleCodeExporter commented 8 years ago
Unknown -v option.
access.log.1 is the old log; should we switch to access.log?
You should receive mail from now on.

Original comment by login...@gmail.com on 21 Dec 2011 at 9:49

GoogleCodeExporter commented 8 years ago
I tried this line, but it doesn't work: it works when run manually, but not automatically.

0 0 * * * root echo -ne "$(cat /var/log/apache2/access.log | grep imdb | sed 's;^.*/imdbWebService\.php?m\=\(.*\)\&o\=xml.*$;\1;g' | sed 's;%28;(;g' | sed 's;%29;);g' | sed 's;\+; ;g')" | mail -s "Latest Torrentexpander Queries" addicteffefddffgdghggrghtgdghsfgsfg@gmail.com

/bin/sh: -c: line 0: unexpected EOF while looking for matching `''
/bin/sh: -c: line 1: syntax error: unexpected end of file
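A likely explanation for this error, going by crontab(5): in a crontab entry, cron turns every unescaped % into a newline and feeds everything after the first one to the command on standard input. The first % in this line is inside 's;%28;(;g', so /bin/sh receives a command that stops in the middle of a single-quoted string, which fits the "unexpected EOF while looking for matching" message and explains why the same line works when typed by hand. Two ways around it: write each % as \% in the crontab (cron drops the backslash before running the command), or move the pipeline into a script and have cron call that script. A minimal sketch of the first option, with a hypothetical helper name:

```shell
#!/bin/sh
# escape_for_crontab: put a backslash before every %, so that cron passes the
# character through literally instead of turning it into a newline.
escape_for_crontab() {
  sed 's/%/\\%/g'
}

# Escape two of the sed expressions used in the crontab line above.
printf '%s\n' "sed 's;%28;(;g' | sed 's;%29;);g'" | escape_for_crontab
# prints: sed 's;\%28;(;g' | sed 's;\%29;);g'
```

Separately, a user field such as root between the time fields and the command is only valid in /etc/crontab and in files under /etc/cron.d; in a crontab edited with crontab -e, cron would try to run a command named root.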

Original comment by login...@gmail.com on 27 Dec 2011 at 5:48

GoogleCodeExporter commented 8 years ago
Here's another line that should work in cron.
First check that the binary paths are correct:
which echo
which cat
which grep
which sed
which mail

59 23 * * * /bin/echo -ne "$(/bin/cat /var/log/apache2/access.log | /bin/grep imdb | /bin/sed 's;^.*/imdbWebService\.php?m\=\(.*\)\&o\=xml.*$;\1;g' | /bin/sed 's;%28;(;g' | /bin/sed 's;%29;);g' | /bin/sed 's;\+; ;g')" | /usr/bin/mail -s "Latest Torrentexpander Queries" addicteffefddffgdghggrghtgdghsfgsfg@gmail.com

Let me know how that works for you
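The which checks above can also be done in one go against the exact absolute paths that the cron line hard-codes. A minimal sketch; check_paths is a hypothetical helper name, and the paths are copied from the line above:

```shell
#!/bin/sh
# check_paths: verify that every absolute path given as an argument exists and
# is executable; print the missing ones on stderr and return non-zero if any.
check_paths() {
  missing=0
  for p in "$@"; do
    if [ ! -x "$p" ]; then
      printf 'missing: %s\n' "$p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The absolute paths used in the cron line above.
check_paths /bin/echo /bin/cat /bin/grep /bin/sed /usr/bin/mail ||
  echo "adjust the paths (see the output of which) before installing the cron line"
```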

Original comment by addicted...@gmail.com on 27 Dec 2011 at 8:54

GoogleCodeExporter commented 8 years ago
It works manually! I hope it works automatically too!

Some new missing tags:
Dubbed
Collection
Screener
Remastered
Season
Nlsubs
Hd1080p

Original comment by login...@gmail.com on 29 Dec 2011 at 9:31

GoogleCodeExporter commented 8 years ago
Added these patterns to the current build
We'll see in a few hours if logs are now automatically sent

Original comment by addicted...@gmail.com on 29 Dec 2011 at 10:39

GoogleCodeExporter commented 8 years ago
I never received any log from cron.
Have you made sure mail is set up for root, since this seems to be a root
cron job?
Thanks

Original comment by addicted...@gmail.com on 4 Jan 2012 at 7:56

GoogleCodeExporter commented 8 years ago
I've made some changes; I hope it works now.

Original comment by login...@gmail.com on 4 Jan 2012 at 11:44

GoogleCodeExporter commented 8 years ago
Just received a report
Is it from your cron job?

I'll edit and send you a new line in order to remove duplicates and garbage

Thank you

Original comment by addicted...@gmail.com on 4 Jan 2012 at 12:50

GoogleCodeExporter commented 8 years ago
No, it's a manual test.

Original comment by login...@gmail.com on 4 Jan 2012 at 4:33

GoogleCodeExporter commented 8 years ago
In about 600 lines there aren't a lot of missing tags
I'll add a few missing patterns like "xxx" and "[. _-].*[. _-]subs"
I'll check the rest this week-end
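To see what a pattern such as [. _-].*[. _-]subs actually catches, it can be tried with grep -E on sample names. This sketch assumes a case-insensitive match (grep -i); whether torrentexpander applies the pattern that way is not stated in this thread, and the helper name is hypothetical. Note that the pattern needs a separator directly before "subs", so a run-together tag such as Nlsubs is not caught by it.

```shell
#!/bin/sh
# matches_subs_pattern: succeed if file name $1 matches [. _-].*[. _-]subs,
# compared case-insensitively (an assumption made for this sketch).
matches_subs_pattern() {
  printf '%s\n' "$1" | grep -Eiq '[. _-].*[. _-]subs'
}

for name in Movie.2011.NL.Subs.avi Movie.2011.Nlsubs.avi Movie.2011.Subtitles.avi; do
  if matches_subs_pattern "$name"; then
    echo "match:    $name"
  else
    echo "no match: $name"
  fi
done
```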

I'm glad people are using torrentexpander now :-)

To get rid of all the garbage, the cron line should be:
59 23 * * * /bin/echo -ne "$(/bin/cat /var/log/apache2/access.log | /bin/grep imdb | /bin/sed 's;^.*/imdbWebService\.php?m\=\(.*\)\&o\=xml.*$;\1;g' | /bin/sed 's;%28;(;g' | /bin/sed 's;%29;);g' | /bin/sed 's;\+; ;g' | /bin/sed "s;%20; ;g" | /bin/grep -v "[Ii]mdb[Ww]eb[Ss]ervice" | /usr/bin/sort | /usr/bin/uniq)" | /usr/bin/mail -s "Latest Torrentexpander Queries" addicteffefddffgdghggrghtgdghsfgsfg@gmail.com

Have a nice evening, thanks for all your work

Original comment by addicted...@gmail.com on 4 Jan 2012 at 8:09

GoogleCodeExporter commented 8 years ago
Sent new test mail

Original comment by login...@gmail.com on 4 Jan 2012 at 11:19

GoogleCodeExporter commented 8 years ago
Looks like we're not there yet
I forwarded you the e-mails

Original comment by addicted...@gmail.com on 4 Jan 2012 at 11:26

GoogleCodeExporter commented 8 years ago
What's wrong?

Original comment by login...@gmail.com on 4 Jan 2012 at 11:29

GoogleCodeExporter commented 8 years ago
First mail sent at 23:59 contained a few garbage lines
Second mail sent at 00:17 was empty

I'll try to replicate your setup and try it on my server this week-end

Original comment by addicted...@gmail.com on 5 Jan 2012 at 6:18

GoogleCodeExporter commented 8 years ago
OK, but what is a 'garbage' line?

Original comment by login...@gmail.com on 5 Jan 2012 at 10:52

GoogleCodeExporter commented 8 years ago
Missing tags:
Plsubbed

I see a lot of TV series in the logs, but they are not supposed to be there.
Isn't the IMDb feature available only for films right now? Anyway, another missing tag
would be season/episode tagging:

S??
S??EP???
EP??
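The three forms listed above can be expressed as one extended regular expression. A sketch, assuming each ? stands for a digit, that the match is case-insensitive, and that the tag has to be delimited by non-alphanumeric characters: S plus two digits, optionally followed by E or EP and two or three digits (so the usual S01E02 form is accepted too), or EP plus two digits. looks_like_episode is a hypothetical name. A bare E07, as in the Mai Dire Grande Fratello example earlier in this thread, is not one of the listed forms and is not matched.

```shell
#!/bin/sh
# looks_like_episode: succeed if name $1 contains a delimited season/episode tag:
# S01, S01E02 / S01EP002, or EP12 (case-insensitive).
looks_like_episode() {
  printf '%s\n' "$1" |
    grep -Eiq '(^|[^[:alnum:]])(s[0-9]{2}(ep?[0-9]{2,3})?|ep[0-9]{2})([^[:alnum:]]|$)'
}

for name in Show.S01E02.HDTV.XviD Mai.Dire.Grande.Fratello.12.E07.iTALiAN.PDTV.XviD-EXiT.avi; do
  if looks_like_episode "$name"; then
    echo "episode: $name"
  else
    echo "other:   $name"
  fi
done
```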

Original comment by login...@gmail.com on 5 Jan 2012 at 1:52

GoogleCodeExporter commented 8 years ago
I'll look at the full log this weekend
By garbage lines, I meant duplicates and this kind of line:
183.97.156.227 - - [27/Dec/2011:04:50:38  0100] "GET /imdbWebService.php 
HTTP/1.1" 200 393 "http://chk.co-cc-domain.net/open_url_list.php?p=1" 
"Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"

The line in comment 25 should get rid of all that
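Another way to drop such lines, sketched here with a hypothetical function name: let sed print only the lines from which the m= parameter was actually extracted (sed -n with the p flag), so requests without a query, like the one above, never reach the output, and let sort -u remove the duplicates (equivalent to sort | uniq). This assumes the request line has the same shape as in the examples in this thread.

```shell
#!/bin/sh
# extract_queries: read access-log lines on standard input and print each
# distinct title requested as imdbWebService.php?m=...&o=xml, URL-decoded.
# sed -n plus the p flag prints only lines where the substitution matched, so
# requests without an m=...&o=xml query string produce no output at all.
extract_queries() {
  sed -n 's;^.*/imdbWebService\.php?m=\([^& ]*\)&o=xml.*$;\1;p' |
    sed -e 's/+/ /g' -e 's/%20/ /g' -e 's/%28/(/g' -e 's/%29/)/g' |
    sort -u
}

# Usage, with the log path used earlier in this thread:
#   extract_queries < /var/log/apache2/access.log
```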

TV series show up because they don't follow the SXXEXX pattern and they include
one of the movie patterns. Unless we find a pattern common to all of them, I guess
there's nothing we can do about it

Original comment by addicted...@gmail.com on 5 Jan 2012 at 10:13

GoogleCodeExporter commented 8 years ago
Latest log only contains one entry
I'll look into it this weekend

Original comment by addicted...@gmail.com on 5 Jan 2012 at 11:17

GoogleCodeExporter commented 8 years ago
The garbage lines will stop shortly, when my account on co.cc expires.

Original comment by login...@gmail.com on 5 Jan 2012 at 11:52

GoogleCodeExporter commented 8 years ago
Some more tags:
"PLSUB" "PLSUBBED" "brrrip" "TSXVID"   "XviD"  "DivXNL"   "divx"   "Subtit"
"Subs "   "Subs."    "Subs-"    "Subs_"   "NL Subs"   "KLAXXON"  "aXXo"
"BRRip"   "BDRip"    "Bluray"   "HDTV"   "HR HDTV"   "R5"  "Telesync"
"TELECINE"   "Webrip"    "vomit"   "Dita"   "DVB"  "Omifast"    "@KIDZ"
"KIDZCORNER"   "1080"  "720"  "480"   "x264"   "H264"   "AC3"   "AC-3"   "FXG"
".TS"   "TS."    " TS"    "-TS"   "TS-"   "NTSC"   " WS"    "WS."    ".WS"
"NL "    "NLT"  "CN "   "TC "    "ISO."   "Swesub"  "VHS"  "READNFO"
"ViCiOsO"   "WorkPrint"   "ExtraTorrent"   "2Lions"   " VOSTFR"   "FxM"
"DUQA"   "newartriot"   "nHaNc3"   "DDC"   "keltz"   "REAL PROPER"   "PROPER"
"DEWSTRR"   "CVCD"   "VCD"   "LIMITED"   "Electri4ka"   "Electrichka"
"NORARS"   "aceford"   "jigaxx"   "ShortKut"   "danger2u"   "www."   "www "
"1 of"  "1of"   "2 of"   "2of"   "3 of"   "3of"   "cd1"   "cd2"   "cd3"  "1CD"
"2CD"  "1 CD"    "PDVD-RIP"    "PDVD"    "PDV"    "Pre DVD"   "Pre-DVD"
"DVD"    "PPVRIP"   "www"   "1CDRip"   "2CDRip"   "UNCUT "    "Director Cut"
"Directors"   "Director's"     " TPB"   "PSP"   "PDTV"   "iPod"   "Zune"
".avi"   "mp4"   "mpg"   "3gp"  "wmv"    "CAMELOT"    "CAM"  "mkv"   "m4"
"xRipp"   "Goblin10"  "By .. DragonLord721"   "EXTENDED"   "Los Sustitutos"
"BR-Scr"  "BR-Screener"   "SCREENER"   "SCR "     "SCR."   "UNRATED"   "REPACK"
"HQ"  "RETAIL"   "1337x"   "Noir"   "NEW SOURCE"   "DiTa"    "UVall"   "FQM"
"CHGRP"   "LMAO"   "NoTV"   "DVSKY"   "DSR"   "2HD"   "2Wire"   "Ekolb"
"SHAMNBOYZ"  "!!!"  "~"  "ExtraScene"   "CHUPPI"   "MAXSPEED"  "ShareReactor"
"ShareZONE"  "ShareGo"   "aAF"    "xRG"    "STV"   "-MAX"   "iNTERNAL"
"RESYNC"   "SYNC-"   "SYNCFIX"   "TRUEFRENCH"    "FRENCH"   "ENGLISH"
"SPANISH"   "iTA "   "iTALIA"  "Hindi"   "GERMAN"   " ENG"   ".ENG" "187HD"
"HR HDTV"   "FQM"   "LMAO"   "XOXO"   "eztv"   "PDV"   "PDTV"
"TSXVID"   "XviD"   "DSR"   "DivXNL"   "Divx"   "2HD"   "2WIRE"   "NL Subs"
"KLAXXON"   "aXXo"   "NoTV"   "BRRip"   "BDRip"   "Bluray"   "HDTV"   "R5"
"BYU"   "DVB"   "Omifast"   "@KIDZ"   "KIDZCORNER"   "AC3"   "AC-3"   "FXG"
"NTSC"   " WS"   "WS."   ".WS"   "NL "   "NLT"   "CN "   "TC "   "ISO."
"Swesub"   "VHS"   "READNFO"   "ViCiOsO"   "WorkPrint"   "OneDDL.com"
"fwint.com"   "  Demonoid com  "   "ExtraTorrent com"   "ExtraTorrent"   "VOST "   " VOSTFR"   "FxM"   "DDC"   "keltz"   "REAL PROPER"   "PROPER"   "CVCD"
"VCD"   "LIMITED"   "www."   "www "   "PDVD"   "PDVD-RIP"   "PPVRIP"   "www"
"1CDRip"   "2CDRip"   "Pre DVD"   "Pre-DVD"   "DVD"   "UNCUT "   " TPB"   "PSP"
"iPod"   "Zune"   "mp4"   "mpg"   "3gp"   "wmv"   "mkv"   "m4"   "xRipp"
"YesTV"   "CRIMSON"   "EXTENDED"   "BR-Scr"   "BR-Screener"   "SCREENER"   "SCR "   "SCR."   "UNRATED"   "REPACK"   "HQ"   "RETAIL"   "Noir"   "NEW SOURCE"
"DiTa"   "SHAMNBOYZ"   "!!!"   "ExtraScene"   "MAXSPEED"   "ShareReactor"
"ShareZONE"   "ShareGo"   "aAF"   "xRG"   "STV"   "-MAX"   "RESYNC"   "SYNC-"
"SYNCFIX"   "TRUEFRENCH"   "iTA "   "_BBC"   "_ITV"   "_Channel 4"   "_Film4"
"cw4f"   "w4f"

Original comment by luk...@gmail.com on 12 Jan 2012 at 8:24

GoogleCodeExporter commented 8 years ago
Wow !
This is a huge list of patterns :-)
I'll add them this week-end

Thanks !

Original comment by addicted...@gmail.com on 12 Jan 2012 at 11:02

GoogleCodeExporter commented 8 years ago
I've integrated a bunch of them into the SVN revision I'll commit really soon.
I also allowed user defined patterns to be added to the settings.ini file.

Here are the patterns that are not yet integrated
DEWSTRR
CVCD
VCD
Electri4ka
Electrichka
NORARS
aceford
jigaxx
ShortKut
danger2u
PDVD-RIP
PDVD
PDV
Pre DVD
Pre-DVD
DVD
Director Cut
Directors
Director's
TPB
PSP
iPod
Zune
CAMELOT
xRipp
Goblin10
DragonLord721
EXTENDED
Los Sustitutos
REPACK
HQ
1337x
NEW SOURCE
UVall
CHGRP
LMAO
NoTV
DVSKY
DSR
2HD
2Wire
Ekolb
SHAMNBOYZ
ExtraScene
CHUPPI
MAXSPEED
ShareReactor
ShareZONE
ShareGo
xRG
STV
SYNCFIX
TRUEFRENCH
eztv
2WIRE
AC3
AC-3
FXG
NTSC
VHS
ViCiOsO
OneDDL.com
fwint.com
Demonoid.com
VOST
FxM
DDC
ShareReactor
ShareZONE
ShareGo
_ITV
_Channel 4
_Film4

Original comment by addicted...@gmail.com on 14 Jan 2012 at 12:13

GoogleCodeExporter commented 8 years ago
After the server change (I'm using lighttpd now), we have to change the mail command.

Original comment by login...@gmail.com on 2 Feb 2012 at 2:10

GoogleCodeExporter commented 8 years ago
I think we have to add these tags:
3D, SBS

Original comment by login...@gmail.com on 29 Mar 2012 at 4:47

GoogleCodeExporter commented 8 years ago
dvdrip, Dvd, Eng, dvd5, dvd9, torrents, Www, X264, dvdripspanish, dvdscr, 
Torrent, fansub

Original comment by login...@gmail.com on 1 Apr 2012 at 4:51

GoogleCodeExporter commented 8 years ago
half-sbs, full-sbs

Original comment by login...@gmail.com on 1 Apr 2012 at 4:53

GoogleCodeExporter commented 8 years ago
Sorry for my lack of updates during the last few (many) months.
A new job that takes most of my time and a new apartment that requires a lot of work
have left me too little time to take care of torrentexpander.
Sorry for that

Those tags have been added

Thanks for your input

   Addictedtoscreens

Original comment by addicted...@gmail.com on 2 Dec 2012 at 8:46