Open muttsnutts opened 11 years ago
I think I got this one with: [POPMp4FileTagSearch.m]
//set the search string by the filename.
search_str = [[[tag filename] lastPathComponent] stringByDeletingPathExtension];
search_str = [search_str stringByReplacingOccurrencesOfString:@"." withString:@" "];
search_str = [search_str stringByReplacingOccurrencesOfString:@"_" withString:@" "];
Any other chars you can think of that just need to be removed or replaced?
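For comparison, the same normalization can be sketched in Ruby. This is only an illustration: `search_string_for` is a hypothetical name, and collapsing repeated spaces and trimming the ends are extra steps that the Objective-C lines above do not perform.

```ruby
# Hypothetical helper (not part of the project): build a search string
# from a file path by dropping the directory and extension, then turning
# "." and "_" into spaces.
def search_string_for(path)
  base = File.basename(path, File.extname(path)) # strip directory and extension
  base.tr("._", "  ")  # "." and "_" become spaces
      .squeeze(" ")    # assumption: collapse runs of spaces
      .strip           # assumption: trim leading/trailing spaces
end

puts search_string_for("/media/Some_Show.S01E02.mp4")
```

Any further characters to remove or replace could be added to the `tr` call in the same way.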
I'm gonna look into this more over the next week, mainly to see how XBMC, Plex, and Subler handle the various naming conventions.
I'll create a list of file names from various trackers and see how they match with each regexp.
To me, this is a VERY important step: if we can get it to match 90% of filenames correctly, the end user will be happy.
Yes, yes, very important. Actually, I would say this is like 60% of what the application does, maybe even more. Here is my current algorithm for this:
change "." and "_" to " "
Look for /\([0-9]{4}\)/
  *found: it's a movie, everything before the regex is the movie name.
  *not found: Look for /E[0-9]+ /
    *found: It's a TV show! Look for /\-{0,1} *S[0-9]+E[0-9]+ *\-{0,1}/
      *found: use the numbers that follow S as the season, use the numbers that follow E as the episode. See if there is anything before the S; if so, that is the show name.
      Check the directory that the file is in against "/season *[0-9]+/"
        *found: if we did not get a season from the file name, use the [0-9]+ as the season. If we did not get a show name, use the directory above this one as the show name.
        *not found: if we did not get a show name, use this directory as the show name.
Nothing matches: use the filename and search the Movie database with it.
Search the appropriate database.
Obviously this is a really simple, incomplete algorithm, but hopefully this outline will give us something to augment to get a more inclusive search.
First problem I see is: LOOK FOR THE TV SHOW STUFF FIRST. Why? Because some files will use ([0-9]{4}) in the show name to distinguish between newer versions of the show and older ones. Augmentation of the algorithm:
change "." and "_" to " "
Look for /E[0-9]+ /
  *found: It's a TV show! Look for /\-{0,1} *S[0-9]+E[0-9]+ *\-{0,1}/
    *found: use the numbers that follow S as the season, use the numbers that follow E as the episode. See if there is anything before the S; if so, that is the show name.
    Check the directory that the file is in against "/season *[0-9]+/"
      *found: if we did not get a season from the file name, use the [0-9]+ as the season. If we did not get a show name, use the directory above this one as the show name.
      *not found: if we did not get a show name, use this directory as the show name.
  *not found: Look for /\([0-9]{4}\)/
    *found: it's a movie, everything before the regex is the movie name. [0-9]{4} is the year of the movie.
    *not found: Nothing matches, use the filename and search the Movie database with it.
Search the appropriate database.
Kevin.
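The TV-first outline above could be sketched in Ruby roughly as follows. All names here (`classify`, the constants, the returned hash keys) are hypothetical. The sketch assumes the input is a file name with its extension already removed, and it takes the text before a lone episode marker as the show name, which is one reading of the outline rather than something it states.

```ruby
# Hypothetical sketch of the TV-first outline; names are illustrative only.
SEASON_EPISODE = /-? *S([0-9]+)E([0-9]+) *-?/i
EPISODE_ONLY   = /\bE([0-9]+)\b/i   # assumption: word boundaries instead of a trailing space
SEASON_DIR     = /season *([0-9]+)/i
MOVIE_YEAR     = /\(([0-9]{4})\)/

def classify(name, parent_dir = nil, grandparent_dir = nil)
  name = name.tr("._", "  ")
  se = SEASON_EPISODE.match(name)
  ep = se || EPISODE_ONLY.match(name)
  if ep
    # TV show: whatever precedes the marker is treated as the show name.
    show    = ep.pre_match.strip
    season  = se ? se[1].to_i : nil
    episode = se ? se[2].to_i : ep[1].to_i
    dir = parent_dir && SEASON_DIR.match(parent_dir)
    if dir
      season ||= dir[1].to_i                  # season from "Season N" directory
      show = grandparent_dir.to_s if show.empty?
    elsif show.empty?
      show = parent_dir.to_s                  # fall back to the containing directory
    end
    return { type: :show, show: show, season: season, episode: episode }
  end
  if (md = MOVIE_YEAR.match(name))
    return { type: :movie, title: md.pre_match.strip, year: md[1].to_i }
  end
  # Nothing matched: search the movie database with the whole name.
  { type: :movie, title: name.strip, year: nil }
end

p classify("Doctor.Who.(2005).S01E02")
p classify("The.Matrix.(1999)")
```

Because the show check runs first, a year in parentheses that is part of a show name stays in the show name instead of turning the file into a movie.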
Sorry I haven't gotten to this yet. It's breast cancer month, which means lots of after work non-profit events for me (not that I'm complaining).
I think that your workflow is solid. A few thoughts/questions:
Will replacing "." adversely affect searching for something like "Tosh.0"?
What about colon :, apostrophe ', dash -, ampersand &, exclamation mark !?
For /\([0-9]{4}\)/, I think we next need to confirm that the number is greater than 1900. It turns out some groups name their shows like 0113 instead of S01E13 (sigh). Also, per Yahoo, the first motion picture ever created was in 1878 =P
Also, "LOOK FOR THE TV SHOW STUFF FIRST. Why? Because some files will use ([0-9]{4}) in the show name to distinguish between newer versions of the show and older ones."... You were referring to "Teenage Mutant Ninja Turtles 2012", right?!? hahaha
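The year check suggested above might look something like this in Ruby. It is only a sketch: `interpret_four_digits` and `FIRST_PLAUSIBLE_YEAR` are made-up names, the upper bound of "next year" is an added assumption, and a number such as 1920 stays ambiguous (it could be a year or season 19, episode 20).

```ruby
# Hypothetical helper: decide whether a bare four-digit number looks like a
# release year or a compact season/episode code such as 0113.
FIRST_PLAUSIBLE_YEAR = 1900 # threshold proposed in the comment above

def interpret_four_digits(digits, current_year = Time.now.year)
  n = digits.to_i # "0113".to_i is 113, leading zeros are ignored
  if n > FIRST_PLAUSIBLE_YEAR && n <= current_year + 1 # assumption: allow next year's releases
    { kind: :year, year: n }
  else
    # Treat as SSEE: first two digits are the season, last two the episode.
    { kind: :season_episode, season: digits[0, 2].to_i, episode: digits[2, 2].to_i }
  end
end

p interpret_four_digits("0113")
p interpret_four_digits("2012")
```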
Just to prove that there are lots of shows with special characters, here are just a few:
colon :
NCIS: Los Angeles, Star Wars: The Clone Wars, Anthony Bourdain: No Reservations, CSI: NY
apostrophe '
Fast N' Loud, Bob's Burgers, Kickin' It, How It's Made, Grey's Anatomy
dash -
Hawaii Five-0, Ultimate Spider-man
ampersand &
Mike & Molly, Brothers & Sisters, Law & Order
exclamation mark !
American Dad!, Superjail!
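One possible way to cope with titles like these is to compare normalized keys instead of raw strings, so punctuation differences between a file name and a database title stop mattering. The sketch below is an assumption about how that could be done, not what the project does; `match_key` is a hypothetical name, and it handles ASCII only (accented letters and typographic apostrophes would need extra care).

```ruby
# Hypothetical helper: reduce a title or file-name fragment to a comparison key.
def match_key(title)
  title.downcase
       .gsub("&", " and ")       # "Mike & Molly" ~ "Mike and Molly"
       .delete("'")              # "Bob's" ~ "Bobs"
       .gsub(/[^a-z0-9]+/, " ")  # any other punctuation run becomes one space
       .strip
end

puts match_key("NCIS: Los Angeles") == match_key("NCIS.Los.Angeles")
puts match_key("Tosh.0")
```

With this approach the "." in "Tosh.0" is treated the same on both sides of the comparison, so replacing it in the file name no longer prevents a match.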
Well,
I put all the search logic into a CGI script on popmedic.com so that all searches are proxied through there. This should be faster, and it means the file detection logic can be done in Ruby. This option is on by default.
The new logic is below:
serstr = ''
seastr = '0'
epistr = '0'
is_movie = true
#check to see if this basestr is a show
#first check for / e([0-9]+)/i
if((md = /e([0-9]+)/i.match(basestr)) != nil)
  is_movie = false
  epistr = md[1]
  #see if we have a series name...
  if((md = /(.+) e[0-9]+/i.match(basestr)) != nil)
    serstr = md[1].strip
  end
  #see if there is a /s([0-9]+)/i for a season...
  if((md = /s([0-9]+)/i.match(basestr)) != nil)
    seastr = md[1]
    #see if we have a series name...
    if((md = /(.+) s[0-9]+ *e[0-9]+/i.match(basestr)) != nil)
      serstr = md[1].strip
    end
  end
#maybe we have a / ([0-9]+)x([0-9]+)/i, could be a SxE...
elsif((md = /([0-9]+)x([0-9]+)/i.match(basestr)) != nil)
  is_movie = false
  epistr = md[2]
  seastr = md[1]
  #see if we have a series name...
  if((md = /(.+) [0-9]+x[0-9]+/i.match(basestr)) != nil)
    serstr = md[1].strip
  end
#maybe we have a /([0-9]{4})/ could be a date, or a SSEE...
#elsif((md = /([0-9]{4})/i.match(filename_str)) != nil)
end
#if we don't have a movie and we don't have a seastr and we have a parent dir string, check the parent...
if(is_movie == false && parentdir_str != nil)
  #see if the parentdir_str has /season ([0-9]+)/
  if((md = /season ([0-9]+)/i.match(parentdir_str)) != nil)
    if(grandparentdir_str != nil)
      parentdir_str = grandparentdir_str
    end
    if(seastr == '0')
      seastr = md[1]
    end
  end
end
#if we don't have a movie and we don't have a serstr and we have a parent dir string, make the serstr the parent...
if(is_movie == false && serstr == '' && parentdir_str != nil)
  serstr = parentdir_str
end
rtn = []
movstr = ''
yearstr= ''
#if we have a movie, do a movie search
if(is_movie)
  if((md = /(.+) {0,1}\({0,1}([0-9]{4})\){0,1}/i.match(basestr)) != nil)
    movstr = md[1].chomp("(")
    movstr.chomp!(" ")
    yearstr = md[2].chomp
  else
    movstr = basestr
  end
  rtn = Search.movie_search(basestr, movstr, yearstr)
#otherwise do a show search
else
  rtn = Search.show_search(basestr, serstr, seastr, epistr)
end
#if we still have nothing, and we did not do a movie search...
if(rtn.count == 0 && !is_movie)
  rtn = Search.movie_search(basestr)
end
#now if it is a use_itunes request,
if(use_itunes == 1)
  #get the images from itunes
  rtn.each do |tag|
    if(tag["Media Type"]["value"] == 'tvshow')
      img_path = SearchITunes.get_image({"serstr" => tag["TV Show"]['value'], "seastr" => tag['TV Season']['value']}, false)
    else
      img_path = SearchITunes.get_image({"movstr" => tag["TV Show"]['value'], "yearstr" => tag['Release Date']['value'].to_i().to_s()}, true)
    end
    #self.dbug(img_path)
    if(img_path!=nil)
      if(img_path != "")
        tag["Image Path"] = img_path
      end
    end
  end
end
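One thing that may be worth checking in the script above: the first test, /e([0-9]+)/i, has no boundary before the "e", so it also fires on movie titles where an "e" happens to be followed by a digit. The snippet below illustrates this with a tighter variant. `STRICT_EPISODE` is a hypothetical suggestion, not something from the project, and it has only been reasoned through against the few names shown.

```ruby
# Pattern as used in the script above.
LOOSE_EPISODE = /e([0-9]+)/i

# Hypothetical tighter variant: the "e" must not follow a letter, and the
# digits must not run straight into another letter or digit.
STRICT_EPISODE = /(?<![a-z])e([0-9]+)(?![a-z0-9])/i

["Se7en (1995)", "Some Show S01E02", "Some Show e05"].each do |name|
  loose  = LOOSE_EPISODE.match(name)
  strict = STRICT_EPISODE.match(name)
  puts "#{name}: loose=#{loose && loose[1].inspect} strict=#{strict && strict[1].inspect}"
end
```

The digit before "E" in "S01E02" is allowed by the lookbehind, so the usual SxxEyy form still matches while "Se7en" no longer counts as episode 7.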
So, I haven't heard from Mutts Nutts for like 3 months, so I am on my own with this project again (and now it is under his GitHub account). Anyway, I had an idea and ran with it. I figured that moving the filename search logic and web queries to a proxy CGI/server on a web host would speed the searches up and lighten the load on one's personal computer. Also, and most importantly, it means that as I improve the logic, users will not have to update the client. In addition, this chunk of code is written in Ruby (my host does not allow me to run WEBrick on the server, so I made it a CGI. There is also an mp4autotag_server.rb that will run the server as a standalone). Ruby is a superior language to Objective-C when it comes to string parsing and simplicity, so this makes modifying and perfecting the search logic easier and faster. I left all the original search logic in the client application and added a preference to use the popmedic search proxy. This preference is on by default because I would rather have users use this proxy than the application logic, which can only be updated by a client update. It also leaves me a great way of adding ad support if the following gets good enough and I think I could make a little $$$ off this work.
This process is finished, and after adding some slick server-side caching I have successfully sped up the searches and implemented this design. Please take a look at the code, especially the server-side Ruby scripts; I think they are slick and the logic works well so far.
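The caching code itself is not shown in this thread. Purely as an illustration of the idea, a file-backed cache for a CGI script (where each request is a fresh process, so an in-memory hash would not survive) could look like the sketch below. `SearchCache`, its one-day expiry, and the JSON-on-disk layout are all assumptions, not the project's implementation.

```ruby
require "digest/sha1"
require "fileutils"
require "json"
require "tmpdir"

# Hypothetical file-backed cache; the real server's caching is not shown here.
class SearchCache
  def initialize(dir, ttl_seconds = 24 * 60 * 60)
    @dir = dir
    @ttl = ttl_seconds # assumption: entries expire after one day
    FileUtils.mkdir_p(@dir)
  end

  # Returns the cached value for key, or runs the block and stores its result.
  def fetch(key)
    path = File.join(@dir, Digest::SHA1.hexdigest(key) + ".json")
    if File.exist?(path) && (Time.now - File.mtime(path)) < @ttl
      return JSON.parse(File.read(path))
    end
    value = yield
    File.write(path, JSON.generate(value))
    value
  end
end

cache = SearchCache.new(Dir.mktmpdir)
# The block stands in for a slow web query; it runs only on a cache miss.
result = cache.fetch("some show s01e02") { [{ "TV Show" => "Some Show" }] }
p result
```

Keying on the cleaned-up search string rather than the raw file name would let differently named copies of the same episode share one cache entry.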
Todo: (updated 2012.10.07)
Improve accuracy of searching based upon file name.
Currently failing on conventional scene-naming-schemes:
Notes: