z411 / trackma

Open multi-site list manager for Unix-like systems. (ex-wMAL)
https://z411.github.io/trackma
GNU General Public License v3.0

Failing to recognize episode pattern from filename #403

Open Soitora opened 5 years ago

Soitora commented 5 years ago

Hey! again :)

I noticed that some shows of mine were straight up ignored, and when I passed the tracker command I think I found out why.

I'm basically using Sonarr naming for all my shows as it contains a great deal of information, like this: image

The problem however, is that Trackma sees the absolute episode part as the actual episode number, like so: image

Absolute numbering helps a ton with clients like Plex and Emby managing your content and finding the correct metadata for it, turning it off just for Trackma would make my content management much more difficult in certain situations.

Thank you.

Soitora commented 5 years ago

Another case where it straight up wasn't able to find the anime in my list, despite it being a very simple entry such as Dororo:

image

purposelycryptic commented 5 years ago

I used Emby for many, many years (back to when it was still Mediabrowser, then Mediabrowser 2, etc.), and currently use Plex.

Three big things from both of their recommended naming schemes:

I have no clue how to elegantly include both a SxxExx AND an Absolute episode number in a way that would generally be compatible - you have the formatting for each correct for their naming conventions (either xXxx or SxxExx for TVDB-compatible season-based numbering, and xxx for Absolute), but having both without a clear divider that is semi-universally recognized seems like a recipe for chaos. Not sure if either one has developed a standard for including both since then.

I used to use SxxExx when I was using Emby, but moved to AniDB's non-season system based on the Japanese one (Single-zero padded episode number) once I switched to Plex and was able to use HAMA for all my anime metadata.

It is also worth noting that none of the anime-tracking/collecting sites follow either pattern - they use the Japanese-style, which is a non-season pattern where each "season" is considered a new series, with the episode-count starting from one. So, none of those sites would match either numbering system you have for the second season of 'One-Punch Man' - you'd need something with a built-in conversion system like HAMA has to have an anime site keep track of it.

Unless you have a reason for doing so, I'd also drop all the periods and just use regular spaces - I know it used to cause issues way back, but AFAIK, pretty much all systems can handle them now; failing that, I'd go with Underscores, which are more universally accepted as space-alternatives today.

Also (and this is a really minor point, I'm just a bit metadata-OCD), none of HorribleSubs' releases are HDTV caps - they are all pulled from streaming sources, usually CrunchyRoll.

No clue if this helps, but this is the (basic) pattern I use now, which works great with Trackma, as long as the AniDB and AniList series names match (as I'm deep in AniDB):

Dororo (2019) - 16 [HorribleSubs] [www, 720p, AAC] [A7371C9F].mkv

And this is what I used to use with Emby:

One Punch Man - s02e03 [HorribleSubs] [www, 720p, AAC] [01BF6883].mkv

The latter is unlikely to work with Trackma, as none of the sites it works with use seasons or absolute episode numbers that continue counting from the prequel series.
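
To make the numbering mismatch concrete, here is a minimal Python sketch of converting an absolute episode number into the per-series number the anime sites expect. The function name `split_absolute` and the episode counts are assumptions for illustration; neither Trackma nor the sites are known to provide this conversion, and a real tool would need the actual episode count of each series.

```python
from typing import Sequence, Tuple


def split_absolute(absolute: int, series_lengths: Sequence[int]) -> Tuple[int, int]:
    """Map an absolute episode number to (series index, episode within that series).

    series_lengths holds the episode count of each series in franchise order.
    Both returned values are 1-based.
    """
    if absolute < 1:
        raise ValueError("absolute episode numbers start at 1")
    remaining = absolute
    for index, length in enumerate(series_lengths, start=1):
        if remaining <= length:
            return index, remaining
        remaining -= length
    raise ValueError("absolute number exceeds the known episode counts")


# Assumption: two series of 12 episodes each (illustrative values).
print(split_absolute(15, [12, 12]))
```

With those assumed counts, absolute episode 15 would be episode 3 of the second series, which is the number a tracking site would want.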

No idea if this was remotely helpful information or not, but either way, I hope you find something that works for you.

Soitora commented 5 years ago

Hey, thanks for the informative and lengthy post!

I'll start off by saying that I removed the absolute numbering system already and plan to modify my current one as well!

My naming scheme is essentially Sonarr's recommended way of naming anime, combined with the fact that I initially planned to have my media on a server, which brought about the periods and hyphens!

And it's Sonarr doing the HD thing for HorribleSubs, should they be WEBDL or WEBRIP then?

Do you think this would work then?

Episode Format: {Series Title} - s{season:00}e{episode:00} [{absolute:000}] [{Release Group}] [{MediaInfo VideoCodec}, {Quality Title}, {MediaInfo AudioCodec}]

Output Single: The Series Title! - s01e01 [001] [RlsGrp] [x264, HDTV-720p, DTS]

Output Multi: The Series Title! - s01e01-03 [001-002-003] [RlsGrp] [x264, HDTV-720p, DTS]

Real Example: One Punch Man - s02e04 [016] [HorribleSubs] [x264, WEBRIP-1080p, AAC]

Do you think the inclusion of absolute numbering here in brackets would keep it safe from Trackma trigger?

Trackma (I haven't tested this) should at least be able to handle s00e00 title formats; I understand not going as far as supporting absolute numbering, however.

Sincerely, Soitora

Soitora commented 5 years ago

Any idea why it now assumes episode 1 from that format?

image

image

purposelycryptic commented 5 years ago

@Soitora So, the reason your files are being recognized as episode 1 is that, actually, Trackma isn't finding an episode number at all, which causes it to default to episode 1.

The problem is that Trackma can't interpret the episode number in the sXXeYY format at all, and, since we put the absolute episode number in square brackets, it isn't looking there for it.
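
As a rough illustration of what recognizing the sXXeYY format involves, here is a minimal Python sketch. The pattern and the function name `find_season_episode` are hypothetical and are not taken from Trackma's AnimeInfoExtractor.py; the sketch only shows how the season and episode numbers could be pulled out of a filename in this naming scheme.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern (not Trackma's own): "s", 1-2 digits, "e", 1-3 digits,
# matched case-insensitively as a standalone token.
SEASON_EPISODE = re.compile(r"\bs(\d{1,2})e(\d{1,3})\b", re.IGNORECASE)


def find_season_episode(filename: str) -> Optional[Tuple[int, int]]:
    """Return (season, episode) if an sXXeYY token is present, else None."""
    match = SEASON_EPISODE.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


name = "One Punch Man - s02e04 [016] [HorribleSubs] [x264, WEBRIP-1080p, AAC]"
print(find_season_episode(name))
```

A filename without such a token, like the plain `Dororo (2019) - 16 ...` style, yields `None` here, so a caller would fall back to its other episode patterns.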

Here is what Trackma is actually getting from your filenames in the current format:

For OPM2:

| Variable | Value |
| --- | --- |
| self.originalFilename | One Punch Man - s02e08 [020] [HorribleSubs] [x264, HDTV-1080p, AAC].mkv |
| self.resolution | 1080p |
| self.hash | |
| self.subberTag | 020 |
| self.videoType | x264 |
| self.audioType | AAC |
| self.releaseSource | |
| self.extension | mkv |
| self.episodeStart | |
| self.episodeEnd | |
| self.volumeStart | |
| self.volumeEnd | |
| self.version | 1 |
| self.name | One Punch Man - s02e08 |
| self.pv | -1 |

It is basically only getting the series name, and even that is somewhat cobbled together; One Punch Man 2 was probably the closest match it could get from One Punch Man - s02e08. Technically, it isn't even a correct match, since you have One Punch Man, and not One Punch Man 2 as the title in your filename - it just happened to integrate the sXXeYY episode number into the series name in a fortuitous way.

For Dororo:

| Variable | Value |
| --- | --- |
| self.originalFilename | Dororo (2019) - s01e19 [019] [HorribleSubs] [x264, WEBRIP-1080p, AAC].mkv |
| self.resolution | 1080p |
| self.hash | |
| self.subberTag | 019 |
| self.videoType | x264 |
| self.audioType | AAC |
| self.releaseSource | |
| self.extension | mkv |
| self.episodeStart | |
| self.episodeEnd | |
| self.volumeStart | |
| self.volumeEnd | |
| self.version | 1 |
| self.name | Dororo |
| self.pv | -1 |

The reason we are getting a clean series name here is likely because the actual series name is Dororo (2019), and so the sXXeYY part is getting wiped out along with the year for series ID purposes. You might also have noticed that in both cases, the Absolute Episode Number is being picked up as the Sub Group - it seems to pick the first bit of text in square brackets after the Series Name/Episode Number as the most likely candidate for Group Name.

Anyway, I did some testing with a few variations of your naming-scheme, to see if I could fix it, and I did... sort of.

I can't get it to use the YY part of the sXXeYY episode ID without modifying AnimeInfoExtractor.py to support the format - so I did (See this commit here). That also fixed issue #413, as far as I can tell, as well as my personal issue with years being stripped from series names for ID purposes.

Only one problem: since Trackma was kind of identifying your OPM2 file names as the correct series by coincidence rather than because it was working correctly, with the updated AnimeInfoExtractor.py, it now IDs it as the first season. There's not much I can do about that, though, since it is just taking the series name given to it, and none of the Anime sites use the concept of seasons, so without the AniList/Kitsu/MAL equivalent of ScudLee's Anime_Lists for AniDB, there is no clear mapping between TVDB seasons and the actual anime series... Sorry.
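
To show what kind of mapping is missing, here is a small Python sketch of a hand-maintained lookup from a (filename title, TVDB-style season) pair to the title used on the tracking site. The table `SEASON_TITLES` and the function `site_title` are hypothetical; nothing like this exists in Trackma according to this thread, and the entries would have to be filled in manually.

```python
from typing import Dict, Tuple

# Hypothetical, hand-maintained table:
# (title as it appears in the filename, TVDB-style season) -> title on the tracking site
SEASON_TITLES: Dict[Tuple[str, int], str] = {
    ("One Punch Man", 1): "One Punch Man",
    ("One Punch Man", 2): "One Punch Man 2",
}


def site_title(name: str, season: int) -> str:
    """Return the mapped title, or the unchanged name when no mapping is known."""
    return SEASON_TITLES.get((name, season), name)


print(site_title("One Punch Man", 2))
```

Falling back to the unchanged name keeps single-series shows working without any table entry.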

After testing, you should still probably modify your naming scheme so that the Absolute Episode Number is in round rather than square brackets - it doesn't cause any visible issues the way it is from what I can tell, but it causes Trackma to use that number as the Sub Group. Currently Trackma does nothing with that data AFAIK, but it doesn't hurt to make the change, either way.
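
The bracket behaviour can be sketched in a few lines of Python. This is a simplification and an assumption about the logic, not Trackma's actual code: it just takes the first square-bracketed tag anywhere in the filename as the group, which is enough to show why round brackets avoid the problem.

```python
import re
from typing import Optional

# Matches the contents of one [ ... ] tag; round brackets are deliberately ignored.
SQUARE_TAG = re.compile(r"\[([^\[\]]+)\]")


def first_square_tag(filename: str) -> Optional[str]:
    """Return the text of the first square-bracketed tag, or None if there is none."""
    match = SQUARE_TAG.search(filename)
    return match.group(1) if match else None


print(first_square_tag("One Punch Man - s02e08 [020] [HorribleSubs] [x264, HDTV-1080p, AAC].mkv"))
print(first_square_tag("One Punch Man - s02e08 (020) [HorribleSubs] [x264, HDTV-1080p, AAC].mkv"))
```

The first call picks up the absolute episode number as the group, while the second skips the round-bracketed number and lands on the release group.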

Here is what Trackma's extracted data looks like for those two files using the modified AnimeInfoExtractor.py and round brackets for ABS_EP in the naming scheme:

For OPM2:

| Variable | Value |
| --- | --- |
| self.originalFilename | One Punch Man - s02e08 (020) [HorribleSubs] [x264, HDTV-1080p, AAC].mkv |
| self.resolution | 1080p |
| self.hash | |
| self.subberTag | HorribleSubs |
| self.videoType | x264 |
| self.audioType | AAC |
| self.releaseSource | |
| self.extension | mkv |
| self.episodeStart | 8 |
| self.episodeEnd | |
| self.volumeStart | |
| self.volumeEnd | |
| self.version | 1 |
| self.name | One Punch Man |
| self.pv | -1 |

For Dororo:

| Variable | Value |
| --- | --- |
| self.originalFilename | Dororo (2019) - s01e19 (019) [HorribleSubs] [x264, WEBRIP-1080p, AAC].mkv |
| self.resolution | 1080p |
| self.hash | |
| self.subberTag | HorribleSubs |
| self.videoType | x264 |
| self.audioType | AAC |
| self.releaseSource | |
| self.extension | mkv |
| self.episodeStart | 19 |
| self.episodeEnd | |
| self.volumeStart | |
| self.volumeEnd | |
| self.version | 1 |
| self.name | Dororo (2019) |
| self.pv | -1 |

Hope that helped, and sorry I couldn't do more for the second-season issue - the only thing I can think of for that is to use the free version of FileBot together with Sonarr, as that has AniDB naming integrated. That's what I use (well, without Sonarr, I just use Shana Project).

Soitora commented 5 years ago

Thank you for the lengthy explanation!

I am currently trying out Filebot (together with Shana Project), but I noticed one thing when attempting some downloads:

[HorribleSubs] One Punch Man S2 - 07 [720p].mkv gets filed as One Punch Man - 07 - [HorribleSubs] [x264, 720p, AAC] [6D4B8217].mkv

(What's the www in your name?)

^ Aka, this is seen by Filebot as season 1 and not season 2.

Also since you use AniDB, how do you deal with multiple seasons? Do HamaTV+Absolute Series Scanner take care of seasons? Because it didn't last time I used it.

This is my Filebot format for the AMC script {n}/{primaryTitle} - {e00} - [{group}] [{vc}, {vf}, {ac}] [{crc32}]

purposelycryptic commented 5 years ago

RE: Misidentification: With Filebot, you can correct its auto-detected series by double-clicking on it, and putting in the right one - you only have to do it once, and it should get it right every time thereafter.

> (What's the www in your name?)

That's the source field - AniDB uses www as its source string for webrips, and as I've been using AniAdd to rename my files and add them to my AniDB MyList for the better part of a decade, it stuck. I still usually use AniAdd periodically on my new files to correct any errors FileBot might make, since all their metadata is hand-verified for the DB (I have FileBot put files into a temporary folder, and then, once a week or so, run AniAdd on it, which drops them into their permanent home). That is also in part why I use AniDB series naming, even though it gives Trackma some trouble (which I'm trying to solve, among several other issues, by adding to and rewriting Trackma's AnimeInfoExtractor.py - we'll see how it turns out).

> Also since you use AniDB, how do you deal with multiple seasons? Do HamaTV+Absolute Series Scanner take care of seasons? Because it didn't last time I used it.

Both ASS and HAMA have gotten considerably more advanced over the past year, and now support all sorts of ways to show your anime in Plex. For example, you can have your files named and organized according to AniDB, and have HAMA organize them according to TheTVDB inside Plex - this works in large part thanks to the mapping data from ScudLee's Anime_Lists, which essentially maps each episode in AniDB to its counterpart in TheTVDB.

So you can still have the season-style organization inside Plex if you prefer that, even without that file-organization scheme. I personally prefer the Japanese-style, so I have each series in a franchise as its own individual show, and have Plex hide seasons (if there aren't any specials). I do usually put related series in collections though. HAMA can automatically do this to an extent, but the Anime_List for collections isn't as extensively maintained, so a lot of it is still manual - they've been working on a way to automate it more fully, using AniDB series relation data, but I don't think they're quite there yet.

Thanks to the Anime_Lists, Trakt also still works perfectly even if you use the AniDB organization scheme, which is pretty great.

> This is my Filebot format for the AMC script {n}/{primaryTitle} - {e00} - [{group}] [{vc}, {vf}, {ac}] [{crc32}]

That looks pretty good to me, although you probably don't need that hyphen after the episode number anymore using this structure. I don't actually use AMC (which plenty of people think is nuts, but...), and instead use Deluge with Laharah's FileBotTool plugin. Whether that is a better solution or not is debatable, but it lets me use different FileBot renaming schemes based on how my torrents are labeled, which, in turn, is handled by FlexGet and the LabelPlus Deluge plugin.

It definitely makes it simpler (for me, anyway) to set the series title string for FileBot to exactly what I want it to be, based on rules you define in FileBotTool. So rather than correcting FileBot when it misidentifies a file, I just set a rule there instead (Which is actually more work, I suppose, but allows for more flexibility).

My base rename pattern is as follows: {primaryTitle.replace(':',' -')} [{group}] [{fn =~ /BluRay|Blu-Ray|Bluray|BD/ ? "BD" : "www"}, {vf}]/{primaryTitle.replace(':',' -')} - {e00} [{group}] [{fn =~ /BluRay|Blu-Ray|Bluray|BD/ ? "BD" : "www"}, {vf}, {ac}] [{crc32}]

Not all that different from yours, except for defining the folder name as well (Which I guess AMC does for you?).

You can safely ignore the {fn =~ /BluRay|Blu-Ray|Bluray|BD/ ? "BD" : "www"} part, as it is just a less-accurate replacement for the source string, which checks if the file is a Blu-Ray release, and otherwise assumes it's a WebRip (since I run everything through AniAdd eventually anyway, if it happens to be a TVRip, it gets corrected then).

I would include the {primaryTitle.replace(':',' -')} part instead of just the plain primaryTitle though; all it does is replace the illegal character ":" with " -", since filenames can't contain it. You may want to have it remove these as well: \":/*|<>?, come to think of it - between making custom rules and running things through AniAdd, my basic pattern has become a little lazy.
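
For readers who want to see the same clean-up outside of FileBot's format syntax, here is a minimal Python sketch. It is not FileBot code; the function name `sanitize_title` and the sample title are hypothetical. It replaces ":" with " -" as the pattern above does, and then strips the other characters listed above.

```python
# Characters from the list above: backslash, double quote, colon, slash,
# asterisk, pipe, angle brackets and question mark.
ILLEGAL = '\\":/*|<>?'


def sanitize_title(title: str) -> str:
    """Replace ':' with ' -', then drop any remaining illegal characters."""
    cleaned = title.replace(":", " -")
    return "".join(ch for ch in cleaned if ch not in ILLEGAL)


# Hypothetical title, for illustration only.
print(sanitize_title("Title: Subtitle?"))
```

Doing the colon replacement first matters, since stripping ":" outright would glue the title and subtitle together.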

While they don't look like this in Deluge, thanks to the nice, easy-to-use GUI, the actual custom rules look something like this in the config file:

    [
      15, 
      "file path", 
      "contains", 
      "Bungou Stray Dogs", 
      "Bungou Stray Dogs (2019)"
    ], 

which tells it to use the "Bungou Stray Dogs (2019)" pattern, which is:

    "Bungou Stray Dogs (2019)": {
      "episode_order": "absolute", 
      "show_advanced": true, 
      "subs_language": "", 
      "format_string": "Bungou Stray Dogs (2019) [{group}] [{fn =~ /BluRay|Blu-Ray|Bluray|BD/ ? \"BD\" : \"www\"}, {vf}]/Bungou Stray Dogs (2019) - {e00} [{group}] [{fn =~ /BluRay|Blu-Ray|Bluray|BD/ ? \"BD\" : \"www\"}, {vf}, {ac}] [{crc32}]", 
      "database": "AniDB", 
      "encoding": "UTF-8", 
      "rename_action": "move", 
      "download_subs": false, 
      "query_override": "", 
      "language_code": "", 
      "output": "R:\\Media\\Anime - TBF", 
      "on_conflict": "skip"
    }, 
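
How such a rule might be applied can be sketched in Python. This is a guess at the logic, not the plugin's code: the meaning of the leading number is unknown here and is ignored, only the "file path" / "contains" combination is modelled, and the function name `pick_handler` and the sample path are hypothetical.

```python
from typing import List, Optional, Sequence

# Rules in the same shape as the config excerpt:
# [number, field, operator, value, handler name].
RULES: List[Sequence] = [
    [15, "file path", "contains", "Bungou Stray Dogs", "Bungou Stray Dogs (2019)"],
]


def pick_handler(file_path: str, rules: List[Sequence] = RULES) -> Optional[str]:
    """Return the handler name of the first matching rule, or None."""
    for _number, field, operator, value, handler in rules:
        # Assumption: only substring matching on the file path is handled.
        if field == "file path" and operator == "contains" and value in file_path:
            return handler
    return None


# Hypothetical download path, for illustration only.
print(pick_handler("/downloads/Bungou Stray Dogs - 01.mkv"))
```

The returned name would then select the matching block in the handler section, which carries the format string and output folder.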

While I used the main AniDB title here, since that's what it will get from AniAdd later anyway, that only really applies to me (I've slowly written and rewritten my AniAdd rename script over many, many years, and it does a lot of little things most people wouldn't care about). So you could also use a more Trackma-friendly name, such as "Bungou Stray Dogs 3" - HAMA shouldn't have a problem with it, and, in the rare case it does, you can force-set it to match whatever series you want.

This also easily allows me to fix the GroupName for when it isn't in FileBot's list of groups (one of the forum mods who decides what groups are "list-worthy" or not is kind of a prick, and has blocked quite a few Sub-Groups I tried to get added).

I honestly forget what exactly FileBot and AMC can do, since I only ever interact with them via the Deluge plugin, but one of the nice features of the plugin (that I don't know if FileBot has natively) is that, even after it moves and renames files, you can still keep seeding them, as it redirects Deluge properly. Alternatively, it can even make a symlink, and not move the actual file at all (which to me, at least, is pretty damn cool, but I'm a dork).

Anyway, that was probably far more than you really wanted to know, but hopefully some of it proves helpful :-)