Create a wrapper around Anitopy as an alternative to AnimeInfoExtractor

FichteFoll commented 1 year ago

I have been running a local branch configured with the anitopy parser of #591 for a while now and have also made some ad-hoc comparisons for the rescan results of my library. I have added a configuration to switch between the two parsers for everyone to try (but didn't add that to the GUIs so it needs to be edited in the config file directly). AIE stands for AnimeInfoExtractor.

Comparisons

For context, all scans in this comparison were made against my local library, which contains 1370 files (not all of which are anime or belong to shows on my list, but most do, so roughly 1250). The Anitopy version used was 2.1.0.r2.fef4fc3. There have been a few bugfixes since then, but nothing relevant for my library afaict.

| | AIE | Anitopy |
|---|---|---|
| Runtime (`python -m trackma.ui.cli -d rescan`) | 265s (1 run) | 270s (3 runs) |

Why is this a table? Because I thought I'd have more things to compare easily, but the case-by-case comparisons make more sense in text. That said, both were pretty much equal in what they matched properly (hint: almost everything, with proper altnames set up which wouldn't be the fault of the parser anyway, usually, except for some edge cases).

Talking about individual examples, here are some of my observations with Anitopy (compared to AIE) from the library scans above:

Both strip underlines correctly if they are used consistently throughout the file name.
Both recognize various episode formats (S01E01, Episode 1, E01 - 01).
Anitopy strips special characters at various positions, e.g. the dots in Akuyaku Reijou ... or the plus in Cardfight!! Vanguard - will+Dress. The former needs a different altname for AIE and Anitopy because of this. Appears to be configurable for Anitopy.
Anitopy cannot handle episode numbers with a fractional version suffix like 01v1.5, which AIE can. Whole-number versions like 01v1 work, though. Shouldn't be hard to fix. Example: Mahoutsukai Reimeiki E06v1.5 [1080p][AAC][JapDub][GerEngSub][Web-DL].mkv
Anitopy parses Kodocha - 001 I`m an Elementary School Student with an Agent.mkv to Kodocha which AIE does not.
AIE more aggressively strips parentheses from names, which is worse for shows with a year e.g. Urusei Yatsura (2022), but better for files with both a Japanese and English name, e.g. [SlyFox] Summertime Rendering (Summer Time Render) - 12 [A746BE85].mkv and Yuuki Yuuna wa Yuusha-bu Shozoku (Yuki Yuna is a Hero - Club Member) - 01 [EngSub][BD 720p Hi444PP AAC].mkv. Hard to determine in general whether parentheses should be stripped.
AIE failed to parse [The Impatient Miyafuji Kantai] Luminous Witches - 06.5 [1080p][F48884A8].mkv, converting it to The Impatient Miyafuji Kantai. Anitopy works. Should be an easy fix, though.
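
To make two of the cases above concrete, here is a minimal sketch of how a versioned episode token (`E06v1.5`, `01v1`, `06.5`) could be split, and of one possible heuristic for parentheses that keeps a bare year but drops an alternative title. Both helpers and their names are hypothetical and written only for illustration; neither is taken from AIE or Anitopy, and the year check (19xx/20xx) is an assumption.

```python
import re

# Hypothetical pattern, not from AIE or Anitopy: an optional E/EP prefix,
# an episode number that may be fractional (06.5), and an optional
# release version that may also be fractional (v1 or v1.5).
EPISODE_TOKEN = re.compile(
    r"^(?:E|EP)?(?P<episode>\d+(?:\.\d+)?)(?:v(?P<version>\d+(?:\.\d+)?))?$",
    re.IGNORECASE,
)

# A parenthesised group whose content is NOT just a four-digit year
# (assumption: years start with 19 or 20).
NON_YEAR_PARENS = re.compile(r"\s*\((?!(?:19|20)\d{2}\))[^()]*\)")


def split_episode_token(token):
    """Return (episode, version) as strings, or None if the token does not match."""
    match = EPISODE_TOKEN.match(token)
    if match is None:
        return None
    return match.group("episode"), match.group("version")


def strip_parens_keep_year(title):
    """Drop parenthesised groups unless they only contain a year."""
    return " ".join(NON_YEAR_PARENS.sub("", title).split())


if __name__ == "__main__":
    for token in ("E06v1.5", "01v1", "06.5", "1080p"):
        print(token, "->", split_episode_token(token))
    for title in (
        "Urusei Yatsura (2022)",
        "Summertime Rendering (Summer Time Render)",
    ):
        print(title, "->", strip_parens_keep_year(title))
```

The parentheses heuristic is deliberately naive: it would still strip parentheses that are a genuine part of a show's name, which is the reason the general case is hard to decide.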

Not related to the parsers:

Yuuki Yuuna wa Yuusha de Aru - Hidamari - OVA is mapped to Yuki Yuna is a Hero: The Great Mankai Chapter - 1 (both produce this).

Packaging-related:

AIE is a custom implementation while anitopy is a centralized and shared implementation based on Taiga's anitomy, where fixes to either repo should also be translated to the other. It even allows for updating the parsing library independent of trackma.
AIE currently has tests inside the repo. Anitopy has its own tests and I would definitely be inclined to translate any test cases it's currently missing (at some point).
Anitopy is a third-party dependency. That means either depending on it optionally and providing AIE as a fallback, or requiring it.
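
The optional-dependency variant could look roughly like the sketch below. The function name, the parser identifiers `"anitopy"` and `"aie"`, and the injectable availability check are assumptions made for illustration; this is not the code in this PR.

```python
import importlib.util


def _module_available(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def choose_parser(configured, module_available=_module_available):
    """Pick the parser to use.

    `configured` is the value read from the config file (hypothetical values:
    "anitopy" or "aie"). Anitopy is only selected when it is both requested
    and installed; otherwise AIE is used as the fallback.
    """
    if configured == "anitopy" and module_available("anitopy"):
        return "anitopy"
    return "aie"


if __name__ == "__main__":
    print(choose_parser("anitopy"))
    print(choose_parser("aie"))
```

Passing the availability check as a parameter keeps the selection logic testable on machines with or without Anitopy installed.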

Conclusions

In summary, I believe both work satisfactorily and can be used interchangeably. I don't want to pick a favorite here, so I'm proposing to add the option for Anitopy to be used if desired. We can always remove either later if we deem it useless. In principle, I'm more inclined to Anitopy though because of it focusing on parsing file names only.

Furthermore, I also made some logging enhancements to make the comparisons above easier for me and found them to also be generally useful, e.g. when trying to diagnose mismatches in normal use.

I'm not too fond of the code architecture and quality of AIE, ~~but it's still much better than the Anitopy wrapper here. Especially the hasattr checks are atrocious. I don't feel like spending more time on this right (now) and~~ (fixed now) I have confirmed that it at least works and it also hasn't crashed on me once or similar. If a decision has been made to remove AIE, it'd be much easier to rewrite the wrapper to not have to imitate AIE's API.

Either way, I'd rather get this off my local branch than have it sit around for another 0.5 years.

Includes and closes #591. Supersedes and closes #473.

z411 commented 1 year ago

The idea of having Anitopy as an option has been in my head for a while. Thanks, I'll test it myself. One thing: have you made performance comparisons? I'm a bit reluctant about adding additional parsing logic to the Anitopy parser, but if it doesn't affect performance very much then it's ok. Scan list problems come more from our difflib usage; that definitely needs our attention.

FichteFoll commented 1 year ago

Yes, I made performance comparisons (via python -m trackma.ui.cli -d rescan) and they are about the same speed (265-270s). I have included them in the original post but also edited a bit to clarify (just realized I forgot to add a unit). I expect most of the time to be spent matching against the list and not parsing the anime title, though.
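
As a rough illustration of why I'd expect matching to dominate: every parsed title has to be compared against every candidate title on the list, which with difflib means one similarity computation per pair. The sketch below uses `difflib.get_close_matches` from the standard library; the function name, the sample titles, and the cutoff are assumptions and not how Trackma's matching is actually implemented.

```python
import difflib


def best_match(parsed_title, list_titles, cutoff=0.6):
    """Return the list title most similar to parsed_title, or None.

    Each call compares parsed_title against every entry in list_titles,
    so a rescan costs roughly (number of files) x (number of titles).
    """
    matches = difflib.get_close_matches(parsed_title, list_titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    titles = ["Kodocha", "Urusei Yatsura (2022)", "Luminous Witches"]
    print(best_match("Kodocha", titles))
    print(best_match("Completely Unrelated Name", titles))
```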

I also figured that sleep isn't important for humans anyway and went ahead with refactoring to remove the __dict__ abuse. It just didn't sit well. :sweat:

FichteFoll commented 1 year ago

Since this is probably the biggest outstanding PR currently, I'd like to merge it soon-ish and continue with #624 when I find some free time.

z411 / trackma

Create a wrapper around Anitopy as an alternative to AnimeInfoExtractor #663
