z411 / trackma

Open multi-site list manager for Unix-like systems. (ex-wMAL)
https://z411.github.io/trackma
GNU General Public License v3.0

Create a wrapper around Anitopy as an alternative to AnimeInfoExtractor #663

Closed. FichteFoll closed this 10 months ago.

FichteFoll commented 1 year ago

I have been running a local branch configured with the Anitopy parser of #591 for a while now and have also made some ad-hoc comparisons of the rescan results for my library. I have added a configuration option to switch between the two parsers for everyone to try (I didn't add it to the GUIs, so it needs to be edited in the config file directly). AIE stands for AnimeInfoExtractor.
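To illustrate the idea of a config switch between two parsers, here is a minimal sketch. It is not the code from this PR: the config key `title_parser`, the registry, and both parser functions are hypothetical stand-ins (simple regex and string splitting, not the real AIE or Anitopy), chosen only so the example runs with the standard library.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# (title, episode) or (None, None) when the file name cannot be parsed.
ParseResult = Tuple[Optional[str], Optional[int]]


def parse_with_regex(filename: str) -> ParseResult:
    """Stand-in parser A: one regex over '[Group] Title - 03 [720p].mkv'."""
    m = re.match(r"^(?:\[[^\]]*\]\s*)?(?P<title>.+?)\s+-\s+(?P<ep>\d+)\b", filename)
    if not m:
        return None, None
    return m.group("title"), int(m.group("ep"))


def parse_with_split(filename: str) -> ParseResult:
    """Stand-in parser B: split the stem on the last ' - '."""
    stem = filename.rsplit(".", 1)[0]
    title, sep, rest = stem.rpartition(" - ")
    digits = re.match(r"\d+", rest)
    if not sep or not digits:
        return None, None
    # Drop a leading '[Group]' tag from the title part.
    title = re.sub(r"^\[[^\]]*\]\s*", "", title)
    return title, int(digits.group())


# Hypothetical registry; the keys only mimic the two options discussed here.
PARSERS: Dict[str, Callable[[str], ParseResult]] = {
    "aie": parse_with_regex,
    "anitopy": parse_with_split,
}


def get_parser(config: dict) -> Callable[[str], ParseResult]:
    """Pick a parser by the (assumed) 'title_parser' key, defaulting to 'aie'."""
    name = config.get("title_parser", "aie")
    if name not in PARSERS:
        raise ValueError(f"unknown title parser: {name!r}")
    return PARSERS[name]
```

With a registry like this, switching parsers is a one-line change in the config file, and an unknown value fails loudly instead of silently falling back.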

Comparisons

For context, all scans in this comparison were made against my local library, which counts 1370 files (not all of which are anime or for shows on my list, but most are, so roughly 1250). The Anitopy version used was 2.1.0.r2.fef4fc3. There have been a few bugfixes since then but nothing relevant for my library afaict.

| | AIE | Anitopy |
|---|---|---|
| Runtime (python -m trackma.ui.cli -d rescan) | 265s (1 run) | 270s (3 runs) |

Why is this a table? Because I thought I'd have more things to compare easily, but the case-by-case comparisons make more sense in text. That said, both were pretty much equal in what they matched properly (hint: almost everything, provided proper altnames are set up; missing altnames usually aren't the parser's fault anyway, except in some edge cases).

Talking about individual examples, here are some of my observations with Anitopy (compared to AIE) from the library scans above:

Not related to the parsers:

Packaging-related:

Conclusions

In summary, I believe both work satisfactorily and can be used interchangeably. I don't want to pick a favorite here, so I'm proposing to add the option for Anitopy to be used if desired. We can always remove either later if we deem it useless. In principle, I'm more inclined towards Anitopy, though, because it focuses on parsing file names only.

Furthermore, I also made some logging enhancements to make the following comparisons easier for me and found them to also be generally useful, e.g. when trying to diagnose mismatches in normal use.

I'm not too fond of the code architecture and quality of AIE, but it's still much better than the Anitopy wrapper here. Especially the hasattr checks are atrocious. I don't feel like spending more time on this right now (edit: fixed now). I have confirmed that it at least works, and it hasn't crashed on me once. If a decision is made to remove AIE, it would be much easier to rewrite the wrapper so that it no longer has to imitate AIE's API.
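For readers wondering what "imitating AIE's API" without hasattr checks could look like, here is a rough sketch. It is not the wrapper from this PR. It assumes the parser returns a dictionary with `anime_title` and `episode_number` keys (as Anitopy's `parse` is documented to do) and that callers expect `getName()` and `getEpisode()` accessors; the class name and the exact accessor semantics are assumptions made for illustration.

```python
from typing import Any, Mapping, Optional


class ParsedFilename:
    """Hypothetical adapter: AIE-style accessors over a parser result dict."""

    def __init__(self, parsed: Mapping[str, Any]) -> None:
        # Every attribute is always set in __init__, so callers never
        # need hasattr() to find out whether a field exists.
        self.name: Optional[str] = parsed.get("anime_title")
        self.episode: Optional[int] = self._to_int(parsed.get("episode_number"))

    @staticmethod
    def _to_int(value: Any) -> Optional[int]:
        # Assumption: a multi-episode file may yield a list of numbers;
        # this sketch simply keeps the first one.
        if isinstance(value, (list, tuple)):
            value = value[0] if value else None
        try:
            return int(value)
        except (TypeError, ValueError):
            return None

    def getName(self) -> Optional[str]:
        return self.name

    def getEpisode(self) -> Optional[int]:
        return self.episode


# Usage with a hand-written dict in place of a real parser result:
info = ParsedFilename({"anime_title": "Some Show", "episode_number": "03"})
print(info.getName(), info.getEpisode())
```

Normalizing everything once in the constructor keeps missing fields as explicit `None` values rather than absent attributes.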

Either way, I'd rather get this off my local branch than have it sit around for another 0.5 years.

Includes and closes #591. Supersedes and closes #473.

z411 commented 1 year ago

The idea of having Anitopy as an option has been in my head for a while. Thanks, I'll test it myself. One thing: have you made performance comparisons? I'm a bit reluctant about adding additional parsing logic to the Anitopy parser, but if it doesn't affect performance very much then it's ok. Scan list problems come more from our difflib usage; that definitely needs our attention.
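As a rough illustration of why difflib-based matching can dominate a scan, here is a minimal sketch of fuzzy title matching with the standard library. It is not trackma's actual matching code; the function name, the lowercasing step, and the cutoff value are assumptions.

```python
import difflib
from typing import Iterable, Optional


def best_title_match(guess: str, titles: Iterable[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the list title most similar to the parsed guess, or None."""
    # Compare case-insensitively but return the original spelling.
    lowered = {title.lower(): title for title in titles}
    hits = difflib.get_close_matches(guess.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


print(best_title_match("some show", ["Some Show!", "Another Series"]))
```

Each call compares the guess against every candidate title, and the underlying `SequenceMatcher` has quadratic worst-case cost in the string lengths, so the total work grows with the number of files times the number of titles (including altnames).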

FichteFoll commented 1 year ago

Yes, I made performance comparisons (via python -m trackma.ui.cli -d rescan) and they are about the same speed (265-270s). I have included them in the original post but also edited a bit to clarify (just realized I forgot to add a unit). I expect most of the time to be spent matching against the list and not parsing the anime title, though.
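To check the expectation that matching outweighs parsing, one could time the two stages separately. The sketch below uses only the standard library; `time_stage` and the two stand-in stage functions are hypothetical and do not call into trackma.

```python
import difflib
import time
from typing import Callable, Sequence


def time_stage(func: Callable[[str], object], items: Sequence[str], repeat: int = 3) -> float:
    """Run func over all items `repeat` times and return the best wall time in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for item in items:
            func(item)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in stages: a trivial "parser" and a difflib lookup against a fake list.
titles = [f"Example Title {i}" for i in range(200)]


def fake_parse(name: str) -> str:
    return name.rsplit(".", 1)[0]


def fake_match(name: str) -> list:
    return difflib.get_close_matches(fake_parse(name), titles, n=1)


files = [f"Example Title {i}.mkv" for i in range(50)]
print(f"parse: {time_stage(fake_parse, files):.4f}s")
print(f"match: {time_stage(fake_match, files):.4f}s")
```

Taking the best of several runs reduces noise from disk caches and other processes, which matters when the two totals differ by only a few seconds.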

I also figured that sleep isn't important for humans anyway and went ahead with refactoring to remove the __dict__ abuse. It just didn't sit well. :sweat:

FichteFoll commented 1 year ago

Since this is probably the biggest outstanding PR currently, I'd like to merge it soon-ish and continue with #624 when I find some free time.