stashapp / stash

An organizer for your porn, written in Go. Documentation: https://docs.stashapp.cc
https://stashapp.cc/
GNU Affero General Public License v3.0
9.31k stars 798 forks

[EPIC] Better title creation from filename #152

Closed timcantona closed 4 years ago

timcantona commented 5 years ago

It seems simple to do, but I was thinking the title should be a little smarter than just using the full filename.

Often there will be ., _, - etc. instead of spaces, as well as extra data at the end (1080p, FULL, etc.).

It would be nice if the title was a cleaned up version of the filename, for example:

Filename: claire-castel-at-the-service-of-theses-gentlemen-for-a-hard-dp-6699-1080p_full_mp4.mp4
Title: Claire Castel at the service of theses gentlemen for a hard dp 6699

and maybe even when a date is contained, that gets applied to the date field?

Filename: dorcelclub.18.12.26.claire.castel.hot.night.in.club.xtrem.mp4
Title: Dorcelclub claire castel hot night in club xtrem, with the date set to 18.12.26 from the filename

Perhaps when #6 is implemented, you could take more factors into account. For example, in the last one you would recognise Dorcelclub as the studio, so the title would just become "Claire castel hot night in club xtrem".
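The cleanup requested above can be sketched roughly as follows. This is an illustration in Python, not Stash's implementation (Stash is written in Go). The function name `title_from_filename`, the `NOISE_TOKENS` list, and the assumption that two-digit years mean 20YY are all hypothetical choices made for the example.

```python
import re
from datetime import date
from pathlib import PurePath

# Hypothetical list of trailing "noise" tokens; a real list would be longer.
NOISE_TOKENS = {"2160p", "1080p", "720p", "480p", "full", "mp4", "xxx"}

# A dotted YY.MM.DD date, as in the second example filename.
DATE_RE = re.compile(r"(?<!\d)(\d{2})\.(\d{2})\.(\d{2})(?!\d)")


def title_from_filename(filename):
    """Return (title, date or None) guessed from a file name."""
    stem = PurePath(filename).stem  # drop the final extension

    # Pull out the date first, while the dots are still in place.
    found = None
    match = DATE_RE.search(stem)
    if match:
        yy, mm, dd = (int(g) for g in match.groups())
        try:
            found = date(2000 + yy, mm, dd)  # assumption: years are 20YY
            stem = stem[:match.start()] + " " + stem[match.end():]
        except ValueError:
            pass  # not a valid calendar date; keep the digits in the title

    # Treat '.', '_' and '-' as spaces, then drop trailing noise tokens.
    words = [w for w in re.split(r"[._\-\s]+", stem) if w]
    while words and words[-1].lower() in NOISE_TOKENS:
        words.pop()

    title = " ".join(words)
    return title[:1].upper() + title[1:], found
```

Applied to the two filenames above, this yields "Claire castel at the service of theses gentlemen for a hard dp 6699" with no date, and "Dorcelclub claire castel hot night in club xtrem" with the date 2018-12-26. Capitalising the performer's name or dropping the studio would need a list of known names, which this sketch does not have.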

Leopere commented 5 years ago

This is a notoriously complex problem that affects most self-hosted streaming apps. It likely won't be fixed any time soon, but a fix would be hugely useful. One potential option is developing a scraper that compiles .NFO standard files and collects and stores any public domain stock imagery for scenes, performers, etc.

Leopere commented 5 years ago

There has been discussion in the Discord about how to handle a lot of this, but I think that eventually aiming for something like how MusicBrainz and Picard handle tagging, file renaming, and all of that fancy stuff will help make big leaps in the right direction.

I think a hugely useful potential direction would be perceptual fingerprinting via FFmpeg, but that requires a whole new issue for discussion.

GernBlanston12 commented 4 years ago

If you implemented this alongside/under the auto-tagger for Performers and Studios, whatever is left of the file name after you strike the Performers, Studios, and separator characters would likely be the Scene name, right? I mean, if you're parsing the file name for these things anyway, it can't be too hard to subtract those characters from the name as you go, right? Example: the file name "Gene Hackman - Orion Pictures - Mississippi Burning 03.11.98" shows up as "Mississippi Burning" once you strike the Performer, the Studio, special characters, the date, and lastly, leading and trailing spaces.
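The subtraction idea can be sketched like this. It is an illustration in Python rather than anything taken from Stash's auto-tagger. The function name `strip_known_names` is hypothetical, and `known_names` stands in for whatever performers and studios are already in the database.

```python
import re


def strip_known_names(stem, known_names):
    """Strike known names, a dotted date and separators; return (rest, matched)."""
    remaining = stem
    matched = []

    # Longest names first, so "Orion Pictures" wins over a shorter "Orion".
    for name in sorted(known_names, key=len, reverse=True):
        pattern = re.compile(
            r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE
        )
        if pattern.search(remaining):
            matched.append(name)
            remaining = pattern.sub(" ", remaining)

    # Strike a dotted numeric date such as 03.11.98.
    remaining = re.sub(r"(?<!\d)\d{2}\.\d{2}\.\d{2}(?!\d)", " ", remaining)

    # Collapse separator characters, then trim leading and trailing spaces.
    remaining = re.sub(r"[\s._\-]+", " ", remaining).strip()
    return remaining, matched


rest, names = strip_known_names(
    "Gene Hackman - Orion Pictures - Mississippi Burning 03.11.98",
    ["Gene Hackman", "Orion Pictures"],
)
```

Here `rest` is "Mississippi Burning". The sketch only matches names spelled with spaces exactly as stored. Names written as `Gene.Hackman` or `gene_hackman` would need the separators normalised before matching.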

Sometimes I wonder why the default answer for half the new feature suggestions in Stash is some heretofore undreamt-of perceptual fingerprinting, exotic AI facial recognition, a community database of all possible porn, or some other massive technological undertaking. Parsing text out of a file name via a list of known tags in the db can't be that hard, can it?

Leopere commented 4 years ago

"Parsing text out of a file name via a list of known tags in the DB can't be that hard, can it?"

It is a pretty tricky task to get right; there are a few companies that were established to help understand things like this; for example, Google was one.

That said, we don't need to do such a complicated job as Google with something that is typically within the scope of just doing file identification and metadata curation.

Facial recognition and p-hashing are just additional options for zeroing in on the data. People who have a "stash" in the first place often want something that can manage it, much as MusicBrainz was created and is powered by hobbyist collectors and curators. There is a list of general priorities, starting with simple metadata collection and hashing, which is trivial and not a massive technological undertaking by any stretch of the imagination. The API is just a side project that seems to have plenty of interest. Some of these things may appear undreamt-of, but it's all based in relatively sober reality.

tritnaha commented 4 years ago

If we were to look at the filenames/directory names, it shouldn't be too hard, at least for scene content.

AllInternal.19.11.20.Tania.Swank.XXX.1080p.MP4-KTR
ATKGalleria.17.02.27.Cleo.Vixen.Toys.XXX.1080p.MP4-KTR
etc.

They all follow the same uniform naming standard of Sitename.YY.MM.DD.Title.Goes.Here.XXX.Resolution.Format-RELGRP. But if your content is renamed or doesn't follow the same naming scheme, my guess is that people sort their stuff in folders that start with the network/site name anyhow.
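A name in that Sitename.YY.MM.DD.Title.Goes.Here.XXX.Resolution.Format-RELGRP layout can be split with a single regular expression. The following is a rough sketch in Python, not Stash's scene filename parser. `SCENE_RE` and `parse_scene_name` are hypothetical names, and the 20YY reading of the year is an assumption.

```python
import re
from datetime import date

# Sitename.YY.MM.DD.Title.Goes.Here.XXX.Resolution.Format-RELGRP
SCENE_RE = re.compile(
    r"^(?P<site>[^.]+)\."
    r"(?P<yy>\d{2})\.(?P<mm>\d{2})\.(?P<dd>\d{2})\."
    r"(?P<title>.+?)\.XXX\."
    r"(?P<resolution>[^.]+)\."
    r"(?P<format>[^.-]+)-(?P<group>.+)$"
)


def parse_scene_name(name):
    """Return a dict of fields, or None if the name does not fit the layout."""
    m = SCENE_RE.match(name)
    if m is None:
        return None
    try:
        # Assumption: two-digit years are 20YY.
        scene_date = date(2000 + int(m["yy"]), int(m["mm"]), int(m["dd"]))
    except ValueError:
        return None  # digits in the date slot are not a real calendar date
    return {
        "site": m["site"],
        "date": scene_date,
        "title": m["title"].replace(".", " "),
        "resolution": m["resolution"],
        "format": m["format"],
        "group": m["group"],
    }
```

For the first example name this gives site "AllInternal", date 2019-11-20, title "Tania Swank", resolution "1080p", format "MP4" and group "KTR". Names that do not fit the layout return `None`, which is where a fallback such as the containing folder name would come in.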

WithoutPants commented 4 years ago

I think this can be addressed by the scene filename parser and can probably be closed, unless there is additional functionality required.