stashapp / stash-box

Stash App's own OpenSource video indexing and Perceptual Hashing MetaData API
MIT License
210 stars, 61 forks

[RFC] Release Groups / Scene Groups #663

Open AdultSun opened 1 year ago

AdultSun commented 1 year ago

After a brief discussion on Discord, I asked for help writing an issue addressing the long-requested "release groups" feature, primarily inspired by the system MusicBrainz uses to track multiple releases of the same album. The following is written by StashDB user 2itno. I have only made a few formatting changes to fit GitHub's flavor of Markdown. - AdultSun

SceneGroup object to group rereleases, remasters, redistributions, etc.

Motivation

It is not uncommon for studios to release a scene on different sub-sites, or to re-release a scene at a later date, possibly remastered or edited in some way. While ultimately derived from a single collection of video footage, each release or distribution of a scene may comprise a distinct set of metadata: different versions of a scene may have different titles, release dates, durations, descriptions, studio codes, studio URLs, or studio attributions.

All these metadata are potential candidates for submission to StashDB. How can they be captured in the stash-box data model? Currently, there are two strategies for tracking different releases or distributions of a scene in StashDB, each with significant drawbacks.

One strategy is to try to lump the different distributions into a single Scene entry. For instance, The Score Group will often release a scene simultaneously on a specific themed sub-site as well as on an aggregator site. In many cases, scenes released in this way differ only by studio URL and studio code. Editors sometimes attach both URLs to a single Scene, but a single Scene cannot capture two studio codes. In other cases, however, simultaneous or staggered releases differ in more meaningful fields like title, description, etc. A single Scene has only one title field, one details field, etc.

The other strategy in use is to submit multiple Scenes. Each collection of metadata associated with a specific release event can be fully captured in this way, but there is no link between the different versions of a Scene, no way to capture in the database that the different versions are related.

As a potential new solution, alternate titles have been proposed (issue #345). However, for multiple versions (redistributions, re-releases, etc.) of a scene, more than one field (e.g. date, title, studio, or description) in a Scene's metadata is likely to be different. In order to track all metadata for every release event, a Scene would then need "alternates" of many other fields. It might not be clear that alternate title #2 is intended to correspond to alternate date #2, and one can imagine other such UI/UX or backend issues in trying to capture multiple versions of a Scene in this manner. A saner solution is to introduce a SceneGroup object that collects multiple Scene objects, each of which represents a unique release event for a particular studio. (There could conceivably be a use for "alternate" or arrayed fields if, say, a single release event had multiple titles, covers, descriptions, etc. on a studio page. Covers are likely to be best served by a dictionary object, but that is outside the scope of this document.)

Tags and relationships

In stash-box (StashDB) currently, if a user enters a new version of a scene as a new Scene object, the user may add one of three tags: remaster, redistribution, or re-release. As already noted, these tags cannot refer to any other Scene; they are just labels. These tags can remain in a Scene / SceneGroup world. One can imagine encoding relationships between scenes, for instance "<Scene uuid#1> is a remaster of <Scene uuid#2>". (Browse MusicBrainz to find extensive use of relationships.) Relationships could also be useful more generally, for describing performer actions in a scene: a Scene could have an attached relationship like "<Performer uuid#1> performed <Action uuid#2>", where Actions could be derived from a subset of current tags, or "<Performer uuid#1> performed <Action uuid#2> on <Performer uuid#3>", but this is a digression. A first implementation of a Scene / SceneGroup architecture can simply retain the current tagging mechanisms for each Scene object. The next topic is which fields Scene and SceneGroup should each have.
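As a rough illustration of the relationship idea above, here is a minimal sketch that stores each statement as a (subject, predicate, object) triple. The class name, the predicate names (`remaster_of`, `rerelease_of`), and the placeholder IDs are all hypothetical; nothing like this exists in stash-box today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    subject: str    # ID of the entity the statement is about
    predicate: str  # hypothetical vocabulary, e.g. "remaster_of"
    object: str     # ID of the entity being referred to


# Placeholder IDs stand in for real Scene UUIDs.
relationships = [
    Relationship("scene-uuid-1", "remaster_of", "scene-uuid-2"),
    Relationship("scene-uuid-3", "rerelease_of", "scene-uuid-2"),
]


def versions_of(scene_id, rels):
    """Return (predicate, subject) pairs for every scene that points at scene_id."""
    return [(r.predicate, r.subject) for r in rels if r.object == scene_id]


derived = versions_of("scene-uuid-2", relationships)
```

Unlike a plain tag, each triple names the Scene it refers to, so the link between versions can be queried in either direction.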

Metadata fields

What metadata should be attached to SceneGroup? Would any of a Scene's metadata be subsumed into SceneGroup? MusicBrainz's ReleaseGroup contains Artist information, as various Releases may include or omit some performers (a soprano in an opera, e.g.) at the Release level, or a Release may include tracks (and thus Artists) that another Release in the same ReleaseGroup may not. Would a SceneGroup ever have a set of performers that differs from any of the grouped Scenes? Perhaps if a cameo were credited in one Scene, but not in another? No answer is provided to this question; it can be resolved by the community or developers. Similar questions can be asked regarding Tags or Images on a SceneGroup. Would SceneGroup have separately curated lists of Tags and Images? Would these be derived fields? Again, these are left as open questions.

It may be useful to choose a single canonical title for a SceneGroup.

SceneGroup
  title
  performers[]
  tags[]
  images[]
  scenes[]

Scene 
  <all existing Scene fields>
  scene_group

Specific example cases

Edge cases

Suppose a Scene title is changed. In a Scene / SceneGroup architecture, would the Scene's title get replaced? Would it get added to a list of alternates? If it were replaced, would the old title be available still, as in a versioned set of Scene objects? Would a new Scene be created in the same group?

In some networks, scenes may be listed both on a parent studio and a sub-studio. In some cases, a Scene may have a unique link on a parent studio and a sub-studio, but all other information is identical. For example, this scene has two attached links, one for the parent studio VIP 4K and another for the sub-studio Tutor 4K. All scenes in the VIP 4K network appear to be like this (though this has not been verified).

This is also generally the case on Porn Mega Load. On the front page, each scene listing has a label like "From: ". A few test cases seem to show that the metadata on PornMegaLoad is identical to that of the matching scene on the named sub-studio, but again this has not been exhaustively verified.

Should the parent site and the sub-studio each have their own Scene in the SceneGroup, or should there be only one Scene for same-network scenes with identical release dates and metadata? The former is probably the safer and more general solution, but this can be a matter of policy to be decided by the StashDB community -- the decision would probably not affect the Scene and SceneGroup object design, just the guidelines for their usage.

UI/UX changes

AdultSun commented 1 year ago

(Writing as myself again...)

2itno acknowledges on Discord that the UI design still needs more consideration. I imagine we would want to hide alternate releases by default while navigating Stash-Box's UI, showing only the release group's "canonical" info until a user drills down to see the scenes filed underneath. Plex has similar functionality with their "collections" feature, letting users group movies together and then decide whether a collection should be shown alongside its contents or instead of them. A big motivator for release groups on StashDB in the first place has been to hide the duplicates our guidelines currently allow. In fact, the promise of release group support is the only reason our guidelines allow duplicate releases at all.
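To make the "hide alternates by default" idea concrete, here is a small sketch of collapsing a scene listing so that each group appears once under its canonical title. The function name and the dictionary shapes are assumptions made for illustration, not anything in Stash-Box.

```python
def collapse_listing(scenes, group_titles):
    """Return one row per group (its canonical title) plus any ungrouped scenes."""
    seen = set()
    rows = []
    for scene in scenes:
        gid = scene.get("scene_group")
        if gid is None:
            rows.append(scene["title"])      # ungrouped scene: show as-is
        elif gid not in seen:
            seen.add(gid)
            rows.append(group_titles[gid])   # first hit for a group: show canonical title
    return rows


# Hypothetical data: two releases in one group and one standalone scene.
scenes = [
    {"title": "Original Cut", "scene_group": "g1"},
    {"title": "Remastered Cut", "scene_group": "g1"},
    {"title": "Standalone Scene", "scene_group": None},
]
group_titles = {"g1": "Canonical Title"}

listing = collapse_listing(scenes, group_titles)
```

A "drill down" view would then simply list every scene whose `scene_group` matches the selected group.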

We also still need a better idea of how these proposed changes will interact with Stash, including what changes or new features will be needed on that side of the equation. For example, do we want a dedicated "release groups" feature for Stash as well or is this feature better suited just for Stash-Box? Relating to that question is Scruffy's proposed solution that expands the Movies feature in Stash to be more flexible, covering a wide variety of possible groupings of different objects and relationships.

scruffynerf commented 1 year ago

Reminding myself to comment later, and be sure this covers alternate usages like Movies/DVD (collections of scenes), Seasons/Episodes (groupings, and potential groups of groups), and the like.

magicswarmingswarm commented 11 months ago

Any update on this? Think this will be a great feature

DogmaDragon commented 11 months ago

@magicswarmingswarm issues that are being worked on or nearing release usually get added to milestones. So while this is something that is highly requested, it's still in the discussion phase.

Pompey69 commented 11 months ago

Have to say that this is a huge feature that is missing. Any chance to set up a secondary "stash split scene db"? So that the scraper will use that DB for split scenes that are user added to the database? Even if it's not great at first, at least it will bring up a ton of issues that can be addressed in real-time testing of it.

Ronnie711 commented 8 months ago

Agree with Scruffy's take that there's a need for Collections (Scene grouping for multiple releases, series, etc.) and then for Movies. I'd have the main metadata for the full movie, then a sub-group underneath that links to the individual scenes (and obviously a link back from the scene page).

Trying to find a way to make this flow without endless duplicates cluttering up on top of everything else is the difficult part ...

EDIT: Being able to nuke incorrect fingerprints needs to be implemented and fully working before any of this goes live

laurus-lx commented 4 months ago

Perhaps this feature can be aligned with StashApp File Design Draft, so both projects develop in-step: https://github.com/stashapp/stash/discussions/1958

Here's a proposed schema, modified from the one at the above link and slightly reworked for StashBox (files and folders removed, scene relations added). Note that the relations Scene/Movie, Movie/TimeRange, and Scene/TimeRange in the schema below are many-to-many (each would need a join table, not shown).

[image: schema diagram]

and PlantUML Code

@startuml

class Fingerprint {
  hashType: string
  hash: bin
}

class TimeRange {
  startTime: float
  endTime: float
}

class Scene
class Movie

class SceneRelations {
  relationType: string
}

Scene "1" <-- "1" SceneRelations: scene
Scene "1" --> "1" SceneRelations: scene

Scene "0..n" *--> "0..n" Movie: scenes
Scene "0..n" *--> "0..n" TimeRange: time-ranges
Movie "0..n" *--> "0..n" TimeRange: time-ranges
TimeRange "0..n" --> "1" Fingerprint: file-hash
@enduml

This should be fairly straightforward to implement from the DB perspective, but the user interface, approvals, and the interface with StashApp would need a good amount of work and pre-planning/design.
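For reference, one of the many-to-many relations from the schema (Scene/Movie) could be backed by a join table along these lines. This is a sketch only: SQLite is used purely to keep the example self-contained and is not a claim about stash-box's database, and the table names, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scene (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE movie (id TEXT PRIMARY KEY, title TEXT);
-- Join table: one row per (scene, movie) pairing.
CREATE TABLE scene_movie (
    scene_id TEXT NOT NULL REFERENCES scene(id),
    movie_id TEXT NOT NULL REFERENCES movie(id),
    PRIMARY KEY (scene_id, movie_id)
);
""")
conn.executemany("INSERT INTO scene VALUES (?, ?)",
                 [("s1", "Scene One"), ("s2", "Scene Two")])
conn.executemany("INSERT INTO movie VALUES (?, ?)",
                 [("m1", "Movie A"), ("m2", "Movie B")])
# Scene s1 appears in both movies; movie m1 contains both scenes.
conn.executemany("INSERT INTO scene_movie VALUES (?, ?)",
                 [("s1", "m1"), ("s1", "m2"), ("s2", "m1")])


def scenes_in_movie(movie_id):
    """List the titles of all scenes linked to a movie through the join table."""
    rows = conn.execute(
        "SELECT s.title FROM scene s "
        "JOIN scene_movie sm ON sm.scene_id = s.id "
        "WHERE sm.movie_id = ? ORDER BY s.id",
        (movie_id,),
    )
    return [title for (title,) in rows]
```

The Movie/TimeRange and Scene/TimeRange relations would follow the same pattern, each with its own two-column join table.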