stashapp / metadata-api-discuss

This repo is the laziest possible way we can have threaded conversations about metadata collection and curation for StashApp
MIT License

stash integration #8

Open WithoutPants opened 4 years ago

WithoutPants commented 4 years ago

This subject straddles both stash and stash-box.

stash-box is primarily intended to integrate with stash, so we need to consider how that integration will look.

I see the objectives as follows:

To that end, here are some potential functionality items that we can consider adding to stash:

Other considerations:

I raised #4 for the edit and voting process, envisioning this system to be like the MusicBrainz concept. Since then, I've had doubts about how much participation such a system would attract. A more seamless integration seems like a better alternative, such that users barely even know it's there. This leads to the obvious problem of maintaining good data. These ideas need further discussion and development.

ghost commented 4 years ago

The way I envision stash integration is sort of like how the filename parser works. You get a list of scenes without a stash-box-id, you can filter it to your liking, and then fire off a search to stash-box. You'll get back a list of hash matches or fuzzy matches and can verify the list before saving it locally. Alternatively, there should be an option to "auto-save hash matches". Basically it's sort of like Picard, in that you can fire and forget, or manually approve if (like me) you want that kind of control.
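To make the flow above a bit more concrete, here is a rough sketch of the matching step. Everything in it is an assumption for illustration: the `LocalScene` and `Match` types, the shape of the remote index (a plain list of dicts), the 0.8 title-similarity cutoff, and the function names are all hypothetical and are not part of stash or stash-box.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


@dataclass
class LocalScene:
    path: str
    file_hash: str
    title: str
    stash_box_id: Optional[str] = None  # None means "not yet matched"


@dataclass
class Match:
    scene: LocalScene
    remote_id: str
    kind: str  # "hash" or "fuzzy"


FUZZY_THRESHOLD = 0.8  # assumed cutoff for title similarity


def find_matches(scenes: List[LocalScene], remote_index: List[Dict[str, str]]) -> List[Match]:
    """Match unidentified local scenes against a (hypothetical) remote index.

    Each remote entry is assumed to be a dict with "id", "hash" and "title".
    """
    by_hash = {r["hash"]: r for r in remote_index}
    matches = []
    for scene in scenes:
        if scene.stash_box_id is not None:
            continue  # only scenes without a stash-box-id are searched
        remote = by_hash.get(scene.file_hash)
        if remote is not None:
            matches.append(Match(scene, remote["id"], "hash"))
            continue
        # No hash match: fall back to the most similar title, if similar enough.
        best = max(
            remote_index,
            key=lambda r: SequenceMatcher(None, scene.title.lower(), r["title"].lower()).ratio(),
            default=None,
        )
        if best is not None:
            score = SequenceMatcher(None, scene.title.lower(), best["title"].lower()).ratio()
            if score >= FUZZY_THRESHOLD:
                matches.append(Match(scene, best["id"], "fuzzy"))
    return matches


def apply_matches(matches: List[Match], auto_save_hash_matches: bool) -> Tuple[List[Match], List[Match]]:
    """Save hash matches automatically if enabled; return (saved, needs_review)."""
    saved, needs_review = [], []
    for m in matches:
        if m.kind == "hash" and auto_save_hash_matches:
            m.scene.stash_box_id = m.remote_id
            saved.append(m)
        else:
            needs_review.append(m)
    return saved, needs_review
```

With `auto_save_hash_matches=False`, every match lands in the review list, which would correspond to the "manually approve" mode.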

I'm a bit more skeptical of two-way synchronizing since I'm worried it will lead to a lot of duplicates. Most users just want metadata, and many will happily press any button they see that says save. I can see it being useful as an early-adopter kind of thing while we seed the database, but long term I think it'll save us a lot of pain to force users to actually submit each individual scene they want to create, ideally with a small edit note. We can of course seed the form with data from stash and make it as easy as possible to contribute, but there should be some threshold to prevent database pollution.
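As a rough illustration of the "seeded form plus a small threshold" idea, something like the following could work. The field names, the `MIN_NOTE_LENGTH` value, and the function names are made up for this sketch and do not come from stash or stash-box.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_NOTE_LENGTH = 10  # assumed minimum length for an edit note


@dataclass
class SubmissionDraft:
    title: str
    file_hash: str
    edit_note: str = ""


def seed_draft(local_scene: Dict[str, str]) -> SubmissionDraft:
    """Pre-fill a draft from local stash data so contributing stays easy."""
    return SubmissionDraft(
        title=local_scene.get("title", ""),
        file_hash=local_scene.get("hash", ""),
    )


def validate(draft: SubmissionDraft) -> List[str]:
    """Return a list of problems; an empty list means the draft can be submitted."""
    problems = []
    if not draft.title.strip():
        problems.append("title is required")
    if not draft.file_hash.strip():
        problems.append("file hash is required")
    if len(draft.edit_note.strip()) < MIN_NOTE_LENGTH:
        problems.append("edit note is too short")
    return problems
```

The point is only that each scene goes through an explicit, per-scene submit step; what the real threshold should be is still up for discussion.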

Regarding editing and voting, I don't think you necessarily need to worry that much about participation. There are plenty of collectors and data nerds in the world, so if the project gets some traction, people will want to contribute. Maybe I'm weird, but personally I'm more interested in the metadata curation aspect than anything else. With that said, voting is probably not the first thing we need to worry about.

The most important aspect, IMO, is going to be a change log that allows for seeing history and reverting changes. We can allow relatively unfettered editing by registered users, at least for the initial period, as long as the changes are trackable and reversible. Where that becomes tricky is destructive changes like deletions and merges. I don't think there's really any way around some kind of approval process for these kinds of operations, which are hard to reverse. Many databases handle this by having a limited set of privileged users who can make destructive changes, but personally I think that's very opaque and hard to scale. Wikipedia seems to handle it with a sort of consensus/voting system. In the short term, privileged users are probably fine, but long term I think some sort of voting system is inevitable. It's also, relatively speaking, pretty simple to implement.
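A minimal sketch of what that could look like: ordinary edits are applied immediately but recorded with their old value so they can be reverted, while deletions sit in a queue until enough users approve them. The class and method names, the in-memory storage, and the approval threshold of 2 are all assumptions for illustration, not a proposed stash-box schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set

APPROVALS_REQUIRED = 2  # assumed vote threshold for destructive changes


@dataclass
class Change:
    entity_id: str
    field_name: str
    old_value: Any
    new_value: Any
    editor: str
    note: str = ""


@dataclass
class PendingDeletion:
    entity_id: str
    proposer: str
    approvals: Set[str] = field(default_factory=set)
    applied: bool = False


class Database:
    def __init__(self) -> None:
        self.entities: Dict[str, Dict[str, Any]] = {}
        self.log: List[Change] = []

    def edit(self, entity_id: str, field_name: str, new_value: Any,
             editor: str, note: str = "") -> Change:
        """Apply an edit right away and record the old value in the log."""
        entity = self.entities.setdefault(entity_id, {})
        change = Change(entity_id, field_name, entity.get(field_name),
                        new_value, editor, note)
        entity[field_name] = new_value
        self.log.append(change)
        return change

    def history(self, entity_id: str) -> List[Change]:
        """All recorded changes for one entity, oldest first."""
        return [c for c in self.log if c.entity_id == entity_id]

    def revert(self, change: Change, editor: str) -> Change:
        """Undo a change by writing its old value back as a new logged edit.

        A field that did not exist before is restored as None in this sketch.
        """
        return self.edit(change.entity_id, change.field_name,
                         change.old_value, editor, note="revert")

    def propose_delete(self, entity_id: str, proposer: str) -> PendingDeletion:
        """Queue a deletion instead of applying it immediately."""
        return PendingDeletion(entity_id, proposer)

    def vote(self, pending: PendingDeletion, voter: str) -> None:
        """Record an approval; delete once the threshold is reached."""
        pending.approvals.add(voter)  # a set, so repeat votes do not count twice
        if not pending.applied and len(pending.approvals) >= APPROVALS_REQUIRED:
            self.entities.pop(pending.entity_id, None)
            pending.applied = True
```

Because a revert is itself just another logged edit, the history stays append-only, and a bad revert can be reverted in turn.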