stash integration

This subject straddles both stash and stash-box.

stash-box is primarily intended to integrate with stash, so we need to consider how that integration will look.

I see the objectives as follows:

- Participation should be optional, and maintaining the privacy of users is an absolute imperative.
- Further to that, the user should have fine control over what information they push to and pull from stash-box.
- The barrier to participation should be as low as possible. In the short term, we're not going to have a large, active userbase curating the content.
- Potential for abuse should be minimised, and correcting said abuse should be easy for moderators.

To that end, here are some potential functionality items that we can consider adding to stash:

- Allow manual pulling (or scraping) of individual scene and performer data from stash-box, using the same functionality as the current scraping code. This allows the user to subsequently alter the returned data.
- Allow manual pushing of individual scene and performer data to stash-box.
- Automated push/pull/syncing of scene and performer data.

Other considerations:

stash-box should focus on objective data. Tags will be a mish-mash of subjective and personal items. We'll need settings in place to scope what tags should be pushed/pulled between systems
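One way to picture the tag scoping settings is as a per-direction allowlist, so personal tags never leave the local instance unless the user opts them in. This is only a sketch: `TagSyncSettings`, `filter_tags` and the sample tag names are hypothetical and do not exist in stash or stash-box.

```python
from dataclasses import dataclass, field


@dataclass
class TagSyncSettings:
    """Hypothetical per-user settings; none of these names exist in stash."""
    push_allowlist: set[str] = field(default_factory=set)
    pull_allowlist: set[str] = field(default_factory=set)


def _normalize(tag: str) -> str:
    # Compare tags case-insensitively and ignore surrounding whitespace.
    return tag.strip().lower()


def filter_tags(tags: list[str], allowlist: set[str]) -> list[str]:
    """Keep only tags the user has explicitly opted in, preserving order."""
    allowed = {_normalize(t) for t in allowlist}
    return [t for t in tags if _normalize(t) in allowed]


# Example: only two objective tags are allowed to be pushed.
settings = TagSyncSettings(push_allowlist={"Interview", "Behind the Scenes"})
local_tags = ["interview", "my favourites", "Behind the Scenes", "to rewatch"]
tags_to_push = filter_tags(local_tags, settings.push_allowlist)
```

With an empty allowlist nothing is pushed or pulled, which keeps the default on the private side.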

I raised #4 for the edit and voting process, envisioning this system to be like the MusicBrainz concept. Since then, I've had doubts about the level of participation in such a system. A more seamless integration seems like a better alternative, such that users barely even know it's there. This leads to the obvious problem of maintaining good data. These ideas need further discussion and development.

The way I envision stash integration is sort of how the filename parser works. You get a list of scenes without a stash-box-id, you can filter it to your liking, and then fire off a search to stash-box. You'll get back a list of hash matches or fuzzy matches and can verify the list before saving it locally. Alternatively there should be options to "auto-save hash matches". Basically sort of like Picard in that you can fire and forget, or manually approve if (like me) you want that kind of control.
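The flow described above could be sketched roughly as follows. Everything here is an assumption for illustration: the `LocalScene`, `RemoteScene` and `Match` types, the in-memory list standing in for a stash-box search, and the use of `difflib` title similarity with a 0.8 threshold as the "fuzzy match" are all hypothetical, not how stash or stash-box actually work.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class LocalScene:
    title: str
    file_hash: str
    stash_box_id: Optional[str] = None


@dataclass
class RemoteScene:
    id: str
    title: str
    file_hashes: frozenset


@dataclass
class Match:
    local: LocalScene
    remote: RemoteScene
    kind: str  # "hash" or "fuzzy"
    score: float


def find_match(scene, remote_scenes, fuzzy_threshold=0.8):
    """Prefer an exact hash match; otherwise fall back to the best title match."""
    for remote in remote_scenes:
        if scene.file_hash in remote.file_hashes:
            return Match(scene, remote, "hash", 1.0)
    best = None
    for remote in remote_scenes:
        score = SequenceMatcher(None, scene.title.lower(), remote.title.lower()).ratio()
        if score >= fuzzy_threshold and (best is None or score > best.score):
            best = Match(scene, remote, "fuzzy", score)
    return best


def sync(local_scenes, remote_scenes, auto_save_hash_matches=True):
    """Link hash matches automatically (if enabled); return the rest for review."""
    pending = []
    for scene in local_scenes:
        if scene.stash_box_id is not None:
            continue  # already linked, nothing to do
        match = find_match(scene, remote_scenes)
        if match is None:
            continue
        if match.kind == "hash" and auto_save_hash_matches:
            scene.stash_box_id = match.remote.id
        else:
            pending.append(match)
    return pending


remote = [
    RemoteScene("sb-1", "Example Scene One", frozenset({"aaa111"})),
    RemoteScene("sb-2", "Another Example Scene", frozenset({"bbb222"})),
]
local = [
    LocalScene("whatever.mp4", "aaa111"),
    LocalScene("another example scene", "zzz999"),
    LocalScene("Already linked", "ccc333", stash_box_id="sb-9"),
    LocalScene("qqq zzz", "yyy888"),
]
pending = sync(local, remote)
```

In this sketch the first scene is linked without user input, the second lands in `pending` for manual approval, and setting `auto_save_hash_matches=False` would send every match through review.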

I'm a bit more skeptical of two-way synchronizing since I'm worried it will lead to a lot of duplicates. Most users just want metadata, and many will happily press any button they see that says save. I can see it being useful as an early-adopter kind of thing while we seed the database, but long term I think it'll save us a lot of pain to force users to actually submit each individual scene they want to create, ideally with a small edit note. We can of course seed the form with data from stash and make it as easy as possible to contribute, but there should be some threshold to prevent database pollution.

Regarding editing and voting, I don't think you necessarily need to worry that much about participation. There are enough collectors and data nerds in the world that if the project gets some traction, people will want to contribute. Maybe I'm weird, but personally I'm more interested in the metadata curation aspect than anything else. With that said, voting is probably not the first thing we need to worry about.

The most important aspect, IMO, is going to be a change log that allows for seeing history and reverting changes. We can allow relatively unfettered editing by registered users, at least for the initial period, as long as the changes are trackable and reversible. Where that becomes tricky is destructive changes like deletions and merges. I don't think there's really any way around some kind of approval process for these kinds of operations, which are hard to reverse. Many databases handle this by having a limited set of privileged users who can make destructive changes, but personally I think that's very opaque and hard to scale. Wikipedia seems to handle it with a sort of consensus/voting system. In the short term, privileged users are probably fine, but long term I think some sort of voting system is inevitable. It's also, relatively speaking, pretty simple to implement.
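To make the change log idea concrete, here is a minimal sketch of a record that stores a before/after snapshot for every edit and treats a revert as just another logged change. `VersionedRecord`, `Change` and the field names are hypothetical, and the sketch only covers ordinary edits; deletions and merges would still need the approval step discussed above.

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    user: str
    note: str
    before: dict
    after: dict


class VersionedRecord:
    """Hypothetical record whose every edit is tracked and reversible."""

    def __init__(self, data):
        self.data = deepcopy(data)
        self.history = []

    def edit(self, user, note, **fields):
        before = deepcopy(self.data)
        self.data.update(fields)
        self.history.append(Change(user, note, before, deepcopy(self.data)))

    def revert(self, index, user):
        """Restore the state from before history[index].

        The revert is itself appended to the history, so nothing is lost.
        Note that this also undoes any edits made after history[index].
        """
        before = deepcopy(self.data)
        self.data = deepcopy(self.history[index].before)
        self.history.append(
            Change(user, f"revert change {index}", before, deepcopy(self.data))
        )


scene = VersionedRecord({"title": "Example Scene", "studio": "Example Studio"})
scene.edit("alice", "fix title casing", title="Example scene")
scene.edit("mallory", "bad edit", title="zzz", studio="")
scene.revert(1, user="moderator")
```

After the revert, `scene.data` is back to the state alice left it in, and the history still shows all three entries, including who made the bad edit and who undid it.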

stashapp / metadata-api-discuss

stash integration #8