stashapp / stash-box

Stash App's own OpenSource video indexing and Perceptual Hashing MetaData API
MIT License

[RFC] Anonymize Fingerprints for User Privacy #633

Open ChilledSlim opened 1 year ago

ChilledSlim commented 1 year ago

Anonymize Fingerprints for User Privacy

Scope

Currently, Stash-Box instances (such as StashDB) are effectively large databases of users and the contents of their porn collections, which many users would consider sensitive data. Adding an abstraction layer between the user ID and the fingerprints in a user's collection would limit that exposure, so that the scenes in a collection cannot easily be linked to a user. This protects user privacy, adds a lot of value, and could increase use of the Stash-Box instance. I don't think there is any debate that what is in someone's stash collection should remain private.

Implementation

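Fingerprints would be submitted under a UUID generated by the user's Stash instance instead of under the user_id. Only that Stash instance (on the user's computer) would know that it owns the UUID in question.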

Should any third-party Stash-Box instance (such as StashDB) ever be hacked or have its data otherwise acquired, nothing in that data would tie a specific user to the contents of their porn stash.

Technical

On the Stash-Box side, this would require an additional field in FingerprintInput, and the endpoint would need to (1) remove the user_id and replace it with a UUID generated by the Stash app, and (2) accept a UUID for fingerprint submission and, when one is present, not save the user_id.
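To make the server-side behavior concrete, here is a minimal sketch in Python (not the project's actual code). The `submission_uuid` field, the `StoredFingerprint` record, and the `store_fingerprint` function are hypothetical names used only for illustration, and the `FingerprintInput` class below is a simplified stand-in for the real input type.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class FingerprintInput:
    # Simplified stand-in for the real input type; field names are assumptions.
    hash: str
    algorithm: str
    submission_uuid: Optional[str] = None  # hypothetical new field


@dataclass
class StoredFingerprint:
    # Hypothetical record of what the server would persist.
    hash: str
    algorithm: str
    user_id: Optional[str]
    submission_uuid: Optional[str]


def store_fingerprint(inp: FingerprintInput, authenticated_user_id: str) -> StoredFingerprint:
    """Build the record to persist; drop the user_id when a UUID is supplied."""
    if inp.submission_uuid is not None:
        # Parse the value so arbitrary strings are rejected (raises ValueError).
        owner = str(uuid.UUID(inp.submission_uuid))
        return StoredFingerprint(inp.hash, inp.algorithm, None, owner)
    # No UUID given: fall back to the current behavior and keep the user_id.
    return StoredFingerprint(inp.hash, inp.algorithm, authenticated_user_id, None)
```

In this sketch the request is still authenticated, but the authenticated user ID is simply never written next to the fingerprint once a UUID is present.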

On the StashApp side, this would require Stash to (1) run a migration that replaces user_id with a UUID in the configured Stash-Box instance, (2) track the UUID used to submit each scene, and (3) submit new scenes with a generated UUID.
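The client-side bookkeeping in points (2) and (3) could look roughly like the sketch below. It assumes one UUID per scene, which is only one reading of the proposal; a single UUID per Stash instance would also fit. `SubmissionTracker` and its methods are hypothetical names, and a real implementation would persist the mapping in Stash's local database rather than in memory.

```python
import uuid
from typing import Dict


class SubmissionTracker:
    """Local-only record of which UUID was used for which scene (hypothetical)."""

    def __init__(self) -> None:
        # Maps a local scene ID to the UUID it was submitted under.
        self._by_scene: Dict[str, str] = {}

    def uuid_for(self, scene_id: str) -> str:
        # Reuse the UUID of an already-submitted scene; otherwise mint a new one.
        if scene_id not in self._by_scene:
            self._by_scene[scene_id] = str(uuid.uuid4())
        return self._by_scene[scene_id]

    def owns(self, submission_uuid: str) -> bool:
        # Only this local instance can answer whether a UUID belongs to it.
        return submission_uuid in self._by_scene.values()
```

Because the mapping never leaves the user's machine, the server would only ever see UUIDs that it cannot tie back to an account.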

Notes

Flashy78 commented 1 year ago

One simple workaround for users is to sign up with a unique email address/username. That way no matter what, if your Stashbox account gets leaked, there is nothing tying the username/email back to you.

ChilledSlim commented 1 year ago

Of course. Hence also #632.

But that also made me think that there is no reason why this data shouldn't be anonymous. Frankly, I would put a bounty on this if a sensible implementation can be worked out. There is a line between "I have a porn stash" and "here is what's in my porn stash".

aghoulcoder commented 1 year ago

There is no real need for this as long as users can sign up pseudonymously and create accounts freely.

In fact, that's exactly how big crowd-sourced projects like Wikipedia and OpenStreetMap work. In OpenStreetMap in particular, you are expected to be mapping in your local area and places you physically visit. That information is quite personal, but it doesn't matter because OSM never needs to know anything more about you besides a username and an e-mail address.

Keep it simple.