hydrusnetwork / hydrus

A personal booru-style media tagger that can import files and tags from your hard drive and popular websites. Content can be shared with other users via user-run servers.
http://hydrusnetwork.github.io/hydrus/
Other

Tag search and editing #585

Open Zweibach opened 4 years ago

Zweibach commented 4 years ago

Tag search and editing

A panel to manage tags themselves rather than tags on files. Suggested use cases are editing tags to correct spelling errors, deleting tags outright without having to search for the files that carry them and remove the tag from each, and dealing with groups of tags by searching for them in some manner. The last one might be more useful once we have tag metadata to actually search on.

Functionally (i.e. behind the scenes) it could just work the way it currently does and the way we recommend people deal with it: find the files, apply the correct tag, and delete the misspelled tag. Or use siblings, but those aren't always appropriate.

bbappserver commented 4 years ago

Disclaimer: The following has no bearing on using this feature locally, only on its interaction with the PTR service.

It is included here since I was asked to illustrate potential hurdles of this feature's implementation from a technical perspective.

@Mengmoshu @JavertTheArcanine

@bbappserver It might interest you to know that this is an often requested feature. And if you're going to cite "technical reason" you really should spell them out. GitHub issues are very much an appropriate place for technical details, even from people who aren't developers if they're at least familiar with the code base.

Furthermore, a tag and namespace renaming system could reduce the problems caused on the PTR if they are a new kind of commit on the PTR, thus allowing janitors to much more clearly judge the suitability of the change.

And I'm pretty sure we've mentioned to you before that the presence of a workaround shouldn't be used as an argument against a feature.

Problem 1: Tag names are (currently) also their universally unique identifiers

Background

In databases we have a concept of keys. A key may locally uniquely identify an object at various levels:

When working with a distributed system like the PTR, we can't trust the primary key, which is just an auto-incrementing integer, to be the same across hosts.

e.g. Let the PTR tag log be ('cat','dog')

The clients are desynchronized

Consequently we can't possibly use an integer to identify these tags in messages we pass to and from the PTR server, or for that matter any other host, because the integers do not map onto the same items.
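A minimal sketch of this desynchronization, assuming each client hands out auto-incrementing IDs in the order it first encounters a tag. The `LocalTagStore` class is hypothetical and is not Hydrus's actual schema.

```python
class LocalTagStore:
    """Hypothetical per-client tag table with an auto-incrementing integer key."""

    def __init__(self):
        self._ids = {}

    def get_or_create_id(self, tag: str) -> int:
        # The ID depends only on the order in which this client first saw the tag.
        if tag not in self._ids:
            self._ids[tag] = len(self._ids) + 1
        return self._ids[tag]


client_a = LocalTagStore()
client_b = LocalTagStore()

# Client A processes the PTR log ('cat', 'dog') in order.
for tag in ("cat", "dog"):
    client_a.get_or_create_id(tag)

# Client B (assumed here) already had 'dog' locally before it saw 'cat'.
for tag in ("dog", "cat"):
    client_b.get_or_create_id(tag)

id_a = client_a.get_or_create_id("cat")
id_b = client_b.get_or_create_id("cat")
print(id_a, id_b)  # 1 2
```

The same tag ends up with a different integer on each client, so a message that says "tag 1" means `cat` to A and `dog` to B.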

Question: Is there a universally unique identifier we can use?
Answer: How about the name of the tag itself? It is consistent across clients regardless of the integer. So as long as the name of a tag is fixed after it is created, there is no problem (provided the system uses the name as the UUID).

Unless, that is, someone were allowed to rename tags in the current system. That would be pretty bad.

Illustrating a potential problem

Suppose A and B are clients

Let A and B both have the tag spices:shadow the hedgehog

B submitted his update after A so it will be applied second.

The PTR generates an update chunk

(RENAME,`spices:shadow the hedgehog`,`species:shadow the hedgehog`)
(DELETE, `spices:shadow the hedgehog`, [...a few thousand hashes])

B's work is ignored because the tag now has a different UUID. This will be the case for anyone who manipulates this tag before they are up to date with A's fairly new tag. I believe this is technically an instance of the lost update problem, except that with multiple hosts just sending whatever they want whenever they want, you can't really wrap the changes in a transaction.

https://codingsight.com/the-lost-update-problem-in-concurrent-transactions/
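The failure can be reproduced with a toy model. This is only a sketch: it assumes mappings are stored as a dict from tag name to a set of file hashes and that updates are applied strictly in order. `apply_update` and the `h1`..`h3` hashes are made up for illustration and do not reflect the real PTR update format.

```python
def apply_update(mappings, update):
    """Apply one (ACTION, ...) tuple to a dict of tag name -> set of file hashes."""
    action = update[0]
    if action == "RENAME":
        _, old, new = update
        if old in mappings:
            # Move every hash from the old name to the new name.
            mappings.setdefault(new, set()).update(mappings.pop(old))
    elif action == "DELETE":
        _, tag, hashes = update
        if tag in mappings:
            mappings[tag] -= set(hashes)
        # If the name no longer exists, there is nothing left to act on.
    return mappings


mappings = {"spices:shadow the hedgehog": {"h1", "h2", "h3"}}

chunk = [
    ("RENAME", "spices:shadow the hedgehog", "species:shadow the hedgehog"),  # from A
    ("DELETE", "spices:shadow the hedgehog", ["h1", "h2"]),                   # from B
]
for update in chunk:
    apply_update(mappings, update)

remaining = sorted(mappings["species:shadow the hedgehog"])
print(remaining)  # ['h1', 'h2', 'h3']
```

B wanted `h1` and `h2` removed, but because the delete is addressed by the old name, all three hashes survive under the new name.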

The headache of solving the problem

One possibility is to stop using the name as the UUID and instead assign a secure random string as the alternate key. This means every single tag record on every single client needs to be updated to have an identical UUID for every tag. This will probably grow the definition space to roughly 1.25x its current size, and it involves a rework of how the PTR message system handles a bunch of commands: siblings, parents, petitions, pends, adds and removes all have to be rewritten to use this key.

This solves the problem of having an alternate key that is an immutable identity of the tag, while allowing its label to be changed.
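A rough sketch of that idea, assuming a random hex string as the immutable key and the label as an ordinary mutable field. `TagRegistry` and its methods are hypothetical names for illustration, not part of Hydrus.

```python
import secrets


class TagRegistry:
    """Hypothetical store where a random key, not the label, identifies a tag."""

    def __init__(self):
        self.labels = {}    # tag key -> current label
        self.mappings = {}  # tag key -> set of file hashes

    def create(self, label):
        tag_key = secrets.token_hex(16)  # secure random, never changes
        self.labels[tag_key] = label
        self.mappings[tag_key] = set()
        return tag_key

    def rename(self, tag_key, new_label):
        # Only the label changes; the identity stays the same.
        self.labels[tag_key] = new_label

    def delete_mappings(self, tag_key, hashes):
        self.mappings[tag_key] -= set(hashes)


registry = TagRegistry()
tag_key = registry.create("spices:shadow the hedgehog")
registry.mappings[tag_key].update({"h1", "h2", "h3"})

registry.rename(tag_key, "species:shadow the hedgehog")  # A's rename
registry.delete_mappings(tag_key, ["h1", "h2"])          # B's later delete

print(registry.labels[tag_key])    # species:shadow the hedgehog
print(registry.mappings[tag_key])  # {'h3'}
```

Because B's delete is addressed by key rather than by name, it still lands after A's rename.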

bbappserver commented 4 years ago

Problem 2: Renames are not bijective

A sibling is undoable, a rename is not. Most people have experienced this aggravation when doing a bulk find and replace. It is essentially the same problem as a poorly informed sibling.

Suppose someone on their client has

archer (fate servant)
character:shirou emiya

They decide archer (fate servant) should be renamed to character:shirou emiya, and this is accepted (not that this would necessarily ever happen for this particular example, but maybe for a more niche property where a janitor did a quick Google search and it seemed legit, it gets accepted for whatever reason).

Now you have essentially created a sibling, but there is no way to back out of this once you have published a subsequent update. Since you have not virtually but literally renamed all archers to shirou emiya, you can't recover the names (the preimage) the archers had before the mass rename. You might be able to calculate the reverse delta from the problem update in the history and merge it into a future update, but now the PTR server needs to learn how to do merges. This is also a two-second problem to undo if you just use the comparable functionality of a sibling, even though that is clearly slightly less convenient, especially for bulk jobs.
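To make the difference concrete, here is a small sketch under simplified assumptions: stored tags are a dict from file hash to a set of tags, a rename rewrites storage, and a sibling is applied only at display time. The function names and the `h1`..`h3` hashes are invented for this example.

```python
files = {
    "h1": {"archer (fate servant)"},
    "h2": {"character:shirou emiya"},
    "h3": {"archer (fate servant)", "character:shirou emiya"},
}


def hard_rename(files, old, new):
    """Destructively replace `old` with `new` in the stored tags."""
    return {h: {new if t == old else t for t in tags} for h, tags in files.items()}


def display_with_siblings(files, siblings):
    """Leave storage untouched and map tags only when displaying them."""
    return {h: {siblings.get(t, t) for t in tags} for h, tags in files.items()}


renamed = hard_rename(files, "archer (fate servant)", "character:shirou emiya")
displayed = display_with_siblings(
    files, {"archer (fate servant)": "character:shirou emiya"}
)

# Both views look identical to the user...
print(renamed == displayed)  # True
# ...but after the rename h1, h2 and h3 are indistinguishable in storage,
print(renamed["h1"] == renamed["h2"] == renamed["h3"])  # True
# while dropping the sibling restores the original view exactly.
print(display_with_siblings(files, {}) == files)  # True
```

Three different starting states collapse into one after the rename, which is why no inverse function exists without consulting the history.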

Zweibach commented 4 years ago

Simple solution: Don't allow it on PTR/remote repos as I've suggested to dev for Tag Migration. Poof!

Mengmoshu commented 4 years ago

Not being bijective (Problem 2) is definitely a problem for the PTR (or potentially any other shared repository). The closest "solution" for it I can think of is to allow servers to disable generating them, or to limit the functionality to privileged users. Preferably the client would not show the UI for renaming or would show just a "disabled on this service" message in this case.

Problem 1 is definitely much thornier, and the "easy" solution is just a more heavy-handed version of my solution for Problem 2: disable it on repository services entirely. Locking it to local tag services only would turn the UUID problem into an issue of brute processing, as under the hood it could just be automation of the manual process we have now. It also "solves" the eventual consistency part of the problem, because there is only the one client.

Making the feature available on Tag Repositories does look like it would require some fairly radical changes. The "smallest" I can think of would be some sort of UUID coordination. For example, clients could use a temporary UUID locally (or none?) when they don't find a tag in the repo and send a UUID-free commit to the server; the server then generates UUIDs for new tags and includes that UUID<>Name mapping in update files. This doesn't solve the bijectiveness issue, and leaves eventual consistency for renames a thorny problem.
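A sketch of what that coordination might look like, assuming the server is the only party that mints UUIDs and that update files carry a UUID-to-name dict. `RepoServer`, `Client`, and `process_commit` are hypothetical names, not existing Hydrus classes.

```python
import uuid


class RepoServer:
    """Hypothetical repository that assigns UUIDs to tags it has not seen."""

    def __init__(self):
        self.uuid_by_name = {}

    def process_commit(self, tag_names):
        """Accept a UUID-free commit and return UUID -> name for new tags only."""
        new_definitions = {}
        for name in tag_names:
            if name not in self.uuid_by_name:
                tag_uuid = str(uuid.uuid4())
                self.uuid_by_name[name] = tag_uuid
                new_definitions[tag_uuid] = name
        return new_definitions


class Client:
    """Hypothetical client that learns definitions from update files."""

    def __init__(self):
        self.name_by_uuid = {}

    def apply_update(self, definitions):
        self.name_by_uuid.update(definitions)


server = RepoServer()
client_a, client_b = Client(), Client()

first_update = server.process_commit(["cat", "dog"])
second_update = server.process_commit(["dog", "species:shadow the hedgehog"])

# Clients may process the update files in different orders.
for update in (first_update, second_update):
    client_a.apply_update(update)
for update in (second_update, first_update):
    client_b.apply_update(update)

print(list(second_update.values()))                    # ['species:shadow the hedgehog']
print(client_a.name_by_uuid == client_b.name_by_uuid)  # True
```

Definitions converge regardless of processing order because only the server assigns keys. As noted above, this does nothing for renames themselves.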


After thinking about this a few more minutes I think the practical/easy solution is a special type of commit (a rename petition?) that wraps a mass deletion and creation. It would only replace tags the generating client knows about so other clients might generate redundant or conflicting changes, but the system we already have deals with those. Restoring tag undelete functionality seems like a prerequisite for this. Repository Janitors should hold rename petitions to a very high standard of description as well. Another drawback is that the old tag can still be added, leaving the PTR with both tags until someone generates another rename petition. This idea would also be safest if limited to a 1:1 replacement.
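As a sketch of the rename petition idea, assuming a petition is just a described pair of DELETE and ADD rows over the hashes the submitting client knows about. The dict layout and both function names are invented for illustration and are not the PTR's actual petition format.

```python
def build_rename_petition(known_mappings, old_tag, new_tag, reason):
    """Wrap a mass deletion and creation for the hashes this client knows about."""
    hashes = sorted(known_mappings.get(old_tag, set()))
    return {
        "reason": reason,
        "content": [
            ("DELETE", old_tag, hashes),
            ("ADD", new_tag, hashes),
        ],
    }


def apply_petition(server_mappings, petition):
    """Apply an approved petition to the server's tag -> hashes mapping."""
    for action, tag, hashes in petition["content"]:
        current = server_mappings.setdefault(tag, set())
        if action == "DELETE":
            current.difference_update(hashes)
        elif action == "ADD":
            current.update(hashes)


old, new = "spices:shadow the hedgehog", "species:shadow the hedgehog"

# The client has only synced two of the three files that carry the old tag.
client_known = {old: {"h1", "h2"}}
server_mappings = {old: {"h1", "h2", "h3"}}

petition = build_rename_petition(client_known, old, new, "typo in namespace")
apply_petition(server_mappings, petition)

print(sorted(server_mappings[new]))  # ['h1', 'h2']
print(sorted(server_mappings[old]))  # ['h3']
```

The leftover `h3` shows the drawback mentioned above: the old tag lingers on anything the petitioning client did not know about, until someone files another petition.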

The other possibility is that expansion of the sibling system could allow all the desired results of this feature with few or none of the problems. Namespace Siblings ( #589 ) would allow fixes like spices: -> species:. Clientside control of which sibling displays will also help ( #695 ), since hiding the "bad" side of a sibling pair intended to fix a typo will at least provide the illusion of a rename. The existing virtualization of siblings also solves the bijective issue as far as I know.


With all that said, it's possible that Hydev or someone else will see an approach to the problem we want to solve with renaming that nobody else has yet.

Zweibach commented 4 years ago

It's worth pointing out that dev has stated that he'll create a feature for janitors to create proper tag replace "siblings" server-side in his upcoming network job.

BlubParadox commented 2 years ago

Since my suggestion was considered a duplicate of this issue, I'm going to throw my own hat into the ring on this suggestion.

(I am talking as if this is a client side only deal, and at no point will any suggestions made be allowed within PTRs or any other form of multiple user database)

I agree with the suggestion of having a tag management page, along with the other suggestions made alongside it. However, the way the suggestion was worded implies that actions such as renaming or deleting tags would be something you do once, applying only to the files that have the tag at the moment of renaming or deletion.

I feel this is ill-advised; you should also be able to have these actions apply to new files as they come in from subscriptions.

If I delete the tag redundant tag and a new file then comes in with redundant tag, the tag should be removed; or if I make all tags named jerma become jerma985, that should apply to all new files that arrive with jerma.

I would like it to act like tag siblings, where you set it up and it applies as new files come in. However, my original suggestion was the ability to outright replace tags, with a system that adds the new tag and then deletes the old one whenever a file with that tag enters.
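Something like the following sketch is what I have in mind, assuming the client keeps a standing set of replacement and removal rules and runs them over each incoming file's tags. `apply_tag_rules` is a made-up name, not an existing Hydrus function.

```python
def apply_tag_rules(incoming_tags, replacements, blocked):
    """Return the tags to store for a new file after standing rules are applied."""
    result = set()
    for tag in incoming_tags:
        if tag in blocked:
            continue  # deleted tags stay deleted on new imports
        # Replaced tags are stored under their new name; others pass through.
        result.add(replacements.get(tag, tag))
    return result


replacements = {"jerma": "jerma985"}
blocked = {"redundant tag"}

new_file_tags = apply_tag_rules(
    {"jerma", "redundant tag", "streamer"}, replacements, blocked
)
print(sorted(new_file_tags))  # ['jerma985', 'streamer']
```

Unlike a sibling, this changes what is actually stored, so the old tag never lands on the file in the first place.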

It seems possible, and if a tag management panel or page becomes a thing, I think it should be a part of it.