sort by similarity? - Githubissues

hydrusnetwork / hydrus

A personal booru-style media tagger that can import files and tags from your hard drive and popular websites. Content can be shared with other users via user-run servers.

http://hydrusnetwork.github.io/hydrus/

Other

2.33k stars, 152 forks

sort by similarity? #88

Open: CuddleBear92 opened this issue 8 years ago

CuddleBear92 commented 8 years ago

Sort by similarity? Not sure if this can even be done, but it would be nice to have a quick way to sort similar images together when they don't have a title/issue/page tag (or similar) to sort by. Yes, you could argue that you can sort by filename, but that only works when the files actually have a filename to sort by. It doesn't help that files from boorus don't carry filenames, and you can't import folders with the filename namespace either.

CuddleBear92 commented 4 years ago

Re-opening as this issue still stands. Cleaning up comments.

bbappserver commented 4 years ago

@CuddleBear92 Similarity is not a sortable value, because a ~= b == b ~= a. Did you mean group by? And if so, what is wrong with the current duplicates system for this?

CuddleBear92 commented 4 years ago

No, sort. I still stand by what I said in 2015: sort by dupe id/group, so that dupes end up next to each other in a normal gallery view.

It's a different use case from the dupe tab itself.
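Here is a minimal sketch of the gallery order CuddleBear92 describes, which keeps dupes next to each other. It assumes each file carries at most one duplicate group id and a king flag. The `FileRecord` type and its fields are hypothetical and are not hydrus's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class FileRecord:
    # Hypothetical record; hydrus's real data model is not described in this thread.
    file_id: int
    group_id: Optional[int]  # None if the file is in no duplicate group
    is_king: bool = False


def sort_by_dupe_group(files: list[FileRecord]) -> list[FileRecord]:
    """Order files so members of the same duplicate group are adjacent.

    Within a group the king comes first, then subordinates by file id.
    Files without a group are placed after all groups, by file id.
    """
    def key(f: FileRecord) -> tuple[bool, int, bool, int]:
        ungrouped = f.group_id is None
        group = f.group_id if f.group_id is not None else 0
        # False sorts before True, so grouped files and kings come first.
        return (ungrouped, group, not f.is_king, f.file_id)

    return sorted(files, key=key)


files = [
    FileRecord(5, 2),
    FileRecord(3, None),
    FileRecord(7, 1, is_king=True),
    FileRecord(1, 2, is_king=True),
    FileRecord(4, 1),
]
print([f.file_id for f in sort_by_dupe_group(files)])  # [7, 4, 1, 5, 3]
```

The sketch orders groups by group id only because that gives a stable key. Ordering groups by some other property of the king would work just as well.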

bbappserver commented 4 years ago

Well, similarity is a metric between two images, not a per-image value that can be used to compare A and B, so you would have to sort by similarity relative to a particular image.
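The point about needing a particular image can be made concrete: a pairwise similarity only becomes a sort key once a reference is fixed. This sketch makes three assumptions:

- Each file has a perceptual hash stored as an int. A real hash might be 64 bits; the demo uses made-up 8-bit values.
- Hamming distance serves as the dissimilarity measure.
- The file names are hypothetical.

This is not presented as how hydrus implements its duplicate search.

```python
from __future__ import annotations


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two hashes stored as ints."""
    return bin(a ^ b).count("1")


def sort_by_similarity_to(reference: int, hashes: dict[str, int]) -> list[str]:
    """Return file names ordered from most to least similar to `reference`.

    Smaller Hamming distance means more similar; ties are broken by name
    so the order is deterministic.
    """
    return sorted(hashes, key=lambda name: (hamming_distance(reference, hashes[name]), name))


# Made-up 8-bit values for readability; a real perceptual hash would be wider.
hashes = {
    "a.jpg": 0b1111_0000,
    "b.jpg": 0b1111_0001,
    "c.jpg": 0b0000_1111,
    "d.jpg": 0b1110_0000,
}
print(sort_by_similarity_to(0b1111_0000, hashes))  # ['a.jpg', 'b.jpg', 'd.jpg', 'c.jpg']
```

Picking a different reference image produces a different order over the same files. That is exactly why similarity alone is not a global sort key.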

It sounds like what you want is for groups that have already been through duplicate processing, where a king was determined, to have the king placed next to its subordinates. That's fine unless you hit a weird edge case where one of its subordinates is also the king of a different group, in which case your ordering goes out the window.
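The overlapping-group edge case can at least be detected before sorting. This sketch assumes the groups are available as a mapping from each king to its set of subordinates, which is a hypothetical structure. Any file it reports belongs to more than one group.

When enough groups overlap, no flat ordering can keep every group contiguous. For example, three groups sharing a single file cannot all be contiguous, because that file has only two neighbours in a line. A sort would then need an explicit tie-breaking rule.

```python
from __future__ import annotations

from collections import defaultdict


def files_in_multiple_groups(groups: dict[str, set[str]]) -> dict[str, list[str]]:
    """Report files that belong to more than one duplicate group.

    `groups` maps each king to its set of subordinates (a hypothetical
    structure); the king counts as a member of its own group. The result
    maps each shared file to the sorted kings of the groups it belongs to.
    """
    membership: defaultdict[str, list[str]] = defaultdict(list)
    for king, subordinates in groups.items():
        for member in {king} | subordinates:
            membership[member].append(king)
    return {f: sorted(kings) for f, kings in membership.items() if len(kings) > 1}


# "B" is a subordinate of "A" and also the king of its own group.
groups = {"A": {"B", "C"}, "B": {"D"}}
print(files_in_multiple_groups(groups))  # {'B': ['A', 'B']}
```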

DonaldTsang commented 4 years ago

Imagine applying this to things that have no standard size, like music, videos, or even ebooks... but it can be done with hashes and summarization.
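For text, the hash-based approach could take the form of MinHash. MinHash is a standard technique for estimating the Jaccard similarity between two documents' sets of word shingles. The following arbitrary choices are for illustration only, and nothing here comes from hydrus or from the projects linked below:

- the 3-word shingle size,
- the 64 hash functions,
- salted BLAKE2b as the hash family.

```python
from __future__ import annotations

import hashlib


def shingles(text: str, k: int = 3) -> set[str]:
    """Overlapping k-word shingles of the lowercased, whitespace-split text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def minhash_signature(items: set[str], num_hashes: int = 64) -> list[int]:
    """One minimum per salted BLAKE2b hash function, taken over all items."""
    if not items:
        return []
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # blake2b accepts salts up to 16 bytes
        signature.append(min(
            int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest(), "big")
            for item in items
        ))
    return signature


def estimate_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of positions where two equal-length signatures agree."""
    if not sig_a or len(sig_a) != len(sig_b):
        return 0.0
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


a = shingles("the quick brown fox jumps over the lazy dog")
b = shingles("the quick brown fox jumps over the lazy cat")
print(jaccard(a, b))  # 0.75
print(estimate_similarity(minhash_signature(a), minhash_signature(b)))  # an estimate of the value above
```

The result is still a pairwise score. Turning it into a gallery order would need a reference document, as with images.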

For books and texts:

Keyword extraction, for ease of search: https://github.com/boudinfl/pke
Summarization, for previews: https://github.com/icoxfog417/awesome-text-summarization

Problem 1: How would the UI display the text?
Problem 2: Where would the dataset come from?
Problem 3: Should this be delegated to add-ons, and if so, how would that work?