Currently, import is just a matter of creating a plan and checking whether that plan would be obstructed in any way by existing files, which it usually isn't. When there is an obstruction, the obstructed incoming files are renamed and the import is aborted.
I'm thinking we should promote this obstruction detection into its own category of actions. CV can happen as a similar "thing" if we redefine obstruction to cover both a binary hash conflict and a perceptual hash clash. The only difference is in the conflict resolution strategies: with a perceptual clash, it's valid to ignore the clash.
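To make the two kinds of obstruction concrete, here is a minimal sketch of how an incoming file might be classified. Everything in it is an assumption for illustration: the names `FileRecord`, `Obstruction`, `classify` and `can_ignore` are hypothetical, and the `max_distance` threshold of 4 bits is an arbitrary placeholder, not a tuned value.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Obstruction(Enum):
    NONE = "none"
    BINARY = "binary"          # same content hash already exists
    PERCEPTUAL = "perceptual"  # a perceptually similar file already exists


@dataclass(frozen=True)
class FileRecord:
    # Hypothetical record shape; the real plan entries may differ.
    path: str
    content_hash: str  # digest of the file bytes
    phash: int         # 64-bit perceptual hash


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def classify(incoming: FileRecord,
             existing: Iterable[FileRecord],
             max_distance: int = 4) -> Obstruction:
    """Return the strongest obstruction found for an incoming file."""
    existing = list(existing)
    # A binary conflict takes priority over a perceptual clash.
    if any(r.content_hash == incoming.content_hash for r in existing):
        return Obstruction.BINARY
    if any(hamming(r.phash, incoming.phash) <= max_distance for r in existing):
        return Obstruction.PERCEPTUAL
    return Obstruction.NONE


def can_ignore(obstruction: Obstruction) -> bool:
    """Only a perceptual clash may be waved through by the user."""
    return obstruction is Obstruction.PERCEPTUAL
```

The linear scan over `existing` is only there to keep the sketch short; it says nothing about how the lookup should actually be indexed.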
The only issue with this approach is that it has no retroactive element. People will already have built some collections, perhaps containing similar files that should be grouped, so a story for them may be required.
Directory-structure-based VP-Tree: seems problematic, considering how much disk thrashing it may introduce when we anneal the tree after an insertion.
The PHash implementation we're using produces 64-bit hashes, so the hashes for up to 1M files (about 8 MB of raw hash data) easily fit into main memory. An in-memory metric tree can work much faster than a directory-backed one, but it would require a full attribute scan of the collection or caching the list in a file somewhere. We can scan about 1K files per second, though, so I don't think this is a lot to ask.
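As a rough illustration of the in-memory option, the sketch below builds a VP-tree over 64-bit hashes using Hamming distance and answers "everything within N bits" queries. It is a sketch under assumptions, not the planned implementation: `VPNode`, `build` and `within` are hypothetical names, the tree is built once from a full list rather than updated on insertion, and the recursive build can get deep if many hashes are identical.

```python
import random
from typing import List, Optional, Sequence


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class VPNode:
    def __init__(self, point: int, radius: int,
                 inside: Optional["VPNode"], outside: Optional["VPNode"]):
        self.point = point      # vantage point
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # points with distance <= radius
        self.outside = outside  # points with distance > radius


def build(points: Sequence[int],
          rng: Optional[random.Random] = None) -> Optional[VPNode]:
    """Build a VP-tree from a list of hashes (one-off, no rebalancing)."""
    if not points:
        return None
    rng = rng or random.Random(0)
    rest = list(points)
    vantage = rest.pop(rng.randrange(len(rest)))
    if not rest:
        return VPNode(vantage, 0, None, None)
    distances = [hamming(vantage, p) for p in rest]
    radius = sorted(distances)[len(distances) // 2]
    inside = [p for p, d in zip(rest, distances) if d <= radius]
    outside = [p for p, d in zip(rest, distances) if d > radius]
    return VPNode(vantage, radius, build(inside, rng), build(outside, rng))


def within(node: Optional[VPNode], query: int, max_distance: int,
           out: Optional[List[int]] = None) -> List[int]:
    """Collect every stored hash within max_distance bits of query."""
    if out is None:
        out = []
    if node is None:
        return out
    d = hamming(node.point, query)
    if d <= max_distance:
        out.append(node.point)
    # Triangle inequality: only descend into a side that can hold a match.
    if d - max_distance <= node.radius:
        within(node.inside, query, max_distance, out)
    if d + max_distance > node.radius:
        within(node.outside, query, max_distance, out)
    return out
```

A caller would scan the collection once, pass the resulting hashes to `build`, and then call `within(tree, incoming_phash, threshold)` for each incoming file. In practice each node would also need to carry a file path alongside the hash.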