yurijmikhalevich / rclip

AI-Powered Command-Line Photo Search Tool
MIT License

feat: add an option to store rclip DB alongside of the indexed images #36

Open josalhor opened 1 year ago

josalhor commented 1 year ago

Hello!

I have noticed that rclip saves the database based on an environment variable or a system data directory: https://github.com/yurijmikhalevich/rclip/blob/7ea2eedc743de98e98b36456537034030997b35d/rclip/utils.py#L37

Is there any reason why we can't configure rclip to save the database in the current directory, à la git? If I am not mistaken, that would let users move or copy these folders and keep the database of cached images with them. This would be really useful for someone who manages their albums through a cloud platform, since they could upload the database file too and share it across devices.

yurijmikhalevich commented 1 year ago

Hi @josalhor! I like your suggestion.

Right now, you can configure this by adding an environment variable to your .bashrc or .zshrc on the machines you are using rclip on, like:

export RCLIP_DATADIR=<path to your photos>/.rclip

If we go this way by default, we will need to either search parent dirs for the database if you perform a query in a subdir or to keep lots of smaller indexes in every subdir. Having multiple smaller indexes will negatively impact the query time when you search in a parent directory. Maybe, going the first way will make more sense, given performance implications 🤔
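
For illustration, the first option (searching parent dirs for the database) could look roughly like the sketch below. This is not rclip's actual code: the `.rclip` directory name is borrowed from the `RCLIP_DATADIR` example above, and `find_datadir` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

# Assumption: a per-album database lives in a ".rclip" directory,
# mirroring the RCLIP_DATADIR example above. This is a hypothetical layout.
DATADIR_NAME = ".rclip"


def find_datadir(start: Path) -> Optional[Path]:
    """Return the nearest DATADIR_NAME directory at or above `start`, or None."""
    start = start.resolve()
    # Check `start` itself first, then each parent up to the filesystem root.
    for directory in (start, *start.parents):
        candidate = directory / DATADIR_NAME
        if candidate.is_dir():
            return candidate
    return None
```

With a lookup like this, a query run in `<path to your photos>/2020/trip` would pick up a database stored in `<path to your photos>/.rclip`, so a single index per album root would be enough.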

josalhor commented 1 year ago

> Hi @josalhor! I like your suggestion.
>
> Right now, you can configure this by adding an environment variable to your .bashrc or .zshrc on the machines you are using rclip on, like:
>
> export RCLIP_DATADIR=<path to your photos>/.rclip

This would work for a single directory, but not if you have multiple disjoint directories! The alternative would be to make the path to your photos point to your CWD. That may do the trick, but I am not sure it is the most elegant solution.
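
The CWD workaround could be wrapped in a small launcher, roughly as sketched below. The only things taken from this thread are the `RCLIP_DATADIR` variable and the `.rclip` directory name. `datadir_env` and `run_rclip_here` are hypothetical helpers, and the sketch assumes `rclip` is on your `PATH`.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, List


def datadir_env(album: Path) -> Dict[str, str]:
    """Copy the current environment, pointing RCLIP_DATADIR at <album>/.rclip."""
    env = dict(os.environ)
    env["RCLIP_DATADIR"] = str(album / ".rclip")
    return env


def run_rclip_here(args: List[str]) -> int:
    """Run rclip in the current directory with a database local to that directory."""
    cwd = Path.cwd()
    # Only the child process sees the override; the shell's own
    # RCLIP_DATADIR (if any) is left untouched.
    return subprocess.run(["rclip", *args], cwd=cwd, env=datadir_env(cwd)).returncode
```

Each disjoint album then gets its own database, at the cost of always launching rclip through the wrapper.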

> If we go this way by default, we will need to either search parent dirs for the database if you perform a query in a subdir or to keep lots of smaller indexes in every subdir. Having multiple smaller indexes will negatively impact the query time when you search in a parent directory. Maybe, going the first way will make more sense, given performance implications 🤔

I didn't actually imagine this working recursively, with nested directories in mind. I was thinking purely about having multiple disjoint albums that you want to query and share separately. I wonder if we would need to implement some kind of rclip init to establish an empty database in the current directory (again, à la git). Otherwise, it is not clear to me how you would clearly communicate the database directory to the user.
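
To make the idea concrete, here is a minimal sketch of what such an init step might do. No `rclip init` command exists in this thread's version of rclip. `init_album` and the `.rclip` directory name are assumptions made for illustration only.

```python
from pathlib import Path

# Assumption: same hypothetical ".rclip" directory name as in the
# RCLIP_DATADIR example earlier in the thread.
DATADIR_NAME = ".rclip"


def init_album(directory: Path) -> Path:
    """Create an empty database directory inside `directory` and return its path."""
    datadir = directory / DATADIR_NAME
    # exist_ok=False makes a second init fail loudly (FileExistsError)
    # instead of silently reusing an existing database directory.
    datadir.mkdir(exist_ok=False)
    return datadir
```

The explicit marker directory is what would tell the user (and rclip) where the database for a given album lives.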

yurijmikhalevich commented 1 year ago

@josalhor, rclip init is the first thing that comes to mind, but it would add friction to using rclip, which I would love to avoid. I will think about a performant way to implement the recursive index.