marqo-ai / marqo

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
https://www.marqo.ai/
Apache License 2.0

[ENHANCEMENT] disk persistent storage for indexes and documents #296

Closed achao2013 closed 1 year ago

achao2013 commented 1 year ago

Is your feature request related to a problem? Please describe.

Documents are stored inside the Docker container. If the container goes down, the database has to be built again. Can Marqo support disk storage?

Describe the solution you'd like

Provide a config option that sets the database storage path to a location on disk that is mapped into the Docker container.

Describe alternatives you've considered

Additional context

achao2013 commented 1 year ago

@wanliAlex @pandu-k

pandu-k commented 1 year ago

Disk storage is supported by default in the form of a Docker volume.

The commands in the quick start guide are for running a Marqo instance for the first time. If you want to access the same Marqo instance after restarting your computer, follow the starting and stopping guide in the Marqo docs: https://docs.marqo.ai/0.0.12/starting_and_stopping/

If there are persistence issues because of your cloud computing environment (for example, if you are using SageMaker), you can change the Docker storage location: https://docs.marqo.ai/0.0.12/Advanced-Usage/change_storage_location/
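
A minimal sketch of the restart workflow described above: because the data lives in the existing container's Docker volume, the idea is to stop and start that same container rather than create a new one with `docker run`. The container name `marqo` is an assumption (adjust it to whatever name was passed to `--name`), and the helper names are hypothetical.

```python
import subprocess


def container_command(action, name="marqo"):
    """Build the docker CLI arguments to start or stop an existing container.

    `name` defaults to "marqo", which is an assumption about how the
    container was named when it was first created.
    """
    if action not in {"start", "stop"}:
        raise ValueError(f"unsupported action: {action!r}")
    return ["docker", action, name]


def apply(action, name="marqo"):
    """Run the command; raises CalledProcessError if docker reports failure."""
    subprocess.run(container_command(action, name), check=True)


if __name__ == "__main__":
    # Prints the commands only; call apply("stop") / apply("start") to run them.
    print(" ".join(container_command("stop")))
    print(" ".join(container_command("start")))
```

Restarting the same container keeps its volume attached, which is why the indexes are still there afterwards.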

pandu-k commented 1 year ago

Also, if you want to transfer Marqo's state to a new Marqo container (for example, a version update), follow this guide: https://docs.marqo.ai/0.0.12/Advanced-Usage/transferring_state/

achao2013 commented 1 year ago

Can I configure the disk storage path? I mean the storage path of the documents, not of Docker itself. @pandu-k

achao2013 commented 1 year ago

In other words, the Docker container is stored in one place (e.g. /var/lib/docker), while the text or image codes (the so-called documents) are stored at a fixed disk location (e.g. /mnt/disk1).

achao2013 commented 1 year ago

@pandu-k @wanliAlex

jn2clark commented 1 year ago

Hi @achao2013, would you be able to provide some more details? To change the Docker storage location you can use this: https://docs.marqo.ai/0.0.12/Advanced-Usage/change_storage_location/ . The images can live in another location, and only the corresponding embeddings will be stored in marqo-os. For text, the original will be stored within marqo-os along with the embeddings. To summarise, pointers to images can be used, but at the moment the original text is also stored, and storing only a pointer for text is not supported. Does that help answer your question?

achao2013 commented 1 year ago

If I want to store the image or text embeddings on disk, not in the marqo-os inside Docker, does Marqo's design support that? If not, how can I edit the Marqo code to implement it? @jn2clark

achao2013 commented 1 year ago

@pandu-k @wanliAlex

jn2clark commented 1 year ago

You can run the backend (OpenSearch) outside of the Marqo Docker container. This means the OpenSearch volume can persist independently of the Marqo container. See the developer guide here: https://github.com/marqo-ai/marqo/tree/mainline/src/marqo. Option C is what you want. Just make sure that OpenSearch is started first.
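
As a rough sketch of how a standalone OpenSearch container could keep its data at a chosen host path: the helper below only builds a `docker run` argument list with a bind mount. The function name, the host path `/mnt/disk1/marqo-os-data`, and the flags are illustrative assumptions rather than the commands from the developer guide. The container data directory `/usr/share/opensearch/data` is the one used by the official OpenSearch image, and it is assumed here (not confirmed) that `marqoai/marqo-os` uses the same location.

```python
import shlex
from pathlib import PurePosixPath

# Assumption: data directory of the official OpenSearch image. Check that the
# marqoai/marqo-os image stores its data in the same place before relying on it.
OPENSEARCH_DATA_DIR = "/usr/share/opensearch/data"


def opensearch_run_command(host_data_dir,
                           image="marqoai/marqo-os:0.0.3",
                           name="marqo-os"):
    """Build a `docker run` argument list that bind-mounts host_data_dir."""
    host_path = PurePosixPath(host_data_dir)
    if not host_path.is_absolute():
        # Docker treats a non-absolute source as a named volume, not a path.
        raise ValueError("host_data_dir must be an absolute path")
    return [
        "docker", "run", "-d",
        "--name", name,
        "-p", "9200:9200",
        "-p", "9600:9600",
        "-e", "discovery.type=single-node",
        # Bind mount: index data is written to the host directory.
        "-v", f"{host_path}:{OPENSEARCH_DATA_DIR}",
        image,
    ]


if __name__ == "__main__":
    # Print the command for inspection instead of executing it.
    print(shlex.join(opensearch_run_command("/mnt/disk1/marqo-os-data")))
```

With a bind mount like this the data would outlive both containers, though the host directory may need to be writable by the user that OpenSearch runs as inside the container.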

achao2013 commented 1 year ago

Thanks, it's getting close to what I want. Going one step further, is there a specific place in the code where I can set the disk storage path of the OpenSearch volume? @jn2clark

achao2013 commented 1 year ago

By the way, what's the difference between "marqoai/marqo-os:0.0.3" and the marqo_docker_0 built in option C? @jn2clark @pandu-k

pandu-k commented 1 year ago

marqoai/marqo-os:0.0.3 is the version of OpenSearch that Marqo uses. In the future we plan to have some sort of "dump index" functionality. Would this help solve this problem?

achao2013 commented 1 year ago

If it's a configurable disk path, I think it would work.