manticoresoftware / docker

Official docker for Manticore Search

Store Index Data in Container not Volume #2

Closed josephxsxn closed 5 years ago

josephxsxn commented 6 years ago

Hi,

We are looking at using Manticore for its PQ (percolate query) feature set. For our use case we want to build an image that already has a number of PQ queries inserted into it, and to version that image so it is portable. Storing data in volumes makes sense for traditional search, but it currently prevents the containers from being fully portable under an orchestrator like Nomad or K8s. Because the PQ queries take far less storage than a traditional index, we would like to save them with the container. Then, if we need to scale Manticore horizontally, we can simply spawn more containers anywhere without worrying about whether the volumes also need to be replicated to the target host.

Is there a recommended way to modify the Dockerfile to support this? I have already found that if I rebuild the image with a custom sphinx.conf, my pq index stays created, but for some reason the queries are never kept, even after a docker commit.

adriannuta commented 6 years ago

The percolate index is a RealTime index type. New content (in your case, queries) is stored in RAM first. It is flushed to disk periodically (rt_flush_period), or dumped to disk as a chunk once rt_mem_limit is reached.

When you do the docker commit I suspect the queries are not yet on disk, so they are not captured.

Run FLUSH RTINDEX pqindex; to force flushing the RAM data to disk before doing the docker commit.
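
For illustration, here is a minimal Python sketch of that order of operations: flush first, commit second. It is not an official procedure. The container name, the image tag, a mysql client on the host, and the daemon's SQL port being published on 127.0.0.1:9306 are all assumptions made for the example; only FLUSH RTINDEX and docker commit come from this thread.

```python
import subprocess


def flush_command(index, host="127.0.0.1", port=9306):
    """Build the mysql-client call that sends FLUSH RTINDEX to the daemon.

    Assumption: the daemon's SQL listener is reachable at host:port.
    """
    return ["mysql", "-h", host, "-P", str(port), "-e", f"FLUSH RTINDEX {index}"]


def commit_command(container, image_tag):
    """Build the docker commit call; -p pauses the container while committing."""
    return ["docker", "commit", "-p", container, image_tag]


def flush_then_commit(container, image_tag, index, run=subprocess.run):
    """Flush RAM data to disk, then commit. check=True aborts on the first failure."""
    for cmd in (flush_command(index), commit_command(container, image_tag)):
        run(cmd, check=True)
```

Calling flush_then_commit("manticore-pq", "mantitest:latest", "pqindex") would run both commands in order (the first two names are hypothetical). The run parameter exists only so the sequence can be checked without Docker installed.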

josephxsxn commented 6 years ago

I ran FLUSH RT pq; as you suggested, and it still appears that docker commit -p $currentContainerID mantitest:latest does not save the queries. After a lot of thought we have decided to just write some code that populates the image from another datastore when it starts, so this won't be an issue. But I still do find it odd that I am unable to commit inserted queries.

The apache/nifi image is a good example of a container you can commit after making changes and then redeploy with those changes intact.

There is no longer a need to solve this, feel free to close it if you want.

klirichek commented 6 years ago

Hi. Providing a volume, bind, or tmpfs mount point is not a requirement. If you don't provide one, the daemon simply writes into the mount-point folder/file inside the container (that is, into the container's ephemeral layer).

So the naive, obvious way to achieve this is to first run the container without a data volume, then configure PQ inside it (modify the config, run the daemon, insert the queries, and do anything else needed to bring it into a working state). Then shut down the daemon inside, stop the container, and commit it as a new image.
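
The steps above can be sketched roughly as follows in Python. This is a sketch under stated assumptions, not a tested recipe: it assumes the base image already has the pq index configured, that a mysql client on the host can reach the published SQL port (9306 here), and that the daemon is the container's main process, so docker stop shuts it down. Also note that docker commit does not include data held in volumes, so this only helps if the index files are not stored in a volume.

```python
import subprocess


def sql_command(statement, host="127.0.0.1", port=9306):
    """Build a mysql-client call that sends one SQL statement to the daemon."""
    return ["mysql", "-h", host, "-P", str(port), "-e", statement]


def build_steps(base_image, container, new_tag, index, queries, port=9306):
    """Return the ordered commands: run, insert, flush, stop, commit."""
    # 1. Start the container with no -v / --mount, so data stays inside it.
    steps = [["docker", "run", "-d", "--name", container,
              "-p", f"{port}:{port}", base_image]]
    # 2. Insert each stored query (backslashes and single quotes escaped).
    for q in queries:
        escaped = q.replace("\\", "\\\\").replace("'", "\\'")
        steps.append(sql_command(
            f"INSERT INTO {index} (query) VALUES ('{escaped}')", port=port))
    # 3. Force RAM data to disk before the daemon goes down.
    steps.append(sql_command(f"FLUSH RTINDEX {index}", port=port))
    # 4. Stop the container, then 5. commit it as a new image.
    steps.append(["docker", "stop", container])
    steps.append(["docker", "commit", container, new_tag])
    return steps


def run_steps(steps, run=subprocess.run):
    """Execute the commands in order, aborting on the first failure."""
    for cmd in steps:
        run(cmd, check=True)
```

A real script would also need to wait until the daemon accepts connections after docker run before sending the first INSERT; that wait is left out here. All container, image, and tag names passed in are the caller's choice and are hypothetical.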

I believe all of these steps can be scripted and added to a new Dockerfile that starts from the manticore image and generates a new customized image with the PQs inside. You can then run as many copies of that image as you want, since each of them will in turn run in its own ephemeral layer based on the set-up image.