pblittle / docker-logstash

Docker image for Logstash 1.4
https://hub.docker.com/r/pblittle/docker-logstash
MIT License

Location of embedded elasticsearch index is non-obvious. #74

Closed griff closed 9 years ago

griff commented 9 years ago

I am using the embedded elasticsearch for now and wanted to make sure that its data is stored on a volume. Finding out where it is stored, and why, is non-obvious; from what I can gather it depends on a combination of things that might easily break. Basically the data is stored in $(pwd)/data, which as the image is now means /data, but change the workdir on the docker run command and the data dir moves with it.
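The working-directory dependence can be reproduced without Docker. The following is a minimal sketch, assuming only that a relative `data` path is resolved against the current directory as described above; `data_dir_for_workdir` is a hypothetical helper for illustration and is not part of the image.

```shell
#!/bin/sh
# Hypothetical helper (not from the image): print where a relative
# "data" directory ends up when a process starts in directory $1.
data_dir_for_workdir() {
    # Run in a subshell so the caller's working directory is untouched.
    (cd "$1" && printf '%s\n' "$(pwd)/data")
}

# Same relative path, two different start directories, two locations.
data_dir_for_workdir /tmp   # prints /tmp/data
data_dir_for_workdir /usr   # prints /usr/data, as if started with `docker run -w /usr ...`
```

Any volume mounted at the first location is silently bypassed once the working directory changes, which is the breakage this issue is about.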

It would be better to either have an explicit env variable to control the location, use a hard-coded location that is a volume by default, or use a hard-coded location and include documentation for making that directory a volume.

pblittle commented 9 years ago

@griff, I totally agree. At a bare minimum, the current implementation, or lack thereof, should be documented.

I also like the idea of storing the embedded elasticsearch data directory in an environment variable. That would be a quick and easy solution. Maybe /data/elasticsearch.

It might also be a good idea to add VOLUME /data to the Dockerfile. I have been hesitant to make any changes to the Dockerfile. This image is already way too big. If the size increase is negligible, this might be worth pursuing.

Having said all of that, I'm a little backed up working on #49. The current single config file limitation is getting painful for me and others.

If you have any interest, a PR is more than welcome. Otherwise, it may be a while.

griff commented 9 years ago

If you are looking to slim down the image, docker history is a great tool. It tells you exactly which layer is using all the space. Adding a VOLUME line doesn't take any space since it is only a metadata change.

pblittle commented 9 years ago

@griff, @mikehaertl, just a heads up, I have started to work on this issue.

You are now able to define VOLUME and WORKDIR using a DATA_DIR environment variable. The default DATA_DIR is /data.
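For illustration, the defaulting behaviour could look something like the sketch below. This is an assumption about how a startup script might resolve the variable, not code from the feature branch; `data_dir_from_env` is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: not taken from the image's actual scripts.
# Resolve the data directory from DATA_DIR, falling back to /data
# (the default stated in this thread) when it is unset or empty.
data_dir_from_env() {
    printf '%s\n' "${DATA_DIR:-/data}"
}

# The result is an absolute path, so it no longer depends on the
# directory the process happens to be started in.
data_dir_from_env
```

With `DATA_DIR` unset this yields /data; setting it to another absolute path moves the location explicitly instead of as a side effect of the working directory.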

You can see the direction I'm taking in the feature branch.

pblittle commented 9 years ago

Closing due to inactivity. @griff please reopen if I can do anything to help.