Open matthew-z opened 5 years ago
Hey Matthew, I proposed this option at the beginning, when we started designing the jig. In the end @lintool proposed to save the index in the image within the docker to reduce the loading times. For the training jig, I implemented what you just proposed instead: it saves the data to an external file and allows trained models to be shared between images.
In any case you could save the index from one image to your host machine, then load the index data again if you wanted to.
I was thinking of doing it similarly. A good way would be to add one flag to pass a directory to be mounted as a volume for data storage, just like the /input mount. Did you do that or just hardcode it, @albpurpura?
I see, we can use the model_folder to mount any data into docker with the train hook.
Then, I think it would be great to add a similar argument to the other hooks for mounting data from the host machine.
@arjenpdevries I did it exactly as you said. The folder to mount is passed as an argument; have a look here: https://github.com/osirrc/jig/blob/master/trainer.py
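For readers following along, here is a minimal sketch of what "the folder to mount is passed as an argument" could look like in another hook. It is not the code in trainer.py: the flag name `--data_folder`, the container path `/data`, and both function names are assumptions made for illustration. The returned dictionary follows the shape that the `volumes` parameter of the Docker SDK for Python accepts.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse a hypothetical flag that names a host directory to mount."""
    parser = argparse.ArgumentParser(
        description="Hypothetical hook that mounts host data into the container"
    )
    # --data_folder is an assumed flag name, not an existing jig option.
    parser.add_argument(
        "--data_folder",
        type=Path,
        required=True,
        help="host directory to mount into the container",
    )
    return parser.parse_args(argv)


def to_volumes(host_dir, container_dir="/data"):
    """Build a volumes mapping of the form {host_path: {"bind": ..., "mode": ...}}."""
    # Bind mounts need an absolute host path.
    host_path = str(Path(host_dir).absolute())
    return {host_path: {"bind": container_dir, "mode": "rw"}}
```

A hook could then pass `to_volumes(args.data_folder)` as the `volumes` argument when it starts the container, so the directory is chosen by the caller instead of being hardcoded.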
> In the end @lintool proposed to save the index in the image within the docker to reduce the loading times.
Correct. This is a tradeoff between jig complexity (one more thing the jig needs to manage) vs. image efficiency (having to rebuild the index each time). At the start, we opted to simplify the jig since we were just getting started. However, now that things are working, I'm happy to revisit for v2.
@matthew-z I had some scripts for updating the scripts in an already existing image. See https://github.com/osirrc/terrier-docker/blob/master/dev/bumpContainer.sh
@cmacdonald Great! Thank you!
It seems that the jig performs indexing and commits the result to a new image. If my understanding is correct, after modifying the source code and building a new docker image, we also have to re-index to create a new image. I wonder how to avoid that.
I think the most straightforward way is to keep the index in a directory on the host machine and mount it into the docker container when we launch it. Thus, even if the image is destroyed or outdated, we can still mount the index directory into a new docker container.
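To make the proposal concrete, here is a rough sketch of launching a container with a host index directory bind-mounted through the `-v` option of `docker run`. This is not how the jig currently works; the function name, the container path `/index`, and the image name in the usage note are placeholders chosen for illustration.

```python
from pathlib import Path


def docker_run_with_index(host_index_dir, image, container_index_dir="/index", extra_args=()):
    """Build a `docker run` command that bind-mounts a host index directory.

    The index lives on the host, so it outlives any single image or container.
    """
    # Docker treats a relative source as a named volume, so use an absolute path.
    host_path = Path(host_index_dir).absolute()
    mount = f"{host_path}:{container_index_dir}"
    # --rm removes the container on exit; the mounted index stays on the host.
    return ["docker", "run", "--rm", "-v", mount, image, *extra_args]
```

With this, a rebuilt image can reuse the same index, for example via `subprocess.run(docker_run_with_index("/path/to/index", "my-image:latest"), check=True)`, where both the path and the image name are placeholders.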