osirrc / terrier-docker

OSIRRC Docker Image for Terrier
http://terrier.org/

is the memory configuration right for large corpora #23

Closed cmacdonald closed 5 years ago

cmacdonald commented 5 years ago

How much of the container's memory can Terrier use?

lintool commented 5 years ago

Currently, @r-clancy is running on Azure Standard_D64s_v3, which has 256 GiB of memory: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general

Other than a bit of reserve for running the docker engine itself, the image can use the rest of it...

cmacdonald commented 5 years ago

I think my question is how to configure Java to know that. Its defaults are rather low (the initial heap is about 1/64th of the machine's memory).

ArthurCamara commented 5 years ago

Can't we just use TERRIER_HEAP_MEM=XXX?

cmacdonald commented 5 years ago

Of course, but how do we set XXX for different Docker containers? E.g. the Python script could query psutil?

ArthurCamara commented 5 years ago

I believe so: https://github.com/giampaolo/psutil I will create a PR on this.
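
A minimal sketch of the idea discussed above: read the machine's total memory and turn it into a value for TERRIER_HEAP_MEM. The function names, the 10% reserve, and the JVM-style "<N>m" value format are assumptions for illustration, not what the image actually does. The sketch falls back to os.sysconf when psutil is not installed. Both calls report the memory the process can see, which may be larger than a limit imposed on the container.

```python
import os

try:
    import psutil  # third-party; optional in this sketch
except ImportError:
    psutil = None


def total_memory_bytes():
    """Return the total physical memory visible to this process, in bytes."""
    if psutil is not None:
        return psutil.virtual_memory().total
    # Fallback for Unix-like systems when psutil is unavailable.
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def heap_setting(total_bytes, reserve_fraction=0.1):
    """Format a JVM-style heap size in megabytes.

    reserve_fraction is held back for the OS and the Docker engine;
    the 10% default is an arbitrary placeholder, not a tuned value.
    """
    usable_mb = int(total_bytes * (1 - reserve_fraction)) // (1024 * 1024)
    return f"{max(usable_mb, 1)}m"


if __name__ == "__main__":
    # Hypothetical usage: print the value one might assign to TERRIER_HEAP_MEM.
    print(heap_setting(total_memory_bytes()))
```

Keeping the formatting in a pure function (heap_setting) separate from the memory probe makes the reserve policy easy to change and to check without touching the machine.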

ArthurCamara commented 5 years ago

Turns out it's easy to get the available memory, but we can't set a persistent environment variable from Python: subprocess.run(f'TERRIER_HEAP_MEM={memory_to_use}'.split(), shell=True) won't work, because the assignment only lives in the short-lived shell that the call spawns. I'm looking for workarounds.

cmacdonald commented 5 years ago

You can put it in os.environ.

ArthurCamara commented 5 years ago

Yes, but it won't survive after the Python script is killed. I don't think the Java subprocess will be able to access it.

ArthurCamara commented 5 years ago

Found a workaround: setting the env argument of subprocess.run().
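
The workaround can be sketched as follows: copy the current environment, add TERRIER_HEAP_MEM, and pass the copy through the env argument of subprocess.run(). The helper name run_with_heap and the value "4g" are hypothetical, and a small Python child process stands in for the Terrier command so the sketch runs without Terrier installed. For what it's worth, variables assigned to os.environ are also inherited by child processes by default, so either approach should reach the Java process.

```python
import os
import subprocess
import sys


def run_with_heap(cmd, heap_mem):
    """Run cmd with TERRIER_HEAP_MEM set in the child's environment only."""
    env = os.environ.copy()  # keep PATH, JAVA_HOME, and so on
    env["TERRIER_HEAP_MEM"] = heap_mem
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Stand-in for the Terrier command: a child that prints the variable it sees.
child = [sys.executable, "-c", "import os; print(os.environ['TERRIER_HEAP_MEM'])"]
result = run_with_heap(child, "4g")
print(result.stdout.strip())
```

Copying os.environ rather than passing a dict with a single key matters here: env replaces the child's whole environment, so a one-entry dict would drop PATH and similar variables.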

ArthurCamara commented 5 years ago

https://github.com/osirrc/terrier-docker/pull/26 I've added the special environment to (almost) every subprocess.run(). It's probably overkill, but should work.