research-software-reactor / code-testing-for-researchers

Supporting in-cloud testing for researchers
Apache License 2.0

Learning path: Containerising, and associated resource considerations #7

Open tmbgreaves opened 5 years ago

tmbgreaves commented 5 years ago

Research software testing in containers tends to generate large amounts of container data. This comes from a combination of relying on a lot of supporting software, the core research software repository often containing large test files, and the tests themselves generating large output files. As an example, the current Fluidity release container is around 1GB in size; Devito containers are similarly sized. In the Fluidity case, the container grows to about 4GB once tests have run and the output is archived inside the container. For comparison, httpd, a standard server container, is 132MB; a larger example would be Joomla at 415MB.

This becomes a problem when testing is split into a 'create the container' job followed by a series of 'test using the container' jobs. If it's not guaranteed that the same testing system is used for both jobs, the container needs to be pushed to a registry such as the Azure Container Registry (ACR) once it's been created, then pulled back onto whichever system is going to run the tests. This may seem like a good way to avoid spending five to ten times the worker time on multiple container builds, one per test configuration. At a container build time of 20 minutes and a worker cost of £0.10 per hour for a D2s instance, this could save on the order of £0.50 per build.
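The worker-time side of this trade-off can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (20-minute build, £0.10 per hour); the function name `build_cost` and the list of build counts are illustrative assumptions, not part of any existing tooling.

```python
def build_cost(build_minutes: float, rate_per_hour: float, n_builds: int = 1) -> float:
    """Worker cost (in pounds) of running the container build n_builds times."""
    return build_minutes / 60.0 * rate_per_hour * n_builds


if __name__ == "__main__":
    # Figures from the discussion: 20-minute build, GBP 0.10/hour worker.
    # The build counts are assumed values, chosen to bracket the
    # "five- to ten-fold" range mentioned above.
    for n in (1, 5, 10, 15):
        print(f"{n:>2} builds: GBP {build_cost(20, 0.10, n):.2f}")
```

Building once and reusing the container costs the single-build figure; rebuilding in every test job costs the multi-build figure, so the difference between the two is what a shared container saves in worker time.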

One issue arises where the container registry is outside the cloud that hosts the testing, so that pushing containers to it incurs bandwidth costs. Taking Devito testing as an example, where built containers are around 1GB in size and on the order of ten parallel testing runs occur per commit on a branch with an open pull request, the job could generate around 10GB of outgoing traffic. With on the order of 10-15 builds per day and bandwidth costs of £0.06 per GB, costs on the order of £10 per day might be incurred, which is more than the cost of the extra worker time to build the containers from scratch. The real cost may well be higher still, as external data transfer takes time and the worker VM is billed per unit time while the transfer runs. This is probably not a problem for a well-funded commercial project but could be significant for a research project on a tight budget.
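As a rough check on the bandwidth estimate, the calculation can be written out as below. This is a minimal sketch using the figures quoted above (around 10GB of outgoing traffic per build, 10-15 builds per day, £0.06 per GB); the function name `daily_egress_cost` is hypothetical, and the rate is the one given in this discussion rather than a quoted provider price.

```python
def daily_egress_cost(egress_gb_per_build: float,
                      builds_per_day: int,
                      rate_per_gb: float) -> float:
    """Outbound data transfer cost (in pounds) per day."""
    return egress_gb_per_build * builds_per_day * rate_per_gb


if __name__ == "__main__":
    # Assumed inputs, taken from the example in the text:
    # ~10GB outgoing traffic per build, GBP 0.06 per GB.
    for builds in (10, 15):
        cost = daily_egress_cost(10, builds, 0.06)
        print(f"{builds} builds/day: GBP {cost:.2f} per day")
```

This gives £6 to £9 per day, consistent with the "on the order of £10 per day" estimate, before accounting for the worker time spent waiting on transfers.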

A better way to handle container data is to keep it in-cloud, where costs are an order of magnitude lower. In the case of Azure Container Registries (ACRs) there are no in-cloud data transfer costs, but there are storage charges (Docker Hub doesn't charge for storage). Fluidity, for example, has amassed on the order of 400GB of data in its testing output repository on Docker Hub. The same data in ACR would incur costs of around £1.50 per day.
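To see how storage charges scale, one option is to back out the per-GB rate implied by the Fluidity figures and apply it to other repository sizes. The sketch below does that; the rate is derived purely from the 400GB and £1.50 per day numbers above, not from a published ACR price list, and both function names are hypothetical.

```python
def implied_rate_per_gb_day(stored_gb: float, daily_cost: float) -> float:
    """Per-GB daily storage rate implied by a known size and daily cost."""
    return daily_cost / stored_gb


def daily_storage_cost(stored_gb: float, rate_per_gb_day: float) -> float:
    """Daily storage cost (in pounds) for a given amount of stored data."""
    return stored_gb * rate_per_gb_day


if __name__ == "__main__":
    # Assumption: the rate is inferred from the figures in the text
    # (400GB costing around GBP 1.50 per day), not from provider pricing.
    rate = implied_rate_per_gb_day(400, 1.50)
    for size_gb in (100, 400, 1000):  # illustrative repository sizes
        per_day = daily_storage_cost(size_gb, rate)
        print(f"{size_gb:>5}GB: GBP {per_day:.2f}/day, GBP {per_day * 365:.0f}/year")
```

Expressing the cost per year as well as per day makes it easier to compare against a project's annual cloud budget, and shows why pruning old test output matters once storage is charged for.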

For researchers unused to outbound data transfer charges and container storage charges, these costs might well not be obvious, and running through some basic examples as a learning exercise could be useful. It could also be useful to discuss the option of avoiding container transfer costs altogether by running the whole job, from container build to testing, on one worker node, on the assumption that any debugging will need the container to be rebuilt and retested on the debug system rather than pulled from a registry.

ggorman commented 5 years ago

How about giving one or more examples of open source software where this is a problem?

tmbgreaves commented 5 years ago

> How about giving one or more examples of open source software where this is a problem?

Thoroughly overhauled with a first-pass set of examples. These probably need more referencing and expanding. I've drafted this initially without considering how to make it accessible to the non-technical, to get the information out of my head and into the issue, but am not intending to leave it that way! Work in progress.