Open iosusan opened 10 years ago
Is there any actual procedure that I can use to modify images? In particular, I am interested in openmpi; I have to install some scientific libraries that must be common to all the compute nodes.
+1
Particularly with MPI programs, you tend to need custom libs for each non-trivial application. Conceptually, for the MPI world, I think we need two layers:
1- a base layer with images that create the MPI fabric. OpenMPI is one, but MVAPICH is better when the networking is IB.
2- individual, application-specific images that can create a parallel instance of the application.
The base layer needs to progress with the MPI implementation bug fixes, and get augmented with the ability to specialize the images for MPI applications. There are bound to be some common steps that we can automate as we learn how to deploy MPI applications through containers.
The application images need to progress with the application features and bug fixes.
There isn't a very good way to modify the backend images (storage or compute) right now. Conceptually it is pretty simple:
The trick will be notifying Ferry how to configure this new backend. Right now it uses hard-coded logic; if Ferry sees that you've configured "openmpi" it fetches the Open MPI configurator (which in turn generates the necessary configuration files). I am thinking of something like this:
backend:
So basically users can provide an optional "image" parameter. Ferry will still use the "personality" parameter to figure out how to configure the service, but will instantiate the customized image. Thoughts?
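To make the proposal concrete, here is a minimal Python sketch of the lookup it implies. Everything in it is hypothetical rather than Ferry's actual code: the names `CONFIGURATORS`, `DEFAULT_IMAGES`, and `resolve_backend`, as well as the image names, are assumptions for illustration. The "personality" always selects the configurator, and the optional "image" only overrides which image gets instantiated.

```python
# Hypothetical sketch, not Ferry's real implementation.
# "personality" picks the configurator; "image" optionally overrides the image.

# Assumed mapping from personality to configurator name.
CONFIGURATORS = {"openmpi": "OpenMPIConfigurator"}

# Assumed default image per personality (illustrative names only).
DEFAULT_IMAGES = {"openmpi": "ferry/openmpi"}


def resolve_backend(spec):
    """Return (configurator, image) for one backend entry."""
    personality = spec["personality"]
    if personality not in CONFIGURATORS:
        raise ValueError("unknown personality: %s" % personality)
    # Fall back to the stock image when no custom image is given.
    image = spec.get("image", DEFAULT_IMAGES[personality])
    return CONFIGURATORS[personality], image


stock = resolve_backend({"personality": "openmpi"})
custom = resolve_backend(
    {"personality": "openmpi", "image": "myuser/openmpi-scilibs"}
)
```

With this shape, existing application files that omit "image" keep working unchanged, which is the main appeal of making the parameter optional.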
With your suggestion/answer I realize that I don't understand the scope and capability of the ferry orchestration language. Is there a pointer to some blueprints/documents/scribbles of the scope of the application stack YAML?
When we think about other stacks, such as the sharded NoSQL stacks, Hadoop, and Spark, elasticity will become a desired feature. For example, Qubole abstracts the Hadoop blob away from the user and uses elasticity to deploy the 'right' amount of infrastructure. Such elasticity will likely have to use the APIs of the cloud provider, but it would be interesting if the ferry application stack YAML could capture this.
For MPI applications, size of the cluster will be a configuration parameter, possibly a command line argument. How all the compute, storage, and networks are orchestrated is where I draw a blank in the division of labor between ferry's YAML and orchestration platforms like AWS CloudFormation and OpenStack Heat/Ceilometer.
Better documentation surrounding the application YAML file is at the top of my to-do list. The easiest way to think of Ferry with respect to CloudFormation and Heat is that Ferry dynamically generates CF templates and uses CF to instantiate the physical infrastructure. That's an implementation detail, however. In theory you should be able to use Ferry without ever having to think about CF.
--James
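As a rough illustration of that division of labor, the sketch below turns a cluster size into a CloudFormation template. It is not Ferry's actual generator: the function name, the resource names, the AMI ID, and the instance type are placeholders chosen for the example. Only the top-level template keys and the `AWS::EC2::Instance` resource type come from CloudFormation itself.

```python
import json


def make_cf_template(num_nodes, image_id, instance_type):
    """Build a minimal CloudFormation template with num_nodes EC2 instances."""
    resources = {}
    for i in range(num_nodes):
        # One EC2 instance resource per compute node (names are illustrative).
        resources["Compute%d" % i] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": image_id,
                "InstanceType": instance_type,
            },
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": resources,
    }


# Placeholder AMI ID and instance type; substitute real values before use.
template = make_cf_template(4, "ami-00000000", "m3.medium")
print(json.dumps(template, indent=2))
```

The user only supplies the cluster size; the template is a generated artifact they never need to read.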
There is a danger in too much overlap, or YAW (Yet Another Way), to define infrastructure. What attracts me to ferry as a concept is that it can encapsulate the best known methods for orchestrating big data and computational science infrastructure. This involves identifying how compute, network, and storage play together to create a productive, high performance, or cost effective Hadoop, Cassandra, MPI, etc. infrastructure on which we can deploy applications. Ferry can differentiate against CF and Heat in that department very well, as CF and Heat are 'generic' infrastructure description languages, but Ferry YAML would be computational science/data science 'optimized'.
The intersections of compute, network, and storage are, IMHO, the hard part of leveraging containers, so if all that 'knowledge' can be encapsulated by ferry, I would be very happy.
Let me know if I can help writing/completing that documentation.
Regarding MPI, I think that at least a parameter to specify a customized image (with libraries/application) is necessary. As already pointed out, the possibility to use different MPI flavours would also be great; this could perhaps be achieved by providing several base images for the different MPI implementations and then using the configuration file to specify both the MPI flavour and the customized image (with libraries/application).
What about providing dockerfiles to ferry in order to build customized images?
You can already build and specify customized images to Ferry via the `ferry build` command. However, it's only limited to connectors at the moment. My inclination right now is to use the `image` option in the application YAML file.
--James
Another nice-to-have feature would be extension capabilities for the backend images, so that a user could add additional libraries on top of the already built base images. Essentially, the ability to support customized images, either by replacing the base ones or by extending them.
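One way to picture the "extend" case is a generated Dockerfile that layers extra packages on top of a base image. The following is a minimal sketch, assuming an apt-based (Debian/Ubuntu) base image; the function name `extend_image`, the base image name `ferry/openmpi`, and the package list are illustrative, and none of this is an existing Ferry feature.

```python
def extend_image(base_image, packages):
    """Render a Dockerfile that installs extra packages on top of base_image."""
    lines = ["FROM %s" % base_image]
    if packages:
        # Assumes the base image uses apt (Debian/Ubuntu).
        lines.append(
            "RUN apt-get update && apt-get install -y %s" % " ".join(packages)
        )
    return "\n".join(lines) + "\n"


# Illustrative base image and library names.
dockerfile = extend_image("ferry/openmpi", ["libfftw3-dev", "libhdf5-dev"])
print(dockerfile)
```

Replacing a base image outright would skip this step and point the application file at the custom image directly.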