mikelangelo-project / capstan

Capstan, a tool for packaging and running your application on OSv.
http://osv.io/capstan/
Other
19 stars 7 forks source link

Enhance package compose to upload files to a remote OSv instance #70

Closed wkozaczuk closed 7 years ago

wkozaczuk commented 7 years ago

This is more of a sketch of a proposal to demonstrate how this can be solved. I do not expect this to merged as is.

The motivation comes from the fact the creating AWS AMIs from OSv raw image is pretty time consuming task and is proportional to the size of the image. The idea is to have 'mike/osv-loader' EC2 instance running with cmdline '--norandom --nomount --noinit /tools/mkfs.so; /tools/cpiod.so --prefix /zfs/zfs; /zfs.so set compression=off osv' and use capstan with option --remote-guest-name to upload files that make up an image and set actual cmdline. To make it possible cpiod needs to slightly modified to allow setting cmdline.

I have tested this approach and it works and is faster than import raw image and EC2 snapshot/AMI.

wkozaczuk commented 7 years ago

All your code comments make sense so I will update accordingly.

However I have couple of concerns with your idea of combining with cloud-init.

Firstly the base image with combined cloud-init would implicitly contain httpserver-api and osv.bootstrap. This might not be desired in all cases (I am guessing that cloud-init would makes sense in most AWS deployments though) and kind od inconsistent with 'mike-loader' base image in terms what it represents and does.

Secondly and more importantly I think many would prefer AMIs to be immutable artifacts that can be used to instantiate corresponding EC2 instances with predefined bootcmd rather than rely on cloud-init passing right info. I would rather cloud-init be a mechanism to supply application configuration than set bootcmd. I might not be understanding how cloud init works with OSv.

Maybe my concerns are subjective and do not have much merit. I also realize that your approach does not require any OSv changes.

On Mon, Oct 23, 2017 at 2:48 AM, Miha Pleško notifications@github.com wrote:

@miha-plesko commented on this pull request.

@wkozaczuk https://github.com/wkozaczuk we are aware of the pain that one faces when uploading OSv unikernel to the AWS so we're thankful for your efforts towards resolving this. I really like your approach!

May I suggest that we combine your suggestion with the osv.cloud-init package for setting bootcmd? Basically we would require from user to prepare base image (the one that Capstan then connects to remotely) with osv.cloud-init package included and with bootcmd set to /bin/sleep.so -1. On the AWS user would then start it with following cloud-init data:

run:

  • PUT: /app/ command: "/tools/cpiod.so --prefix /zfs/zfs; /zfs.so set compression=off osv"

The Capstan would then connect to this instance and upload the files, leaving the bootcommand intact i.e. /bin/sleep.so -1. When user then wants to actually run the new unikenrel, she would simply change the cloud-init file to match her needs:

run:

  • PUT: /app/ command: {ARBITRARY-BOOTCMD}

What do you think @wkozaczuk https://github.com/wkozaczuk?

In capstan.go https://github.com/mikelangelo-project/capstan/pull/70#discussion_r146173849 :

@@ -423,6 +423,7 @@ func main() { cli.BoolFlag{Name: "pull-missing, p", Usage: "attempt to pull packages missing from a local repository"}, cli.StringFlag{Name: "boot", Usage: "specify default config_set name to boot unikernel with"}, cli.StringSliceFlag{Name: "env", Value: new(cli.StringSlice), Usage: "specify value of environment variable e.g. PORT=8000 (repeatable)"},

  • cli.StringFlag{Name:"remote-guest-name",Usage:"specify hostname of remote guest"},

Perhaps using short name remote would do better? I think users will use IP more often than the hostname. Something like this:

$ capstan package compose --remote 189.16.0.5


In capstan.go https://github.com/mikelangelo-project/capstan/pull/70#discussion_r146173962 :

@@ -423,6 +423,7 @@ func main() { cli.BoolFlag{Name: "pull-missing, p", Usage: "attempt to pull packages missing from a local repository"}, cli.StringFlag{Name: "boot", Usage: "specify default config_set name to boot unikernel with"}, cli.StringSliceFlag{Name: "env", Value: new(cli.StringSlice), Usage: "specify value of environment variable e.g. PORT=8000 (repeatable)"},

  • cli.StringFlag{Name:"remote-guest-name",Usage:"specify hostname of remote guest"}, }, Action: func(c *cli.Context) error { if len(c.Args()) != 1 {

This check shoul be updated so that when --remote is given, the argument image-name is no longer needed.

In cmd/compose.go https://github.com/mikelangelo-project/capstan/pull/70#discussion_r146174569 :

@@ -188,6 +188,57 @@ func UploadPackageContents(r *util.Repo, appImage string, uploadPaths map[string return newHashes, cmd.Wait() }

+func UploadPackageContentsToRemoteGuest(uploadPaths map[string]string, osvCmdline string, remoteGuestName string, verbose bool) error {

This function is more or less a copy-paste from the original UploadPackageContents function only it doesn't start the instance beforehead. May I suggest to extract common logic into a function, say:

func uploadFiles()

that both functions, UploadPackageContents and UploadPackageContentsToRemoteGuest, would use? The former would upload to localhost, the latter to the remote.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mikelangelo-project/capstan/pull/70#pullrequestreview-71077314, or mute the thread https://github.com/notifications/unsubscribe-auth/AFDSIZ0UP07_XCYypwEjuhhnRNOFVAtgks5svDbWgaJpZM4QCCGZ .

miha-plesko commented 7 years ago

@wkozaczuk I agree to most of your arguments, let me explain why I came to the idea of using cloud-init and then we can decide which approach to take. Most probably a combination of both of them is the way to go.

Workflow as you suggest

  1. user composes empty unikernel locally with Capstan and sets bootcmd to ... /cpiod.so ...
  2. user uploads the empty unikernel to the AWS as AMI
  3. // following steps are repeatable for same AMI
  4. user boots EC2 instance out of the AMI via AWS Dashboard
  5. user runs Capstan locally and uploads arbitrary files to the EC2 instance and sets arbitrary bootcmd
  6. user reboots EC2 instance - then her application is running

This sounds a great idea and I admire you for inventing it 👍. My only concern is that it's a "one-time offer" meaning that Step 4 can be triggered only once for a given EC2 instance. When we change bootcmd (Step 5) we can no longer repeat the procedure to upload additional files to the EC2 instance, nor update its bootcmd. Why would we want the procedure to be repeatable? Firstly, to make it more robust - Capstan would no longer depend on baked-in bootcmd that was set long time ago, when Step 1 was done. Secondly, the maintenance! We would suddenly be able to stop existing unikernel, upload some additional files to it, and start it back.

Workflow as I suggest

  1. user composes empty unikernel locally with Capstan and sets bootcmd to ... sleep ...
  2. user uploads the empty unikernel to the AWS as AMI
  3. // following steps are repeatable for same AMI and for the same EC2 instance
  4. user boots EC2 instance out of the AMI via AWS Dashboard (cpiod bootcmd is forced via cloud-init)
  5. user runs Capstan locally and uploads arbitrary files to the EC2 instance
  6. user reboots EC2 instance with updated cloud-init - then her application is running

This way Step 4-6 are repeatable for given EC2 instance since we can decide to boot the cpiod at anytime we want :) Yes, as you've pointed out correctly, there is a ~4MB overhead when cloud-init package is employed (0.7MB for osv.cloud-init + 3.3MB for dependant osv.httpserver-api), but I don't think this is really too much.


One way or another, Capstan doesn't even need to know if the EC2 instance was booted directly or via cloud-init so we can merge this PR when you update it and when Travis is green.

I'd also kindly ask @gberginc to comment on this discussion as he is experienced both with Capstan and OSv.

wkozaczuk commented 7 years ago

@miha-plesko I think for now I can limit my patch to simply allow uploading of files to a remote OSv instance which anyway most of this patch is about. In essence apart of the code modifications you suggested I will drop this line:

// Send command line. cpio.WritePadded(conn, cpio.ToWireFormat("!!!SETCMDLINE:" + osvCmdline, 0, 0 ))

So the mechanism of setting command line will be handled outside of capstan through different mechanism.

Also please see my other comments interleaved below.

On Wed, Oct 25, 2017 at 3:23 AM, Miha Pleško notifications@github.com wrote:

@wkozaczuk https://github.com/wkozaczuk I agree to most of your arguments, let me explain why I came to the idea of using cloud-init and then we can decide which approach to take. Most probably a combination of both of them is the way to go. Workflow as you suggest

  1. user composes empty unikernel locally with Capstan and sets bootcmd to ... /cpiod.so ...
  2. user uploads the empty unikernel to the AWS as AMI
  3. // following steps are repeatable for same AMI
  4. user boots EC2 instance out of the AMI via AWS Dashboard
  5. user runs Capstan locally and uploads arbitrary files to the EC2 instance and sets arbitrary bootcmd
  6. user reboots EC2 instance - then her application is running

This sounds a great idea and I admire you for inventing it 👍. My only concern is that it's a "one-time offer" meaning that Step 4 can be triggered only once for a given EC2 instance. When we change bootcmd (Step 5) we can no longer repeat the procedure to upload additional files to the EC2 instance, nor update its bootcmd. Why would we want the procedure to be repeatable? Firstly, to make it more robust - Capstan would no longer depend on baked-in bootcmd that was set long time ago, when Step 1 was done. Secondly, the maintenance! We would suddenly be able to stop existing unikernel, upload some additional files to it, and start it back.

My workflow is slightly different from what you describing above because my ultimate goal is to create a final immutable application AMI which can be used to create any number of EC2 instances (obviously configuration would be supplied by cloud init).

So my workflow would be this:

  1. user composes empty unikernel locally with Capstan and sets bootcmd to ... /cpiod.so ...
  2. user uploads the empty unikernel to the AWS as AMI
  3. to create final application AMI user launches "build" EC2 instance using AMI created in step 2
  4. user runs Capstan locally and uploads arbitrary files to the EC2 instance and sets arbitrary bootcmd
  5. user somehow sets application cmdline and take snapshot of the EC2 instance to create "final" AMI
  6. at some point later (repeatable step) user launches EC2 using AMI created in step 5 and get configuration set through cloud init

Please note that ideally instead of creating "final" AMI users should be able to create "intermediate" AMIs like say AMI with JVM or node.js prebaked in. Then it would use additional steps to upload application itself (Java jars or *.js files).

Workflow as I suggest

  1. user composes empty unikernel locally with Capstan and sets bootcmd to ... sleep ...
  2. user uploads the empty unikernel to the AWS as AMI
  3. // following steps are repeatable for same AMI and for the same EC2 instance
  4. user boots EC2 instance out of the AMI via AWS Dashboard (cpiod bootcmd is forced via cloud-init)
  5. user runs Capstan locally and uploads arbitrary files to the EC2 instance
  6. user reboots EC2 instance with updated cloud-init - then her application is running

This way Step 4-6 are repeatable for given EC2 instance since we can decide to boot the cpiod at anytime we want :) Yes, as you've pointed out correctly, there is a ~4MB overhead when cloud-init package is employed (0.7MB for osv.cloud-init + 3.3MB for dependant osv.httpserver-api), but I don't think this is really too much.

One way or another, Capstan doesn't even need to know if the EC2 instance was booted directly or via cloud-init so we can merge this PR when you update it and when Travis is green.

I'd also kindly ask @gberginc https://github.com/gberginc to comment on this discussion as he is experienced both with Capstan and OSv.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mikelangelo-project/capstan/pull/70#issuecomment-339239048, or mute the thread https://github.com/notifications/unsubscribe-auth/AFDSIbl-NhcIZtBeSlznl7KgIdxZnygtks5svuIHgaJpZM4QCCGZ .

wkozaczuk commented 7 years ago

@miha-plesko I have addressed all your review comments. I also refactored the code to expose new package compose-remote command rather than tuck on new parameter to the original compose command. It seems cleaner.

Do you want to me create new pull request and squash the commits? I am also not sure the signature.

miha-plesko commented 7 years ago

@wkozaczuk just in case you're not familiar with GitHub: you needn't open another PR when you squash the three commits into a single one. You only need to force push the squashed version to your compose_remote branch and this PR will be synchronized.

miha-plesko commented 7 years ago

@wkozaczuk now that we're having a standalone capstan package compose-remote command we could also support the --run argument that would set the bootcmd like you've proposed before. What do you think?

Looking at the https://groups.google.com/forum/#!topic/osv-dev/ha_ouYegYAc I don't think we need to modify the OSv for that. But let me also answer you there.

wkozaczuk commented 7 years ago

Let me add unit tests and then I will squash commit my changes and push.

miha-plesko commented 7 years ago

@wkozaczuk would you like me to merge this PR now and you can provide the unit tests later? I do realize that it's not trivial to implement them so it can wait. Please just squash the commits and we merge.

wkozaczuk commented 7 years ago

I have just squashed and pushed my changes.

Also I think I have a working unit test around this change which I will send as a new pull request later.

I think it would be also nice to add something to the documentation (*,.md) files but I am not sure which one to modify.

miha-plesko commented 7 years ago

Thanks for your contribution! I suggest to add documentation into a new file Documentation/ComposeRemote.md and link it from the main README.

wkozaczuk commented 7 years ago

Sure I will add it to the unit test pull request which I hope to have ready soon.

On Wed, Nov 8, 2017 at 2:28 AM, Miha Pleško notifications@github.com wrote:

Thanks for your contribution! I suggest to add documentation into a new file Documentation/ComposeRemote.md and link it from the main README.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mikelangelo-project/capstan/pull/70#issuecomment-342733321, or mute the thread https://github.com/notifications/unsubscribe-auth/AFDSIYCERUHTODK8nMZyC4ckGr54Lqlsks5s0VgVgaJpZM4QCCGZ .