nvgoldin opened this issue 7 years ago
Some comments:

> We need to carefully consider and specify the hash format.

Bottom line - we should probably include the algorithm in the hash string. This also has implications for #51.

> lago images pull

I don't think it is useful or needed - the user does not need to handle images manually IMO - just list them in the LagoInitFile. `lago image search` will need careful consideration and we should not try to implement it at this stage.

I have a couple of questions about the versioning proposal:

> 1.0-2.6-4.1-abcdef.fedcba [...] that would mean that you used base recipe 1.0, second recipe version 2.6 and third recipe version 4.1 (with hash abcdef), and generated an image with hash fedcba. That ensures [...] 1.0-2.6-4.3 is newer than 0.2-2.6-4.3 [...] nfs-server~1.0-2.0-4.3 and get the latest that matches the three major versions.

The main issue with the above is that it will not allow you to directly use gluster or virt-builder repos, though you could have some thin repo proxy to add the extra metadata (and map from a 'lago version' to a gluster image).
@ifireball
In abstract section 4, "verify images" should probably be "verify image data integrity", to be clear about what we mean by "verify".

I think ImageProvider.get should be called ImageProvider.download to indicate that it is going to do the expensive I/O. Calling it "get" sends the wrong signal IMO, as if it were something like dict.get.

I think ImageProvider.search should not be mandatory at this point. And when we do add it, we need to think carefully about the query format so we can leverage server-side capabilities. It is highly unlikely that servers will implement the exact Python or grep (POSIX? PCRE? GNU?) regex dialect.

ImageProvider.exists should probably just be ImageProvider.__contains__, to allow usage of the "in" operator.
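To illustrate the naming suggestion above, here is a minimal, hypothetical sketch (not Lago's actual code): implementing `__contains__` instead of an explicit `exists()` lets callers use the natural `in` operator, and `download` signals the expensive I/O better than `get`.

```python
class ImageProvider:
    """Hypothetical sketch of the naming suggested above; the
    constructor and its `images` mapping are illustrative only."""

    def __init__(self, images):
        # images: mapping of image hash -> remote location
        self._images = images

    def __contains__(self, image_hash):
        # Implementing __contains__ (rather than exists()) enables
        # the `in` operator: `some_hash in provider`.
        return image_hash in self._images

    def download(self, image_hash, path):
        # Named `download` instead of `get` to signal expensive I/O.
        raise NotImplementedError


provider = ImageProvider({"abcdef": "http://example.com/fedora24.qcow2"})
print("abcdef" in provider)   # True
print("123456" in provider)   # False
```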
:+1: - I agree with most of the above. I probably should have written it explicitly, but I wrote the API section without the exact implementation details yet; it still needs some polishing. About the search - I also think this can be added later, though having an initial version won't hurt. As a first thought, it looks problematic to try defining a single search query format for all providers, as it might differ greatly from provider to provider. IMO, a plain search which checks whether the image's name matches at the beginning is sufficient (and useful!); later we can think about more complex queries, if they are needed at all (filter by architecture, OS, etc.).
We need to specify what is in the image metadata. Leaving it unspecified makes it useless.
True - but I'd rather have this done in the PRs (and review); I will need to get my "hands dirty" and test removing all the metadata to check what is absolutely necessary (except the hash). I already tested plain cloud images of fc24 and centos7, and they work without any metadata at all.
We need to carefully consider and specify the hash format. Some considerations:

- virt-builder supports only SHA512.
- Glance supports only MD5 by default (but custom properties could be added).
- Re-hashing is very expensive I/O-wise.
- We should be efficient when someone only uses one kind of provider (so not try to rehash everything with a specific algorithm).
I don't think there is any way to avoid rehashing if the hash is in a different format. Though virt-builder works with SHA512, and that is the first and main provider we are going to implement, so I'd rather have that.
Bottom line - we should probably include the algorithm in the hash string. This also has implications for #51.
agree. @gbenhaim - what do you think? (affects #51)
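As a sketch of the "include the algorithm in the hash string" idea, a verifier could accept a string like `sha512:<hexdigest>` (the `algorithm:digest` format itself is an assumption here, not a decided spec) and dispatch on the algorithm with `hashlib.new`, which handles both MD5 and SHA512:

```python
import hashlib


def verify_image(path, hash_string):
    """Verify a file against a hash string that embeds the algorithm,
    e.g. 'sha512:3a81...' or 'md5:9b74...'. The 'algorithm:digest'
    format is illustrative only - the actual format is still open."""
    algorithm, _, expected = hash_string.partition(":")
    digest = hashlib.new(algorithm)  # supports md5, sha512, ...
    with open(path, "rb") as f:
        # Hash in chunks to avoid loading large qcow2 files into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

This way a Glance provider can publish `md5:...` strings and a virt-builder provider `sha512:...` strings, and Lago only rehashes when the cached digest's algorithm differs from the requested one.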
In the "Local Image Caching" we need to specify the exact cache API like we did for ImageProvider.
:+1: - will do. needs more inspection.
I don't think lago images pull is useful or needed - the user does not need to handle images manually IMO - just list them in the LagoInitFile.
I think it is. I would love (as a user) to have the ability to pre-fetch images. Of course the automatic action would still be to pull the images from the init file; it doesn't contradict that (and it is just a matter of exposing the internal download command as a CLI).
@david-caro
How would you generate the images on the server side? I'd love to be able to build them locally too (that will unify the code and simplify the image building process, and allow custom local builds if needed, something similar to Docker and Dockerfiles). My original idea was that the image command would be able to generate images from image recipes, locally (at some point, maybe even be able to upload the recipes/images somewhere, though that's a completely different service, like Docker Hub or Vagrant repos).
One thought is to create something similar to createrepo (maybe createvirtbuilder-index?) that would auto-generate the index.asc file: it would extract all it can from the qcow2 file and calculate the hash if needed. The only thing it would need to somehow be 'provided' with is the following virt-builder fields: osinfo, arch, expand - with the 'expand' field optional.
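A rough sketch of what such a hypothetical `createvirtbuilder-index` helper could do for a single image: compute the SHA512, take the size from the filesystem, and emit one stanza in the virt-builder index style, with osinfo/arch/expand supplied by the caller (the helper itself and its exact field handling are assumptions, not existing code):

```python
import hashlib
import os


def index_entry(qcow2_path, osinfo, arch, expand=None):
    """Build one index.asc-style stanza for an image. Field names
    follow the virt-builder index format; osinfo, arch and the
    optional expand must be provided by the caller, as noted above."""
    digest = hashlib.sha512()
    with open(qcow2_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    name = os.path.splitext(os.path.basename(qcow2_path))[0]
    lines = [
        "[%s]" % name,
        "name=%s" % name,
        "osinfo=%s" % osinfo,
        "arch=%s" % arch,
        "file=%s" % os.path.basename(qcow2_path),
        "format=qcow2",
        "size=%d" % os.path.getsize(qcow2_path),
        "checksum[sha512]=%s" % digest.hexdigest(),
    ]
    if expand:  # 'expand' is optional
        lines.append("expand=%s" % expand)
    return "\n".join(lines) + "\n"
```

A full tool would loop over the images in a directory, concatenate the stanzas, and GPG-sign the result to produce index.asc.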
The content of an image changes each time you generate it, so the hashes will not be consistent between servers/rebuilds, right?
Not sure why that is a problem. Let's say I'm the maintainer of an images repo and I would like to add a new image; assume it is a new build of fedora24. I have two options:

Replace the current image, keeping the same name - it will obviously get a new hash. If the user has 'fedora24' configured in the init file, he will get a rolling update. I can rename the old image to something else if I would like to keep it (and it would obviously keep its old hash).

On the other hand, if I want to explicitly differentiate my image, I would name it fedora24-something, and the user will have to ask for it explicitly.

In this sense the image 'name' is just a tag in the repo for the hash, nothing more. The same tag could point to different image hashes in different providers; it is up to the maintainer of the images repo to decide. On Lago's side, the verification would eventually always be done by the hash.
But still, this does not solve the ordering issue, as hashes are not consecutive, so IMO we still need some kind of versioning. My proposal is to use the versions of the parent recipes plus the version of the current recipe. For example, you'd have an image with the version string 1.0-2.6-4.1-abcdef.fedcba; that would mean that you used base recipe 1.0, second recipe version 2.6 and third recipe version 4.1 (with hash abcdef), and generated an image with hash fedcba. That ensures:

- You can pin a version to an image file.
- You know which versions of the recipes were used.
- You know the number of recipes needed.
- You maintain versioning order: you will know that 1.0-2.6-4.3 is newer than 0.2-2.6-4.3.
- You can use kind-of semantic versioning, where if none of the major numbers changes, you'd expect the images to be backwards compatible. So when specifying an image requirement, you could say something like nfs-server~1.0-2.0-4.3 and get the latest that matches the three major versions. That will allow easy and nicer dependency declarations between them.
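The scheme above can be sketched in a few lines; this is an illustration of the proposed format only, and the exact parsing rules (especially how the trailing `recipehash.imagehash` part is recognized) are assumptions:

```python
def parse_version(version_string):
    """Parse the proposed format, e.g. '1.0-2.6-4.1-abcdef.fedcba' ->
    ([(1, 0), (2, 6), (4, 1)], 'abcdef.fedcba'). The hash part is
    detected heuristically here (assumption): the last dash-separated
    field is treated as hashes when it is not purely numeric."""
    parts = version_string.split("-")
    if "." in parts[-1] and not parts[-1].replace(".", "").isdigit():
        hashes, parts = parts[-1], parts[:-1]
    else:
        hashes = None
    recipes = [tuple(int(n) for n in p.split(".")) for p in parts]
    return recipes, hashes


def is_newer(a, b):
    # Tuples compare lexicographically, giving the ordering described
    # above: 1.0-2.6-4.3 is newer than 0.2-2.6-4.3.
    return parse_version(a)[0] > parse_version(b)[0]


def matches_majors(requirement, candidate):
    # 'nfs-server~1.0-2.0-4.3'-style matching: same number of recipes
    # and the same major number for each recipe.
    req, _ = parse_version(requirement)
    cand, _ = parse_version(candidate)
    return len(req) == len(cand) and all(
        r[0] == c[0] for r, c in zip(req, cand))
```

With this, resolving `nfs-server~1.0-2.0-4.3` would mean filtering a provider's candidates with `matches_majors` and picking the maximum under `is_newer`.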
I'm not convinced it is absolutely necessary to have versions. I think for the common use case the 'base' images shouldn't change often: most likely you will use virt-builder's official repo and use either 'fedora24' or 'centos7'. Once you get into layers, this will be more of a specific use case, so it is reasonable that the maintainer of the layer would ask the users either to explicitly use the 'tag' he created, such as el7-with-jenkins-2.6-10102016, or tell them to explicitly use the hash.
About the recipes - similarly to how you do versioning in Docker, it will be controlled by making versions of the LagoInitFile. This might mean we will need to add more 'virt-sysprep' options to the init file for how to "chew" the image (such as disabling cloud-init or adding support for it). As I wrote in the previous comment, I don't think we are far from there - the basic cloud images of Fedora and CentOS just work in Lago with the current sysprep commands (aside from booting taking longer, as it waits for cloud-init).
I also think that the metadata should not be in the image itself, unless you can put arbitrary data into it - if that's the case then it's OK - but I don't think it's a good idea to reuse an existing field and give it a new meaning.
:+1: I checked with qemu-img, and there is no "official" way to store extra metadata in the qcow2 format such that it would appear in the qemu-img info output. The only parameter we must use is the backing_file parameter, for the parent hash. Either way, it seems inevitable to store the metadata for each image separately (we can use virt-builder's index.asc for that locally too; the problem is that it wouldn't allow efficient querying of the cache directly).
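For reference, `qemu-img info --output=json` (a real qemu-img flag) exposes the backing file as the `backing-filename` key, which is absent for base images; the sketch below shows how the parent hash could be read back if backing_file were used this way (the `parent_hash` helper is hypothetical):

```python
import json
import subprocess


def qemu_img_info(image_path):
    """Run `qemu-img info --output=json` and return the parsed dict.
    Requires qemu-img to be installed on the host."""
    out = subprocess.check_output(
        ["qemu-img", "info", "--output=json", image_path])
    return json.loads(out)


def parent_hash(info):
    """If backing_file is (ab)used to store the parent image hash, as
    discussed above, extract it; returns None for base images, whose
    info has no 'backing-filename' key."""
    return info.get("backing-filename")


# Example of the parsed shape for a layered image (canned data, so the
# snippet works without qemu-img installed):
sample = json.loads(
    '{"filename": "layer.qcow2", "format": "qcow2",'
    ' "backing-filename": "fedcba"}')
print(parent_hash(sample))  # fedcba
```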
About the createrepo command - that is already done in the lago-images code. The extra information is in the recipes (with any/all the commands needed, including the info of which image to base it on).
About the changing hash - it might not be a big issue, as long as you have one and only one image provider, including not building the images locally. Maybe it's something to be rechecked once it becomes an issue, though as it is kind of a central feature it will be hard to change later.
About versions - as with the hashes, it's something that we don't need now (as we are the only ones using it, and we hardcode the versions/names in our scripts). But, and it's a big but, using sensible versioning enables us to start really distributing the images, allows us to define whether an upgrade is 'safe' or not, and brings any other benefit that you get with the 'tags' you mention (which, as I see it, are a pseudo-versioning).
About the recipes - I think you misunderstood me; I'm talking about the recipes used to generate the images, not the LagoInitFile that uses them. For example, to generate the image fedora24_nfs, you need a fedora24 base image and some commands to install and configure the NFS server there. All that might not be in the LagoInitFile, as it would considerably increase the time to get it the first time, but would preferably be done on the server once, at a previous time, so you just have to download the image. Those 'recipes' must already contain the info on what the parent image is, and I'd expect their versions to match the image's (or to have some way to know the recipe used from the image).
That goes very well with versioning the images after the recipes (that is, including the recipes' versions in the image's version string).
About the metadata in the image file - you might strictly need only the backing file hash (though you could put anything else there too, like the URL, the name of the image...). But there is a lot more useful information that you are missing, and by choosing to put it in the restricted metadata field you will not be able to add it easily later without rewriting the code that extracts/adds/uses that hash. For example: the default user, the root partition, the default password, the OS name, who built it, when, how, a small description...
For the current repos we use a json file that contains that metadata; I think it's very easy to cache locally or via a proxy, and to mirror (plain rsync is more than enough). Not needing a complex app to host a basic repo is really nice too - with metadata files you don't even need an app, you can use a read-only frontend and generate the images internally, which is quite safe (and fast to serve).
Hi, here is an initial proposition on how to enhance Lago's handling of images. As far as the terminology we use today goes, I used images here for what we would call templates; I think it is clearer. This proposition does not depend on, but complements, the general concept of layered images in https://github.com/lago-project/lago/issues/51. Thanks @gbenhaim, @ifireball for the contributions.

ImageProvider API
Abstract
The idea of this proposal is to allow Lago to use different remote servers to obtain images from (mainly in qcow2 format), with the following goals in mind:

- virt-builder index format and Glance (additionally to the already-used lago-images format).
- lago-images as a provider.
- lago-images.

Current Templates (images) mechanism in Lago today
- TemplateStore - in charge of storing local images in the following format:
- TemplateRepository - manages the lago-images repository and has preparation for extending it to different providers.
- HttpTemplateProvider/FileSystemProvider - providers for downloading the images.

Suggested ImageProvider API
The ImageProvider API would be used to search, query images and to download them from a remote server into the local cache. New providers will be obligated to implement it.
API

- get(hash, path) - download the image, identified by the hash, to the specified local path.
- get_by_name(string) - returns the first hash found for the given name, if it exists.
- exists(hash) - returns true iff the image exists.
- get_metadata(hash) - returns the image metadata stored at the server.
- search(name) - returns a list of (image-name, hash, metadata) tuples matching the name criterion, with 'grep'-like behaviour.
- clear_cache - invalidates the local cache, if it exists.

Local index caching (i.e. mapping between names and hashes) for each ImageProvider will be optional. If implemented, it will allow some providers to optimize the seek time.
Decompressing and verifying the image file will be an implementation detail of each provider.
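The API above could be expressed as an abstract base class along these lines; this is a sketch following the method list in this proposal, not shipped code, and search is left optional as discussed:

```python
import abc


class ImageProvider(abc.ABC):
    """Sketch of the proposed ImageProvider API; method names and
    semantics follow the list above, signatures are illustrative."""

    @abc.abstractmethod
    def get(self, hash, path):
        """Download the image identified by `hash` to local `path`."""

    @abc.abstractmethod
    def get_by_name(self, name):
        """Return the first hash found for `name`, if it exists."""

    @abc.abstractmethod
    def exists(self, hash):
        """Return True iff the image exists at the provider."""

    @abc.abstractmethod
    def get_metadata(self, hash):
        """Return the image metadata stored at the server."""

    def search(self, name):
        """Optional at this stage: (image-name, hash, metadata)
        tuples matching `name`."""
        raise NotImplementedError

    def clear_cache(self):
        """Invalidate the local index cache, if one exists.
        A no-op by default, since index caching is optional."""
```

A concrete provider (virt-builder, Glance, lago-images) would subclass this and implement the four mandatory methods; decompression and verification stay inside each implementation, as noted above.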
Handling images with the new ImageProvider API
Local Image Caching
Seeking images
This will allow one-to-many mappings between image names and hashes, so if a new image with the same name is pushed to the server, it will be downloaded when searched for by name (semi-automatic update behaviour).
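A minimal, hypothetical sketch of such a one-to-many index: each name keeps the list of hashes pushed under it, and lookup by name returns the most recent one, which is what gives the semi-automatic update behaviour described above.

```python
class LocalIndexCache:
    """Illustrative name -> hashes index (not Lago code): one name can
    map to many hashes over time; lookup by name yields the newest."""

    def __init__(self):
        self._by_name = {}  # name -> list of hashes, oldest first

    def add(self, name, image_hash):
        self._by_name.setdefault(name, []).append(image_hash)

    def latest(self, name):
        hashes = self._by_name.get(name)
        return hashes[-1] if hashes else None


cache = LocalIndexCache()
cache.add("fedora24", "abcdef")
cache.add("fedora24", "fedcba")  # a rebuild pushed under the same name
print(cache.latest("fedora24"))  # fedcba
```

Pinning to an exact build would simply mean requesting a hash directly instead of a name.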
Local cache invalidation
New image verbs

- lago images pull <name/hash>, alias: lago pull - download the given image to the local cache. If it has a backing_file chain, ensure the chain exists in the cache or download it.
- lago images pull --no-recursive <name/hash> - skip resolving the backing_file parameter if it exists.
- lago images search, alias: lago search - search for an image by hash or by name in the configured ImageProviders, also indicating whether it already exists locally.

Configuring ImageProviders
HASH vs Versioning
The image 'versioning' concept will be dropped; instead, the name->HASH mapping will be used as described above. This has the advantage of uniquely identifying images, regardless of which provider they were downloaded from.
Main advantages

- lago-images, thus allowing users to configure Lago to use widely-used providers, or to set up their own providers easily.

Possible implementation stages
- lago-images as the first ImageProvider plugin. No major changes to TemplateStore.
- virt-builder ImageProvider plugin.
- (lago init would already start using the internal implementation.)