quattor / aquilon

Configuration Management Database (3rd Generation)
www.quattor.org
Apache License 2.0
15 stars 20 forks source link

1.12.58: profile prolog template needed to define the LOADPATH #86

Closed jouvin closed 6 years ago

jouvin commented 6 years ago

Currently, hardware templates are executed first in the profile generated by the broker. Unfortunately when using the template library, the hardware templates needed are part of the template library (TL) and this requires to update the LOADPATH first so that the TL can be found. With the current version, the earliest possibility to do it is in archetype/base which is executed after the hardware templates.

My suggestion is to add a third template under archetype, called loadpath.pan (or prolog.pan but I think it is good to emphasize its main use with the name), and ensure that this template is called first (right after adding the archetype to the LOADPATH) if it exists (its existence should be optional if a site is not using the TL and doesn't need it).

A variant could be to execute archetype/base earlier but I can imagine that it may impact existing site and that there may be good reasons to have it after a few other actions happened.

ned21 commented 6 years ago

Interesting, I've not seen this before. @jrha - how do you manage this?

jrha commented 6 years ago

See #64. We implemented a preface template which does what @jouvin is suggesting.

jouvin commented 6 years ago

I looked at #64 (I was not able to find it this morning when opening this issue). The problem is that what MS implemented is not a preface template but an option to prevent inclusion of pan/units (done by default before including archetype/base). The comment in the code suggests that the result is that archetype/base is the first included template but in fact this is not the case: this is the second one, after the include of hostdata that executes hardware/ templates.

I played a little bit more with using the template library from Aquilon and I really think we need a preface/prolog template in addition to base. The are potentially things that we could want to do before configuring the OS but after configuring the hardware. Looking at the code, this seems easy to implement, in particular as the support for if_exists() seems already there. I'll look at what was proposed by RAL and may be reimplement it.

jrha commented 6 years ago

The missing link is that the hardware templates require the units from the template-library. I can't see a way to allow that without the preface template (and we are still using one).

jouvin commented 6 years ago

You could move archetype/base as really the first template but my point is that I think this is not a good idea and that it is better to have archetype/preface (or archetype/loadpath would have been my preferred name) executed as the first template, then keep include_pan=true and continue to include archetype/base after the configuration of the hardware.

jrha commented 6 years ago

Agreed.

jouvin commented 6 years ago

I'm working on porting your previous mod which is just fine with me! But host.py has changed since you made it, so there are a couple of minor things to fix.

urbonegi commented 6 years ago

Could you share the 'aq cat --hostname' output to demonstrate the template include order? The thing is that we would like to avoid defining new variables such as ARCHETYPE or new includes that would be added to every host profile. If we ensure the 'archetype/base' is included as the first template - you can add the content of suggested 'archetype/preface' to the archetype/base, right? Why to have 2 'base' templates that would be included on every host, cluster and metacluster: base and preface, and not one base template with content of both? Also the archetype should be defined in the template base rather than come from python code, for example "/system/archetype/name" = "somearchetype"; and then accessed in other templates.

jouvin commented 6 years ago

Here is an example (with #87 installed):

object template grid220.lal.in2p3.fr;

variable ARCHETYPE = "linux";
variable LOADPATH = list(
  ARCHETYPE
);

"/metadata/template/branch/name" = "site-init";
"/metadata/template/branch/type" = "sandbox";
"/metadata/template/branch/author" = "jouvin";

include { if_exists("archetype/loadpath") };

include { "pan/units" };
include { "pan/functions" };

"/" = create("hostdata/grid220.lal.in2p3.fr",
  "metadata", value("/metadata")
);
include { "archetype/base" };
include { "os/centos/7.x/config" };
include { "personality/test/config" };
include { "archetype/final" };

The important point is that, when using the template library (TL), you need to define the loadpath before loading the pan/... templates which are part of the TL. There is a chicken&egg pb. But for everything else, you probably need to have the pan/... templates already included and you don't want to duplicate in your site templates what the broker can do for you. So the only clean solution is the one proposed by RAL 2 years ago and implemented in #87: have a third archetype template included at the very beginning of the object template and (more or less) dedicated to the loadpath definition. With this, every thing in the generated object template is fine. The proposal and the implementation is to make this template optional by including it with if_exists(). This way the change has no impact on a site (like MS) not relying on the TL and not needing this mechanism. About the archetype variable proposed in #87, this is clearly not a strong requirement but I remain convinced that the use of variables is a better way of conveying temporary information than adding them to the profile configuration. It has the advantage that it can be done without any schema extension and that the schema can evolved if needed without impacting how the site defines things. This has been very successfully used in the TL (this is one of the key design point in the TL). The goal of this ARCHETYPE variable is not to put the information in the profile but to allow later in the configuration to do things like:

include format('/site/%s/example', ARCHETYPE);

The fact that it is a useful information to have in the profile is a different question. It is about whether or not the archetype information is useful in the profile (to use it in a dashboard for example). This may require to extend the schema for the quattor_profile. And it could be combined with the variable by adding to the profile:

"/system/archetype/name" = ARCHETYPE;
jouvin commented 6 years ago

@urbonegi BTW, you can look at the documentation PR in progress for an illustration on how we use the template library in Aquilon (Look particularly for the section at the end, "Configuring the base OS").

ned21 commented 6 years ago

Global variables slow down compilation and being mutable can lead to obscure bugs/behaviour. The schema already includes the archetype name so the existing code which adds the archetype by name to LOADPATH is more performant and lower risk than using an ARCHETYPE variable. To be clear we're not confusing ourselves this is what the start of our templates look like:

object template host.ms.com;

variable LOADPATH = list(
  "archetype"
);

which is functionally equivalent to the global variable version but faster and safer.

Returning to the need for a "pre-base" template. I see the reason you need this from this template. If the pan/ directory appeared at plenary/pan then it would work. However you are actually prefixing it with template-library and QUATTOR_RELEASE which is why you need an entire template.

In the short term I think there's a very easy workaround here to allow the TL to be used with the existing broker code. Content in template-king always takes precedence over what is in plenary so define pan/units.pan and pan/functions.pan to be an indirection back into the template-library/QUATTOR_RELEASE/pan/units and template-library/QUATTOR_RELEASE/pan/functions. This allows anyone to use a sandbox to test new versions of the TL although I admit it adds some complexity to the dependency tree when trying to work out where QUATTOR_RELEASE gets set because it will need to be in pan/units which is a non-obvious place to find it. You could also just put an indirect in plenary/pan/units. Again, not as obvious as having a template called "loadpath" or "prolog" but effective.

Longer term, I think we need to look again at the position of base relative to the hardware info. The previous attempt to make pan and functions optional via a config switch seems to have failed because hardware depends on them? We do very little in archetype/base.tpl because it turns out that most configuration depends on the OS or personality. Pretty much the only info you have at that point is hardware. So, perhaps the existing config option should be removed and replaced with one that allows base.pan to appear directly after the LOADPATH is set and before the hardware info is loaded?

jouvin commented 6 years ago

@ned21 probably we need a phone call asap to agree on a solution... My proposal is that we agree to disagree! I could explain you why I think the use of variables is better than depending on the schema with value(...). The last version of the PR I proposed allow to implement both ways, according to the site preference, based on the template_library_configured option... If this option is false, there is absolutely no change to the object profile (not even the loadpath.pan include if it exists). If true, it is implementing what I am proposing and REALLY WANT TO USE. I cannot imagine a better solution: let's implement both ways rather than trying to convince each others as we both have good reasons to do what we do. Your proposal to handle the template library as a special object in the broker is a wrong direction for me, if we can avoid it for the same features, leading to a much simpler/cleaner solution.

ned21 commented 6 years ago

@jouvin - which bit sorry? The use of mutable global variables are really bad practice, duplicating information already stored in /system/archetype/name and causes the compiler a performance hit. And in any case, you can always define ARCHETYPE in base.pan if you really want. That's one of the guiding principles of Aquilon, to be as simple as possible about the structure and delegate such decisions to the templates.

The use case for the additional include file before the hardware info makes sense, but complicates the broker and the include logic for the user. Making a config option that allows base above the hardware info is much simpler and probably a more sensible default configuration in my view.

jouvin commented 6 years ago

@ned21 No it doesn't work because part of the things that are done in base.pan in the the template library context support that you included pan/units and pan/functions that are included by the broker after... So it means duplicating what is done by the broker, really not optimal. I personally don't care about the performance hit of one variable less as the template library is built around variables and uses hundred of them. I understand that MS has a different scale and use case, this is why I proposed to have an option that says whether you are on the template library side or not. Both are legitimate but we build the object template with different requirements/constraints. I don't understand what is the problem you have with this approach and if we don't solve this issue, it means that Aquilon adoption is not possible for the template library users, i.e. all the community except MS. I think personally that Aquilon is so good that we need to make it possible to adopt it for existing users at the lowest possible effort/cost.

I really don't see the impact of the proposed change for the template library users only on the broker logic. But one thing that a @jrha's decision I fully support is that the template library must be in the plenary templates as this is something you are not supposed tweak/modify but use as it is. And contribute changes to it if you need new features or more flexibility. This model worked pretty well in the last 10 years in fact. And by the magic of include path and loadpath, you can develop your improved version of a TL template in a sandbox to test it before contributing it.

Have you looked at the last version of https://github.com/quattor/aquilon/pull/87. I'd appreciate feedback on what is a problem for MS with the approach adopted. For the record, the previous attempt to address the problem with the include_pan option was also a template library specific thing, without any use for MS. So if we accept that the generated object template need to have 2 variants, the MS one and the template library one, let's adopt the most suitable approach for template library users for the template library variant!

ned21 commented 6 years ago

I think the problem with forking the code paths, in addition to the added complexity of the code, is that you then use a set of code paths that are not well tested and used elsewhere, which leads to problems down the line. e.g. it's now become clear the pan/units and pan/functions change was not actually the right one because it doesn't work. We have pretty good unit test coverage and try hard to avoid breaking existing behaviour but sometimes code has to change and if we are not using that code path we cannot assess the impact on the community. Worse, sometimes we may bend over backwards to avoid changing something because think it's used else where but it's not and then we've wasted more effort.

I get that the TL does not follow the PAN performance guide's advice to avoid globals and am OK with that. People are free to implement as many globals as they want in archetype/base.pan. But rewriting the core plenary of a host just to obtain the same functionality via a global? That doesn't make sense and will lead to future problems because you are then running a non-standard config. And there's no need for it when it can be defined in base.pan, the broker is complex enough already!

Regarding the use of the TL in plenary. I also fully support this, it's a smart decision. But patching the broker to implement it was a mistake. You can implement the same purely in templates. It's not pretty to put a redirect in a file called pan/units but it works in a supported way. Think of it as using an extension in slightly unusual but valid way, versus binary patching the executable to achieve the same aim.

So then the question becomes, how do we fix this properly in a way that's acceptable to everyone? My proposal is to scrap the existing code for turning off pan/units & pan/functions and replace it with something similar that moves base.pan. That means base retains the same meaning everywhere regardless of how you configure Aquilon (it's always the first template that's loaded from archetype) which makes it supportable. It still creates a second code path that MS won't be using, but we probably could switch to it with minimal effort because it makes more sense and would allow us to adopt this same approach when using the TL. So then the question becomes why is this not acceptable to you?

No it doesn't work because part of the things that are done in base.pan in the the template library context support that you included pan/units and pan/functions that are included by the broker after... So it means duplicating what is done by the broker, really not optimal.

If I have read this correctly, you are saying you object to a plenary that looks like this:

object template foo.example.com;

variable LOADPATH = list(
    "linux"
);

include { "archetype/base" };
include { "pan/units" };
include { "pan/functions" };

"/" = create("hostdata/grid220.lal.in2p3.fr",
  "metadata", value("/metadata")
);
include { "os/centos/7.x/config" };
include { "personality/test/config" };
include { "archetype/final" };

because units and functions are included twice and that's not optimal? My preference is for supportability of the broker (which has high complexity) and shared understanding (the first template is always called base) over a minor duplication of includes, which the compiler will ignore anyway provided they are unique. We could even make the option drop those two includes if you want the base template to always take responsibility for including them so then there would be no duplication. So really this is just an extension of what we already proposed and implemented, and the reasons which we preferred that approach remain.

To play devil's advocate, the moving of base higher up the stack would break anyone doing hardware driven config in it today and makes it harder to use for anything except setting up the call to the standard library. But when I looked how we use it today, that's all it does so this is a logical move that we could adopt inside MS.

jouvin commented 6 years ago

@ned21 I think this discussion is becoming crazy... we need to find another way to make progress and for me this cannot wait the next workshop... You speak about "forking" but the current reality is that Aquilon has been forked to be usable by the TL users. Once by RAL 2 years ago with a set of required patches never merged (including something similar to this one) and mine with the PR we are discussing. I don't thing that 2 variants can be called a fork if they are controlled in the same way, tested the same way... There are many other variants in Aquilon, I don't see why this one is more unacceptable.

I buy your point about testing! I had not time to look at how Aquilon testing works but I'd be happy if you can give me any hints on how the current includes are tested, so that I can a test for the one thing added. And I also realize that there is a flaw in my proposed PR: variable ARCHETYPE should be unmutable, i.e. final.

I really cannot understand why you want to move base upper at the risk of a problem with hardware templates that we have not foreseen (this is the kind of thing were corner case are difficult to identify) instead of not changing the current logic and just adding a small bit of flexibility (that must be controlled as much as we can, at least by a good documentation).

ned21 commented 6 years ago

I really cannot understand why you want to move base upper at the risk of a problem with hardware templates that we have not foreseen (this is the kind of thing were corner case are difficult to identify) instead of not changing the current logic and just adding a small bit of flexibility (that must be controlled as much as we can, at least by a good documentation).

Because it solves multiple problems. The base can set any necessary global variables (such as ARCHETYPE) which means we don't need the other code change, and it means we can remove the hardcoded dependency on files from TL from the broker-generated plenary. @nuked suggested this a year ago because it simplifies the broker code quite nicely, and something we could adopt internally. That would mean everyone would be using Aquilon in the same way which is surely a huge for supportability?

You speak about "forking" but the current reality is that Aquilon has been forked to be usable by the TL users

Yes, and moving base allows us to converge on a single code base. Long term I would like us to adopt this same pattern of consuming the TL un-modified and I think the way we would do it is by moving base. So if this PR gets merged, we'll remain on two code paths. The last time I submitted a PR to Aquilon it took 45 mins to run the unit tests. Anything we can do to reduce code paths and the number of tests required is a good thing for making development easier!

Unfortunately I've realised it won't work. :( The problem is the way hostdata is loaded:

"/" = create("hostdata/grid220.lal.in2p3.fr",
  "metadata", value("/metadata")
);

means that any manipulation of the profile done before this would be lost as the create() replaces /. That's a risk of including any template before this line, regardless of whether it's called loadpath or base: if someone puts a DML assignment block in it, that will be lost. That's pretty confusing and not very user-friendly. It can be prevented by forcing the included file to be a declaration template but I don't know of a way to specifying that from the calling file. i.e. if you can setup archetype/preface.pan to be a declaration template and put a comment in saying "don't change this!" but it would be nice if we could force the compiler to check that's a declaration template and error otherwise as this is something that needs to be created for every new archetype. A quick read of the PAN book doesn't provide any insight into how to do that.

I will think about this some more.

ned21 commented 6 years ago

Having played with the example template provided by RAL in #64, I now agree with you that there is no way to achieve this without adding a further template to the generated plenary. And internally we should adopt this too.

The name for this template in originally suggested in #64 is a little confusing because it suggests that this is just another template in the sequence, and we needed a name for something that comes before "base". So "loadpath.pan" is a better name but still not quite accurate because you can do other things in this template: the key is that it can only be a declaration template. Inspired by @jouvin's configuration documentation PR, I realised that we need a skeleton set of archetype files that people can clone and modify to get started. That allows us to document surprising behaviour like the first template being declaration only, and also ensure that things like quattor/profile_base get included os you get a schema. (At one point we had a number of archetypes that missed that step and so were not using Quattor properly!)

I had a look for a good place to put this skeleton and couldn't really see one except template-library-examples, so I created a branch: https://github.com/ned21/template-library-examples/commit/f38982c1e7b4288623fa729c7094e6a460ec449c. I have used the name "archetype/declarations" because that most closely matches what's actually happening.

This could also replace some of the "code generation" being done in quattor.github.com#244.

So if this PR was updated to generate this object template as an option

object template fgrid220.lal.in2p3.fr;

variable LOADPATH = list(
    "linux"
);

include { "archetype/declarations" };

"/" = create("hostdata/grid220.lal.in2p3.fr",
  "metadata", value("/metadata")
);
include { "archetype/base" };
include { "os/centos/7.x/config" };
include { "personality/test/config" };
include { "archetype/final" };

I think we could all converge to this over time because it would be easy for us to add a declarations template to each of our archetypes, then update our config and flush our plenaries to start using the new structure.

jouvin commented 6 years ago

@ned21 first, let me say that I appreciate all the efforts you made to go into every details... and I'm happy that you shared our (@jrha and me) conclusion. And you spotted probably the details I missed making the base.pan solution you proposed not working (the fact that host data will overwrite /).

I completely agree with you that this loadpath.pan template should be dedicated to this early loadpath declaration and that anything we can do to avoid that it is used for something else is a good thing. You suggestion to document it as a declaration template is a very good one and I'll update my document PR on this base. Doing more may require panc changes as you said and I am not completely sure that it is worth the effort (in particular preventing the inclusion of anything but a declaration template until some point in the object template). The template name may also help and it is why I didn't like preface.paneither and decided to use loadpath.pan. If you think of a better name, I have no problem (my English is too limited)!

About your "forked object template" remark, despite I insist that it is more a minor variant than a fork, in principle I agree that if we could have an object template suiting everybody without any variant it would be better. If you agree that the proposed include if_exists('loadpath.pan); is acceptable for MS, I'd suggest that we do it inconditionally, not base on the template_library_configured option. Because of the if_exists() it is already somewhat optional so the added option was just there to make the change more acceptable for MS but IMO it is not needed. Whatever we do, I agree that we should keep as limited as possible the variants in the object template and that we should not just say that if somebody wants an option for its particular use case we just add it and forget about it if we don't use it. I'd say from my limited Aquilon experience so far, but reaching the point where I configured one server with Aquilon and the template library, that I don't see any other problem in the object template as it is today.

The LOADPATH variable is another story and not strictly a requirement for the template library support at this point. But what you proposed, have it defined in the first/early included template doesn't work as this template is not generated by the broker. The whole point of this variable is to pass to the template environment information declared in the Aquilon DB (this is why it should be final as it would not make sense to modify it): only the broker can generate it. I had no use for it so far but I can easily imagine situations where you want to customize a feature configuration or the OS configuration in the context of a particular archetype, something like:

include format('site/%s/feature_config', ARCHETYPE));

Again this is not about adding many variable to the object template. Just this one. As far as I know the archetype declared in the Aquilon DB is not passed to the template, except via the initial LOADPATH declaration but this variable is a list and it may be difficult to identify which one is the archetype. Thus my proposal to have the broker defining ARCHETYPE in the generated template and adding the ARCHETYPE variable to LOADPATH to avoid any risk of divergence. Also, there is currently an archetype resource in the profile but the archetype name is not part of what is pushed to the profile AFAIK... I have no opinion at this stage if it would be desirable or not but if it is desirable we could use also the ARCHETYPE variable to ensure consistency.

To move forward, I'd propose to split #87 in 2 PRs

jouvin commented 6 years ago

BTW, I realize that I forgot one thing related to providing skeleton templates for the users. I'll look at your PR in template-library-examples, this is may be a good place to put things. The alternative is to use the configuration documentation: this is what I did yesterday and give may be a better exposure (I'm afraid template-library-examples is not really known to users). My proposal for the future would be that aq add_archetype command creates skeleton templates somewhere. The plenary templates are not necessarily the ideal place but as long as the plenary templates are last in the include path (allowing them to be overloaded by the sandbox/domain templates), it may be not so bad and we could document that they should be copied before being customized.

ned21 commented 6 years ago

Thanks for the detailed response. In reverse order:

aq add_archetype command creates skeleton templates somewhere.

The aq command cannot modify template-king so that's difficult. I am OK with the skeleton being part of the doc but since it needs to be followed every time you add a new archetype, it would be nice if this was easy to find. So let's move that to the discussion in my branch of template-library-examples.

To move forward, I'd propose to split #87 in 2 PRs

Yes, I think that's a great idea.

One adding the support for loadpath.pan.

I like the name "archetype/declarations" as it spells out the intent. Also please make it a config option rather than using if_exists() because we want to use it to remove the current includes of pan/units and pan/functions. Then when everyone has migrated, we can just remove the old code -- deleting code is good for maintenance! But that's my personal opinion: we should check what @urbonegi and @gombasg think. (Unlikely to be reading this until Tuesday due to the UK holiday.) And there may be some discussion to be had over the name of the config option. I suspect mentioning template_library will be seen as too specific and it will need to be something like include_archetype_declarations, or using the existing one around pan_units since that's already there and set.

Another PR related to the definition of the ARCHETYPE variable.

It might be better to open that as an issue for discussion before committing to writing code because it's definitely controversial and as you get to know Aquilon better you might realise it's unnecessary.

I had no use for it so far but I can easily imagine situations where you want to customize a feature configuration or the OS configuration in the context of a particular archetype, something like: include format('site/%s/feature_config', ARCHETYPE));

Features and OSes are per-archetype so your per-archetype config lives in $archname/features/$feature-name/...

People find it hard enough to understand how a profile is built so pulling config from out of tree parameterised by ARCHETYPE would make life very difficult, and it's not recommended practice.

I am back in the office on Tuesday so will catch up with the Aquilon team then and try to join Thursdays standup to sort out the last few details on name of file, name of config option, and keep this moving forward.

Enjoy the rest of the weekend!

jouvin commented 6 years ago

@ned21 ok, I'll split the pull request, I should have done it since the beginning!

As for the ARCHETYPE discussion, I agree that I need to play more with Aquilon + template library to understand the real need and it can certainly wait as there is no concrete use case at the moment. But from what I know, I tend to not completely agree (😄) about features being archetype-specific... When you declare a feature, it is not associated with an archetype, this is the personality which is. And the template library provides a lot of feature templates (in particular for EGI grid sites and OpenStack) and they are not associated with an archetype. I'd say this is one of the great feature of features!! This is why I envisioned that the kind of include I mentioned in https://github.com/quattor/aquilon/issues/86#issuecomment-392332718 may be useful... But I certainly agree that we should not promote this approach if not useful as it will have to the complexity of debugging includes (as a pan template developer, I know only the debug messages in templates to help)...