jlausuch / qatrfm-lib

Python test runner based on Terraform using libvirt backend

RFC: Remove config/default.tf and default parameters #24

Open richiejp opened 5 years ago

richiejp commented 5 years ago

So I have just started writing a custom TF file and most of the parameters QA Terraform expects simply make no sense for my scenario.

I think we are creating a problem for ourselves by giving QA Terraform the responsibility of defining standard parameters for Terraform files. It would be better to leave generating the Terraform files to the test writer for now, just pass --tfvar arguments through to terraform and pass the terraform output back to the test writer.

If a very common pattern develops then we can create some special function for that. Otherwise I think we will just end up over-engineering everything.

@jlausuch @asmorodskyi @cfconrad @czerw

cfconrad commented 5 years ago

If I'm not wrong, we pass all variables specified with --tfvar through to terraform (https://github.com/jlausuch/qatrfm-lib/commit/51917be457c87fb083fc404af521e4c3e1a862b4). The output is predefined for now, as we need the IPs and instance names to make further calls like "exec" or "ssh" on the VMs. Maybe it is possible to grep that from the "general" terraform log as well, but until now we do it with these output helpers.
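
Roughly, the pass-through amounts to this (just a sketch, not the exact code from that commit):

import subprocess

# Sketch: forward every --tfvar KEY=VALUE given on the qatrfm command line
# as a -var argument to terraform apply.
def terraform_apply(workdir, tfvars):
    cmd = ["terraform", "apply", "-auto-approve"]
    for var in tfvars:  # e.g. ["image=sle15sp1.qcow2"]
        cmd += ["-var", var]
    subprocess.check_call(cmd, cwd=workdir)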

jlausuch commented 5 years ago

So I have just started writing a custom TF file and most of the parameters QA Terraform expects simply make no sense for my scenario.

Which parameters don't make sense?

richiejp commented 5 years ago

If I'm not wrong, we pass all variables specified with --tfvar through to terraform

That is good, I suppose. @asmorodskyi points out I can remove the input vars; they are not required. The problem is the wording in the README.

The output is predefined for now, as we need the IPs and instance names

Not in my case, this is either static or not controlled by libvirt. Do you really need to set so much stuff dynamically in your scenarios or could you just create separate .tf files? We shouldn't use stuff like vars until we really need them. It is easier to read/debug a static config file for the exact scenario one is testing than one which is full of variable substitutions for things you don't need.

I recently burned myself with using var substitution when it wasn't needed, so the pain is fresh in my mind :-).

Which parameters don't make sense?

None of the parameters make sense except 'image' and 'basename'. However, I don't want to get stuck discussing my specific scenario. The point is: the more stuff you include in the library that isn't absolutely necessary for most scenarios, the more difficult it is to make it work with an unusual scenario.

My suggestion would be to move default.tf to the examples folder. It is definitely useful, but I can imagine default.tf will get very complex and frighten people if it is allowed to evolve.

Also shouldn't we be encouraging people to define their infra declaratively with .tf files? This seems to be the main value in using Terraform for me.

BTW none of this is blocking me and I probably don't have the full picture yet, I am just noting this down while it is fresh in my mind.

jlausuch commented 5 years ago

None of the parameters make sense except 'image' and 'basename'. However, I don't want to get stuck discussing my specific scenario. The point is: the more stuff you include in the library that isn't absolutely necessary for most scenarios, the more difficult it is to make it work with an unusual scenario.

Ok, I can agree we can remove num_domains, ram and cores from the default .tf file. But net_octet is the only solution we have to get network isolation between tests when running multiple tests at the same time.
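
Just to illustrate what net_octet buys us (a sketch, not the actual implementation; the exact subnet scheme here is only an example):

import random

# Sketch: a per-run octet puts each test run on its own /24 subnet,
# so parallel runs do not collide on the libvirt network.
net_octet = random.randint(2, 254)
subnet = "10.0.{}.0/24".format(net_octet)  # handed to the .tf file via a tfvar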

My suggestion would be to move default.tf to the examples folder. It is definitely useful, but I can imagine default.tf will get very complex and frighten people if it is allowed to evolve.

I wouldn't like to evolve default.tf beyond what it is now. It is just a way to provide a basic environment, allowing a user to have an environment without understanding Terraform internals (e.g. how to create a .tf file).

cfconrad commented 5 years ago

Not in my case, this is either static or not controlled by libvirt. Do you really need to set so much stuff dynamically in your scenarios or could you just create separate .tf files? We shouldn't use stuff like vars until we really need them. It is easier to read/debug a static config file for the exact scenario one is testing than one which is full of variable substitutions for things you don't need.

I hope I understood you correctly. I just created a custom.tf with a test script: https://github.com/cfconrad/qatrfm-lib/commit/12da6dd0a79af409d57f98e89ad5af1d4825ed82

You can run this test with:

qatrfm --tfvar image=images/sle-15-SP1-x86_64-186.1-terraform@64bit.qcow2 -t examples/minimal/

Of course you could also specify the image with an absolute path inside the .tf file. Then nothing is left besides the outputs:

output "domain_ips" {
    value = "${libvirt_domain.domain-sle.*.network_interface.0.addresses}"
}
output "domain_names" {
    value = "${libvirt_domain.domain-sle.*.name}"
}
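
Reading those outputs back is then simple; roughly like this (just a sketch assuming terraform output -json, not the actual qatrfm code):

import json
import subprocess

# Sketch: read whatever outputs the .tf file defines into a plain dict,
# which may be empty if the .tf file defines no outputs.
def read_tf_outputs(workdir):
    raw = subprocess.check_output(["terraform", "output", "-json"], cwd=workdir)
    # terraform wraps every output in {"value": ..., "type": ..., ...}
    return {name: data["value"] for name, data in json.loads(raw).items()}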

Regarding static IPs: I think we didn't think about that, and I don't know whether it is possible to set the IP directly in the .tf file or obtain it some other way. But theoretically it shouldn't matter that much (for Linux), as we execute commands through the virtio guest interface. On Windows, I don't know how that would work or whether there is a guest client at all.

cfconrad commented 5 years ago

Maybe the output thingy could be avoided as well. There is the command terraform show, which prints out the IPs and VM names, so we could use that and read the needed info from there. @jlausuch wdyt?

jlausuch commented 5 years ago

It's a better approach, so we don't force users to define terraform outputs in the .tf files. This is an example of terraform show: https://pastebin.com/raw/F4w7j0GP

We would then need to do careful "greps" to catch the information, but this is very fragile. See what this command returns:

$ terraform show|grep name
  name = terraform-vm-40a02476
  network_interface.0.hostname = 
  network_interface.0.network_name = terraform-net-40a02476
  name = terraform-vm-2c71fa17
  network_interface.0.hostname = 
  network_interface.0.network_name = terraform-net-40a02476
  name = terraform-net-40a02476
  name = terraform-vdisk-40a02476.qcow2
  name = terraform-vdisk-2c71fa17.qcow2

jlausuch commented 5 years ago

So basically we would need a parser for the terraform show output instead of doing greps, which might fail... If you agree with this, I can help implementing it.
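
Something like this, as a very first sketch based on the sample above (untested):

import re
import subprocess

# Sketch: collect the flat "key = value" lines of `terraform show`
# into a dict instead of grepping, e.g. attrs["name"] -> all names.
def parse_terraform_show(workdir):
    text = subprocess.check_output(["terraform", "show"], cwd=workdir,
                                   universal_newlines=True)
    attrs = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\w.\-]+)\s*=\s*(.*)$", line)
        if m:
            attrs.setdefault(m.group(1), []).append(m.group(2).strip())
    return attrs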

cfconrad commented 5 years ago

Yes I think we should give it a try.

I think we should check some terraform show outputs, especially with VMs of different names and count > 1. We can also use these outputs for pytests.

I'm quite busy, so go ahead :)

richiejp commented 5 years ago

@jlausuch I think you can have two isolated networks with the same IP range. It might make routing a problem depending on the topology connecting the networks, but I think there are ways around it. However, in my case DHCP will probably be handled by a Windows AD controller anyway.

It is just a way to provide a basic environment, allowing a user to have an environment without understanding Terraform internals (e.g. how to create a .tf file).

I would aim to make it easy for the user to get started (e.g. provide examples for common scenarios), but not to allow them to use the tool without beginning to learn Terraform.

@cfconrad I guess the library can just return a (maybe empty) dictionary of output values to the tester, depending on what the .tf file has defined. The general problem I see is that you are trying to do too much for the user at the base level.

The Domain class expects that it will be able to get the IP address from Terraform/libvirt. The wait_for_* functions expect certain variables to be set on the class instance, otherwise we can't use them. If they were static and simply took an IP address or domain name, then they would be decoupled from the .tf file and the rest of the deployment process.
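
To illustrate what I mean by decoupled (the name and details are made up for the example):

import socket
import time

# Sketch: a wait helper that only needs an IP, so it does not care where
# the address came from (terraform output, DHCP, static config, ...).
def wait_for_ssh(ip, timeout=300):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            socket.create_connection((ip, 22), timeout=5).close()
            return
        except OSError:
            time.sleep(2)
    raise RuntimeError("no ssh on {} after {}s".format(ip, timeout))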

The deploy function also looks like it will be a problem for some users because it is doing too much. It might make sense to have a convenience function like deploy (for the common case), but at the base level of the API we probably just want to call terraform init and apply, then leave it to the user to decide whether to call wait and how they will provide the IP addresses.

Functions in the core API should do as little as possible and have a single obvious effect. You can then have convenience functions which help with the common case and build on the core API.
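
A rough sketch of the split I have in mind (function names are hypothetical, not the current API):

import subprocess

# Core layer: one terraform command per function, nothing more.
def tf_init(workdir):
    subprocess.check_call(["terraform", "init"], cwd=workdir)

def tf_apply(workdir, tfvars=()):
    cmd = ["terraform", "apply", "-auto-approve"]
    for var in tfvars:
        cmd += ["-var", var]
    subprocess.check_call(cmd, cwd=workdir)

# Convenience layer: `deploy` just composes the core calls for the common case.
def deploy(workdir, tfvars=(), wait=None):
    tf_init(workdir)
    tf_apply(workdir, tfvars)
    if wait is not None:
        wait()  # e.g. lambda: wait_for_ssh(some_ip), only if the user wants it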

There is the command terraform show, which prints out the IPs and VM names

get_domains could simply return the parsed JSON, or maybe you could have get_domains_raw for that while get_domains returns your object model (this is kind of what I do in JDP; it is nice to have an object model, but you also sometimes need the 'raw' output as a dictionary).
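
Roughly (sketch only; assuming the raw data is a dict like the outputs shown above):

from collections import namedtuple

# Sketch: get_domains_raw would hand back the parsed JSON/dict as-is,
# while get_domains builds a small object model on top of it.
Domain = namedtuple("Domain", ["name", "ip"])

def get_domains(raw):
    # raw: e.g. {"domain_names": [...], "domain_ips": [[addr, ...], ...]}
    ips = [addrs[0] if addrs else None for addrs in raw.get("domain_ips", [])]
    return [Domain(name, ip) for name, ip in zip(raw.get("domain_names", []), ips)]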

The main reason I am harassing you with this is that we have repeatedly been burned by these problems in the os-autoinst testapi (e.g. having to generate raw, undocumented JSON for the external results tab because the API function both uploads the results from the SUT and parses/submits them), and I have also made the same mistakes in JDP. We can refactor later on, but it gets exponentially more expensive as time goes on.

cfconrad commented 5 years ago

@richiejp that sounds reasonable to me +1

Functions in the core API should do as little as possible and have a single obvious effect. You can then have convenience functions which help with the common case and build on the core API.

On this point, I would like to explore the possibilities of OOP in Python. As an example, I'm currently working with Django. At first it does everything for you, but when you need more and more customization, it always seems to be possible via mixins and overrides.
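
For example (completely made-up classes, just to show the pattern):

# Base class does the plumbing with minimal defaults ...
class BaseEnv:
    def deploy(self):
        self.apply()
        self.wait()
    def apply(self):
        print("terraform init && terraform apply ...")
    def wait(self):
        pass  # deliberately does almost nothing, meant to be overridden

# ... and small mixins override single steps.
class WaitForSshMixin:
    def wait(self):
        print("waiting for ssh on all domains ...")

class MyTestEnv(WaitForSshMixin, BaseEnv):
    pass  # deploy() now uses the mixin's wait()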

richiejp commented 5 years ago

Mixins look like composition, which is my preferred style (OOP or not), so that is fine by me. The only issue I have with them is one I have with Python OOP in general: there is no separation between a data 'struct' (as in C and Julia) and a class/module containing methods/functions. Perl OOP generally has the same issue, but it wasn't much of a problem for the OpenQA QEMU backend. The trick is to try separating your mostly-data classes from your classes containing complex logic.

Overriding is good in the case where you have an 'interface' or 'protocol' with a default implementation that does very little and is made to be overridden. When you have a base class (or worse, a chain of super classes) with an implementation which does a lot (and has lots of data properties), then overriding is probably not practical.

If you want to create something which does everything for the user, then we probably need to mentally separate the library into two halves: one half is a lightweight, low-level API for doing independent tasks in isolation, and the other builds on that API to create a complete, integrated solution for the user.

cfconrad commented 5 years ago

I really like how you bring this back to general principles!

What I'm currently missing is: what do we consider low level, and where do we start with the upper layer/utilities?

Another thing on the table is: do we want to extend the pytest framework? The idea behind this is to write the tests like pytest/testinfra tests, with the same syntax (test_*), and to use annotations to do the terraform/qatrfm-specific configuration of the "lower" API level. In general, every pytest could then use that framework to write sophisticated tests which need VMs. Or do we stick to our own test syntax?
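
For example, something like this (purely hypothetical, nothing of it exists yet):

import pytest

# Sketch: a marker carries the qatrfm/terraform configuration,
# a fixture deploys before the test and cleans up afterwards.
@pytest.fixture
def domains(request):
    marker = request.node.get_closest_marker("qatrfm")
    tf_file = marker.kwargs.get("tf_file") if marker else "default.tf"
    print("would deploy", tf_file)  # here the fixture would call the lower API
    yield ["vm1", "vm2"]            # placeholder for the deployed domains
    # ... and destroy them on teardown

@pytest.mark.qatrfm(tf_file="examples/minimal/custom.tf")
def test_ping(domains):
    assert len(domains) == 2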

richiejp commented 5 years ago

What I'm currently missing is: what do we consider low level, and where do we start with the upper layer/utilities?

I suppose this is relative and difficult to define. Probably something like 'wait_for_ip'.

Another thing on the table is: do we want to extend the pytest framework?

I imagine the lower level API could be used with any test framework or none at all. You can simply deploy with terraform during the setup function etc. Then for the 'just works' high level API/framework you can experiment with various approaches.
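
E.g. with plain unittest and no qatrfm-specific framework at all (sketch only; the path is made up):

import subprocess
import unittest

WORKDIR = "examples/minimal"  # hypothetical working directory with a .tf file

class TestMinimal(unittest.TestCase):
    def setUp(self):
        # deploy with terraform directly during setup
        subprocess.check_call(["terraform", "init"], cwd=WORKDIR)
        subprocess.check_call(["terraform", "apply", "-auto-approve"], cwd=WORKDIR)

    def tearDown(self):
        subprocess.check_call(["terraform", "destroy", "-auto-approve"], cwd=WORKDIR)

    def test_something(self):
        pass  # ssh into the domains and run the actual checks here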

jlausuch commented 5 years ago

Following the example of wait_for_ip(), would you leave this for the user to care about (with proper documentation of course)?

richiejp commented 5 years ago

Yeah. At least to begin with; then, when I have 3 or more tests with a similar code pattern, I would put it into a higher-level function for convenience.