quattor / configuration-modules-core

Node Configuration Manager Components for Everyone
www.quattor.org
Other
6 stars 54 forks source link

ncm-nmstate: relax syntax of VLAN interface names #1679

Open jouvin opened 5 months ago

jouvin commented 5 months ago

Fixes #1678

jouvin commented 4 months ago

Implementing a fix I uncovered something I consider as a flaw in the current implementation: to guess the VLAN ID, the interface name is took in priority to the device name. It is a problem if you remove the need for a . before the VLAN ID because there is a chance, like in vlan0 example that you don't get the right VLAN ID (in the example it should be 123, as the device is eth0.123). For some reasons, it seems the tests are passing but I'd suggest to take device first and the interface name if device doesn't provide the information. But it has the risk of being a backward incompatible change.... What's your feeling? Did I miss something?

aka7 commented 4 months ago

When I did this for backward compatibility for network.pm, I followed this test file. https://github.com/quattor/configuration-modules-core/blob/master/ncm-network/src/test/resources/route_rule.pan#L36

so I think long as your new change still covers this? then it should be fine for backward compatibility with network.pm?

Due to legacy code we had internally, we only define vlan interface using interfacename.vlan_id, i.e eth0.123 format as part of the interface name when defining vlan iface. So long as this test passes, https://github.com/quattor/configuration-modules-core/blob/master/ncm-network/src/test/resources/nmstate_advance.pan#L30 we should be ok on our end. so LGTM but I will test your PR against our profiles later this week to confirm.

regards

jouvin commented 4 months ago

To be clear, my PR currently only relax the pattern matching on interface/device name but doesn't implement my second suggestion of inverting the order in which we use interface name and device to guess the VLAN ID. As for us this second point is not very important because we always use the same value for the interface name and the device. But I have the feeling you would allow the interface name not to convey the VLAN ID but have it attached to the device name instead...

Anyway, with my fix, I'm a bit surprised that vlan0 test is still working... I should look in more details. But this fix has been deployed at IJCLab and fixes the problem we had before as our VLAN interfaces are all named vlanxxx, xxx being the VLAN ID.

aka7 commented 4 months ago

@jouvin from what I see, vlan0 test config is working as expected with your change since you just relaxed it and not removed anything. vlan0 config is using device=eth0.123 to get the vlanid. (https://github.com/quattor/configuration-modules-core/blob/master/ncm-network/src/test/resources/nmstate_advance.pan#L40) so that is correct and I wouldn't have expected your change to break it? so IMO it is working as expected. Unless I'm misunderstanding the solution here.

I'm not fussed with order of check, as we are only setting it one way which is using the format "/system/network/interfaces/eth0.123"

so long as this format of config carries on working too, it should be fine.

I just did quick check on lab host and its a noop as expected.

jouvin commented 4 months ago

@aka7 good! I'll see if I can come up with the reverse order giving the current result. For me it should...

My point with vlan0 is that with the relaxed regex, the ifnzme should match the regex and the vlan ID should be set to 0 which is not what we want. And for me the fix would be to reverse the order as the device will be carry the actual vlan0 ID and will be tested first. I guess it is working because ID 0 is causing the test to evaluate to false. My initial attempt to add vlan1 was failing in fact because it returned ID 1 instead of 123.

jouvin commented 4 months ago

@aka7 @stdweird I'm trying to fix a problem I identified with the way returned VLAN ID from the interface/device name is tested. The fix is trivial but I'm trying to add a test for this and I'm having trouble because adding the needed lines in the test profile make all the tests failing (including the first one which is just compiling the profile if I'm right) with something that looks as a Pan error except that compiling the profile manually gives no error. Surely a trivial mistake of mine... Any idea?

aka7 commented 4 months ago

@jouvin there is planned meetup discussion today, are you able to join todays call, perhaps we can discuss it there?

aka7 commented 4 months ago

@aka7 good! I'll see if I can come up with the reverse order giving the current result. For me it should...

My point with vlan0 is that with the relaxed regex, the ifnzme should match the regex and the vlan ID should be set to 0 which > is not what we want. And for me the fix would be to reverse the order as the device will be carry the actual vlan0 ID and will be > tested first. I guess it is working because ID 0 is causing the test to evaluate to false. My initial attempt to add vlan1 was failing > in fact because it returned ID 1 instead of 123.

yes I just tried this on the test file, change interface name from vlan0 -> vlan1 and test fails. May need to look at regex, because interface name with vlan0 is ignored but anything >0 is used as vlan id, which is not what we want on interface name? so I dont think you want to relax the regex for the interface name.

jouvin commented 4 months ago

@aka7 sorry, I'm at a conference this week, it was impossible for me to join the meetup... Don't spend time trying to fix tests, I have almost everything done but currently I only published the side issue I identified and I don't understand why adding a VLAN interface in the test profile beaks the Pan compilation despite Pan being able to compile it successfully if run manually...

jouvin commented 4 months ago

@stdweird any chance you could have a look why the second commit is breaking all tests... I'm sure it's trivial and you'll identify the problem very quickly! Just my lack of Perl/Quattor development these last years!

jouvin commented 4 months ago

@stdweird I'm still looking for some help to sort out my (probably stupid) mistake with the test additions, so that I can push the real fix related to the issue...

stdweird commented 4 months ago

@jouvin sorry, i don't know where my head is last weeks ;) can you already rebase the PR?

stdweird commented 4 months ago

@jouvin quick look at the code, but $iface->{device} || $name is not the same as $iface->{device}. i guess you probably want to pass the $name by itself as 3rd argument and also use it to guess the vlanid

jouvin commented 4 months ago

@stdweird don't worry, I'm totally overwhelmed too these days, tackling to many issues a the same time... My question was not really on the code itself (the line you mentioned is not from me and I was not completely convinced if was correct with Perl, look at a bashism...) but anyway my main problem at this stage is that I added one test in one of the .t file and this breaks all tests. It has nothing to do with the module code itself: just removing the added lines in the test is enough to make all the other tests successful. I guess I made a very stupid mistake in my test additions.

As for your request to rebase, I'd prefer not to do it at this stage (as I have the whole fix ready for this version and would prefer to rebase afterwards), except if you have particular reason for asking for it...

stdweird commented 4 months ago

hmmm, that is not the reason. if you can rebase, i can run the tests locally and debug it quicker

jouvin commented 4 months ago

Sorry I missed the conflicts... I'll do the rebase then...

jouvin commented 4 months ago

@stdweird rebase done

stdweird commented 4 months ago

@jouvin hidden in the output is a

[ERROR] Filename /etc/nmstate/vlan.0.yml found that doesn't match the device regexp. Must be an error in ncm-network.
stdweird commented 4 months ago

@jouvin the pan unittest is an actual example? what is the generated filename for that device? probably the generated yml filename also needs to follow the $device logic you modified.

jouvin commented 4 months ago

@stdweird yes sure, I need to fix the test for the modified logic but wanted to fix the test logic first... I apologise missing the error, I thought it looked every line... My bad! I'll check in the next days and try to complete this PR.

stdweird commented 4 months ago

if you want to allow such names, just replace the whole device regex with \w+; or implement logic to skip the test (maybe controlled by boolean from the schema. you can have it default to true (ie allow arbitrary names by default)

jouvin commented 4 months ago

@stdweird yes it is what I have done in fact but I have not pushed it yet.. I wanted to add a test first demonstrated the problem with the vlan0 example which works because of a bug in the configuration module...

For my information, where have you seen this "hidden error"?

stdweird commented 4 months ago

it's not hidden, it's just a single line of ERROR output; it's easily missed

jouvin commented 4 months ago

@stdweird I think I understood why I was not seeing the error you mentioned. You need -Dprove.args to see the error message from the configuration module (else I don't see any error related to the missing file) and the error I get is:

[ERROR] Filename /etc/nmstate/vlan.0.yml found that doesn't match the device regexp. Must be an error in ncm-network.

It seems that the component doesn't like the device name vlan.0 because there is no number after vlan and before the .. If using vlan0.0, the problem disappears. The change made seems to cause a problem with the ib1.12345 test but it is something else and doesn't break all the other tests that receive undef instead of the expected yml file contents...

I'll check if there is a good reason for such a restriction...

jouvin commented 4 months ago

And it is unexpected for me. This error comes from ncm-network and I thought it was not used at all when using ncm-nmstate. I may have missed something...

stdweird commented 4 months ago

https://github.com/quattor/configuration-modules-core/blob/master/ncm-network/src/main/perl/network.pm#L156 is the regex.

jouvin commented 4 months ago

@stdweird I found it but is it expected that we execute some ncm-network code when using ncm-nmstate?

stdweird commented 2 months ago

@jouvin yes, that is normal. the nmstate backend is a subclass of the network backend