Open jouvin opened 5 months ago
Implementing a fix I uncovered something I consider as a flaw in the current implementation: to guess the VLAN ID, the interface name is took in priority to the device name. It is a problem if you remove the need for a .
before the VLAN ID because there is a chance, like in vlan0
example that you don't get the right VLAN ID (in the example it should be 123, as the device is eth0.123
). For some reasons, it seems the tests are passing but I'd suggest to take device first and the interface name if device doesn't provide the information. But it has the risk of being a backward incompatible change.... What's your feeling? Did I miss something?
When I did this for backward compatibility for network.pm, I followed this test file. https://github.com/quattor/configuration-modules-core/blob/master/ncm-network/src/test/resources/route_rule.pan#L36
so I think long as your new change still covers this? then it should be fine for backward compatibility with network.pm?
Due to legacy code we had internally, we only define vlan interface using interfacename.vlan_id, i.e eth0.123 format as part of the interface name when defining vlan iface. So long as this test passes, https://github.com/quattor/configuration-modules-core/blob/master/ncm-network/src/test/resources/nmstate_advance.pan#L30 we should be ok on our end. so LGTM but I will test your PR against our profiles later this week to confirm.
regards
To be clear, my PR currently only relax the pattern matching on interface/device name but doesn't implement my second suggestion of inverting the order in which we use interface name and device to guess the VLAN ID. As for us this second point is not very important because we always use the same value for the interface name and the device. But I have the feeling you would allow the interface name not to convey the VLAN ID but have it attached to the device name instead...
Anyway, with my fix, I'm a bit surprised that vlan0
test is still working... I should look in more details. But this fix has been deployed at IJCLab and fixes the problem we had before as our VLAN interfaces are all named vlanxxx
, xxx
being the VLAN ID.
@jouvin from what I see, vlan0 test config is working as expected with your change since you just relaxed it and not removed anything. vlan0 config is using device=eth0.123 to get the vlanid. (https://github.com/quattor/configuration-modules-core/blob/master/ncm-network/src/test/resources/nmstate_advance.pan#L40) so that is correct and I wouldn't have expected your change to break it? so IMO it is working as expected. Unless I'm misunderstanding the solution here.
I'm not fussed with order of check, as we are only setting it one way which is using the format "/system/network/interfaces/eth0.123"
so long as this format of config carries on working too, it should be fine.
I just did quick check on lab host and its a noop as expected.
@aka7 good! I'll see if I can come up with the reverse order giving the current result. For me it should...
My point with vlan0 is that with the relaxed regex, the ifnzme should match the regex and the vlan ID should be set to 0 which is not what we want. And for me the fix would be to reverse the order as the device will be carry the actual vlan0 ID and will be tested first. I guess it is working because ID 0 is causing the test to evaluate to false. My initial attempt to add vlan1 was failing in fact because it returned ID 1 instead of 123.
@aka7 @stdweird I'm trying to fix a problem I identified with the way returned VLAN ID from the interface/device name is tested. The fix is trivial but I'm trying to add a test for this and I'm having trouble because adding the needed lines in the test profile make all the tests failing (including the first one which is just compiling the profile if I'm right) with something that looks as a Pan error except that compiling the profile manually gives no error. Surely a trivial mistake of mine... Any idea?
@jouvin there is planned meetup discussion today, are you able to join todays call, perhaps we can discuss it there?
@aka7 good! I'll see if I can come up with the reverse order giving the current result. For me it should...
My point with vlan0 is that with the relaxed regex, the ifnzme should match the regex and the vlan ID should be set to 0 which > is not what we want. And for me the fix would be to reverse the order as the device will be carry the actual vlan0 ID and will be > tested first. I guess it is working because ID 0 is causing the test to evaluate to false. My initial attempt to add vlan1 was failing > in fact because it returned ID 1 instead of 123.
yes I just tried this on the test file, change interface name from vlan0 -> vlan1 and test fails. May need to look at regex, because interface name with vlan0 is ignored but anything >0 is used as vlan id, which is not what we want on interface name? so I dont think you want to relax the regex for the interface name.
@aka7 sorry, I'm at a conference this week, it was impossible for me to join the meetup... Don't spend time trying to fix tests, I have almost everything done but currently I only published the side issue I identified and I don't understand why adding a VLAN interface in the test profile beaks the Pan compilation despite Pan being able to compile it successfully if run manually...
@stdweird any chance you could have a look why the second commit is breaking all tests... I'm sure it's trivial and you'll identify the problem very quickly! Just my lack of Perl/Quattor development these last years!
@stdweird I'm still looking for some help to sort out my (probably stupid) mistake with the test additions, so that I can push the real fix related to the issue...
@jouvin sorry, i don't know where my head is last weeks ;) can you already rebase the PR?
@jouvin quick look at the code, but $iface->{device} || $name
is not the same as $iface->{device}
. i guess you probably want to pass the $name
by itself as 3rd argument and also use it to guess the vlanid
@stdweird don't worry, I'm totally overwhelmed too these days, tackling to many issues a the same time... My question was not really on the code itself (the line you mentioned is not from me and I was not completely convinced if was correct with Perl, look at a bashism...) but anyway my main problem at this stage is that I added one test in one of the .t
file and this breaks all tests. It has nothing to do with the module code itself: just removing the added lines in the test is enough to make all the other tests successful. I guess I made a very stupid mistake in my test additions.
As for your request to rebase, I'd prefer not to do it at this stage (as I have the whole fix ready for this version and would prefer to rebase afterwards), except if you have particular reason for asking for it...
hmmm, that is not the reason. if you can rebase, i can run the tests locally and debug it quicker
Sorry I missed the conflicts... I'll do the rebase then...
@stdweird rebase done
@jouvin hidden in the output is a
[ERROR] Filename /etc/nmstate/vlan.0.yml found that doesn't match the device regexp. Must be an error in ncm-network.
@jouvin the pan unittest is an actual example? what is the generated filename for that device? probably the generated yml filename also needs to follow the $device
logic you modified.
@stdweird yes sure, I need to fix the test for the modified logic but wanted to fix the test logic first... I apologise missing the error, I thought it looked every line... My bad! I'll check in the next days and try to complete this PR.
if you want to allow such names, just replace the whole device regex with \w+
; or implement logic to skip the test (maybe controlled by boolean from the schema. you can have it default to true (ie allow arbitrary names by default)
@stdweird yes it is what I have done in fact but I have not pushed it yet.. I wanted to add a test first demonstrated the problem with the vlan0
example which works because of a bug in the configuration module...
For my information, where have you seen this "hidden error"?
it's not hidden, it's just a single line of ERROR output; it's easily missed
@stdweird I think I understood why I was not seeing the error you mentioned. You need -Dprove.args
to see the error message from the configuration module (else I don't see any error related to the missing file) and the error I get is:
[ERROR] Filename /etc/nmstate/vlan.0.yml found that doesn't match the device regexp. Must be an error in ncm-network.
It seems that the component doesn't like the device name vlan.0
because there is no number after vlan
and before the .
. If using vlan0.0
, the problem disappears. The change made seems to cause a problem with the ib1.12345
test but it is something else and doesn't break all the other tests that receive undef
instead of the expected yml file contents...
I'll check if there is a good reason for such a restriction...
And it is unexpected for me. This error comes from ncm-network
and I thought it was not used at all when using ncm-nmstate
. I may have missed something...
@stdweird I found it but is it expected that we execute some ncm-network code when using ncm-nmstate?
@jouvin yes, that is normal. the nmstate backend is a subclass of the network backend
Fixes #1678