Closed — davidcassany closed this issue 2 years ago
For the moment let's hide them, but the problem is we cannot differentiate between a real error (wrong file format, wrong YAML, wrong cloud-init) and a "fake" error (tried to load files from a dir that doesn't exist), which is what we currently see every time.
Maybe we could match the error text, but that could also hide a real problem (e.g. I load cloud-config from /test1 but the real dir is /test).
Kind of difficult :/
Indeed, matching error text is quite problematic. I would go for something simple for now and lower those messages to debug log level when there are ONLY partial YAML unmarshal errors in the stack (that is, "half" errors, where something was still parsed in the end, which is our case here). That way, if something is smelly we can always turn debug on and still see the full picture, and if there are real errors besides partial unmarshal errors we display all of them as before.
Ok, having a closer look at this, parsing the cmdline is not the only issue. Some of the errors seem to come from cloudconfig while trying to read the cloud-init path arguments as YAML (so the cmdline is not involved):
```
time="2022-03-04T11:40:26+01:00" level=warning msg="9 errors occurred:
  * yaml: unmarshal errors:
      line 1: cannot unmarshal !!str `/system...` into schema.YipConfig
  * yaml: unmarshal errors:
      line 1: cannot unmarshal !!str `/oem/` into schema.YipConfig
  * yaml: unmarshal errors:
      line 1: cannot unmarshal !!str `/usr/lo...` into schema.YipConfig
  * yaml: unmarshal errors:
      line 1: cannot unmarshal !!str `/system...` into schema.YipConfig
  * yaml: unmarshal errors:
      line 1: cannot unmarshal !!str `/oem/` into schema.YipConfig
  * yaml: unmarshal errors:
      line 1: cannot unmarshal !!str `/usr/lo...` into schema.YipConfig
  * yaml: unmarshal errors:
      line 1: cannot unmarshal !!str `/system...` into schema.YipConfig
  * yaml: unmarshal errors:
      line 1: cannot unmarshal !!str `/oem/` into schema.YipConfig
  * yaml: unmarshal errors:
      line 1: cannot unmarshal !!str `/usr/lo...` into schema.YipConfig
"
```
It looks like somehow we are feeding the cloud-init runner strings that are the cloud-init paths themselves.
I think we discussed this already; this is because yip tries to parse whatever is passed (a string) as a file or directory, and if it's neither, it feeds the string directly to yip as if it were an inline cloud-config, so it fails to parse it and gives that horrible error.
Basically we are passing the default dirs ("/system/oem", "/oem/", "/usr/local/cloud-config/"), and if they don't exist, yip tries to parse the string as a cloud config and fails to unmarshal it (well, it's not valid YAML, so that makes sense).
The problem is that any of those dirs can be created dynamically by a cloud config, so we cannot check them beforehand (i.e. /oem/dirs.yaml creates files in /usr/local/cloud-config, which yip then parses). So we cannot check whether the dirs exist before passing them to yip (I tried that already; it failed to find the newly created config files), hence my recommendation that this check be done in the part of yip that tries to identify what to do with the given string, to be a bit more intelligent here: https://github.com/mudler/yip/blob/master/pkg/executor/default.go#L102
> I think we discussed this already; this is because yip tries to parse whatever is passed (a string) as a file or directory, and if it's neither, it feeds the string directly to yip as if it were an inline cloud-config, so it fails to parse it and gives that horrible error.
>
> Basically we are passing the default dirs ("/system/oem", "/oem/", "/usr/local/cloud-config/"), and if they don't exist, yip tries to parse the string as a cloud config and fails to unmarshal it (well, it's not valid YAML, so that makes sense).
Oh right! That is fine then! I was just hitting this while working on tests in https://github.com/rancher-sandbox/elemental/pull/125; that means my tests need to be a bit more specific, and mine was a red herring :)
> The problem is that any of those dirs can be created dynamically by a cloud config, so we cannot check them beforehand (i.e. /oem/dirs.yaml creates files in /usr/local/cloud-config, which yip then parses). So we cannot check whether the dirs exist before passing them to yip (I tried that already; it failed to find the newly created config files), hence my recommendation that this check be done in the part of yip that tries to identify what to do with the given string, to be a bit more intelligent here: https://github.com/mudler/yip/blob/master/pkg/executor/default.go#L102
Any idea? Besides checking in yip whether the path actually exists, I don't see how we can make it smarter to accommodate all the use-cases at hand here. I would rather just have proper tests, with the specified cloud-init paths created in the fs.
In the end, if we are not able to tell at runtime whether the dir exists before calling cloud-init, it wouldn't make a difference: the cloud runner would have the same limitations and would have to apply the same custom logic that we can apply ourselves before calling it.
The /oem and /system/oem folders should be there by default, but it's the installer's duty, after creating COS_PERSISTENT, to create /usr/local/cloud-config too.
I dunno, it seems a bit difficult to tell apart a plain string from broken YAML. We could try to parse the string as a dir, to see if it is shaped like one, and use that as an identification mechanism: if it looks like a duck and quacks like a duck... broken YAML should not look like a dir. But I'm not sure we can do that? Is there something like a parseURL for dirs?
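There is no `parseURL` equivalent for dirs in the Go standard library, but the duck-typing idea could be sketched as a small heuristic. `looksLikePath` below is hypothetical, not part of yip: it just checks that the string is a single-line absolute path with no YAML markers.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikePath is a duck-typing heuristic (hypothetical helper): a
// broken cloud-config YAML document should not look like a filesystem
// path, so a single-line absolute string with no YAML key markers is
// probably a directory argument rather than inline config. Paths
// containing ':' would defeat this check, which is why it is only a
// heuristic.
func looksLikePath(s string) bool {
	s = strings.TrimSpace(s)
	return strings.HasPrefix(s, "/") &&
		!strings.ContainsAny(s, "\n:") // YAML has keys/newlines, paths usually don't
}

func main() {
	fmt.Println(looksLikePath("/usr/local/cloud-config/"))      // true
	fmt.Println(looksLikePath("stages:\n  boot:\n  - name: x")) // false
}
```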
Another option would be to call yip once per dir instead of passing all the dirs at the same time, like we do now [0]. That would allow us to check, just before running, whether the dir exists, and skip it if it doesn't. It means we might be a bit slower due to object initialization and so on, and each dir runs 3 times (pre, post, stage), so maybe we incur some slowness there, but I don't think it would be that bad.
That way, if some cloud-init creates a given dir, we can check for it just in time before running.
yeah, that should work, not sure why I didn't do that.....
[0] https://github.com/rancher-sandbox/elemental/blob/main/pkg/utils/runstage.go#L76
Well, I'm thinking that maybe a simple solution would be to not try to run cloud-init files in paths that don't exist. If cloud-init is populating a path, it should at least be relative to where the cloud-init files are, which is the case for /oem/userdata.
The problem would be if we populate something outside of it while also declaring it in the cloud-init paths; at that point we should assume that at least the dir has to be created first. We could also make sure that all the dirs listed as cloud-init paths exist, and create them beforehand if they don't.
For the moment I'm just accommodating the tests to see if the error disappears when the paths are created.
> Another option would be to call yip once per dir instead of passing all the dirs at the same time, like we do now [0]. That would allow us to check, just before running, whether the dir exists, and skip it if it doesn't. It means we might be a bit slower due to object initialization and so on, and each dir runs 3 times (pre, post, stage), so maybe we incur some slowness there, but I don't think it would be that bad.
>
> That way, if some cloud-init creates a given dir, we can check for it just in time before running.
>
> yeah, that should work, not sure why I didn't do that...
>
> [0] https://github.com/rancher-sandbox/elemental/blob/main/pkg/utils/runstage.go#L76
That wouldn't spare you from dirs that don't exist yet but would be populated by cloud-init. You would be sensitive to the order of the cloud-init paths, because you would cycle through all of them one by one, but you cannot know which one will create another.
yeah that is true :/
Would it be possible to let yip know in advance which type of target we are passing? Like a modifier? That way we could pass the paths and tell it "hey, these are exclusively paths", so it doesn't try to parse them itself.
> The problem would be if we populate something outside of it while also declaring it in the cloud-init paths; at that point we should assume that at least the dir has to be created first. We could also make sure that all the dirs listed as cloud-init paths exist, and create them beforehand if they don't.
That might not be a clean approach, but it should work: simply calling MakeDirAll on each path before starting the runner.
> Would it be possible to let yip know in advance which type of target we are passing? Like a modifier? That way we could pass the paths and tell it "hey, these are exclusively paths", so it doesn't try to parse them itself.
I'm not sure that wouldn't end up biting us again... I thought of exposing schema.Loaders or playing with Modifier to do some detection and such, but in the end I think the problem would still persist. Even if you have an endpoint to specify dirs to pull from, if those dirs don't exist, what should we do with the error for a non-existing path? We can ignore it for sure, but it would just become another error to ignore, like the ones we are trying to get rid of above.
> I'm not sure that wouldn't end up biting us again... I thought of exposing schema.Loaders or playing with Modifier to do some detection and such, but in the end I think the problem would still persist. Even if you have an endpoint to specify dirs to pull from, if those dirs don't exist, what should we do with the error for a non-existing path? We can ignore it for sure, but it would just become another error to ignore, like the ones we are trying to get rid of above.
Sure, but that's an error we can easily identify, and it's different from a generic YAML unmarshal error, no? Like: we pass a dir, get an error, and check the error.
Anyway, would it also be possible to have our own custom executor that calls the default executor underneath but overrides the runStage func to avoid checking for dir/file/etc.?
> That might not be a clean approach, but it should work: simply calling MakeDirAll on each path before starting the runner.
As usual the most straightforward fix is usually the correct one LMAO
> I'm not sure that wouldn't end up biting us again... I thought of exposing schema.Loaders or playing with Modifier to do some detection and such, but in the end I think the problem would still persist. Even if you have an endpoint to specify dirs to pull from, if those dirs don't exist, what should we do with the error for a non-existing path? We can ignore it for sure, but it would just become another error to ignore, like the ones we are trying to get rid of above.
>
> Sure, but that's an error we can easily identify, and it's different from a generic YAML unmarshal error, no? Like: we pass a dir, get an error, and check the error.
>
> Anyway, would it also be possible to have our own custom executor that calls the default executor underneath but overrides the runStage func to avoid checking for dir/file/etc.?
Totally doable. The Executor's Run is actually just syntax sugar around Apply (https://github.com/mudler/yip/blob/master/pkg/executor/default.go#L85), which actually applies a cloud config to the system. Run just has the logic to switch based on the input (string, dir, etc.), which can be completely bypassed by using Apply directly.
> That might not be a clean approach, but it should work: simply calling MakeDirAll on each path before starting the runner.
As usual the most straightforward fix is usually the correct one LMAO
:rofl:
This is related to the
`yaml: unmarshal errors: line 1: cannot unmarshal !!str /usr/lo... into schema.YipConfig`
messages we see on elemental during the installation. Despite not being relevant in that case, and even though they can be ignored, they really give the impression something went wrong. We should hide them as debug messages or find some smarter criteria for whether to show them or not. This will be a problem as soon as harvester starts using the new elemental installer, since the harvester installer interface has a frame where the stdout of the installation is shown, and these are almost the last messages shown while the harvester installer is stuck downloading some harvester from registries. It really gives the impression something is broken 😓