rhtools / vminfo-parser


OS Information needs to pull from VMWare Tools #8

Closed stratus-ss closed 1 month ago

stratus-ss commented 1 month ago

In the beta version, the OS version is derived from the configuration metadata on the VM. This was done because of the potentially high number of VMs with no VMware Tools.

This request/fix is to default to pulling the OS values from VMware Tools, and then fall back to another column (such as the configuration metadata on the VM) if the VMware Tools output is blank.

stratus-ss commented 1 month ago

One way this could be accomplished is with a helper function that picks the value based on whether or not a column is empty:

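    def _get_os_value(self, row):
        vmware_tools_column = self.column_headers["operatingSystem"]
        # The config column is optional, so .get() may return None here
        config_column = self.column_headers.get("operatingSystemFromConfig")

        # Prefer the value reported by VMware Tools
        if pd.notna(row[vmware_tools_column]):
            return row[vmware_tools_column]
        # Otherwise fall back to the VM configuration metadata, if present
        elif config_column in row and pd.notna(row[config_column]):
            return row[config_column]
        else:
            return ''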

The start of the next function could look like this:

    def add_extra_columns(self: t.Self) -> None:
        os_column_to_parse = "temp_os"
        # Create a temporary column to store the combined OS information
        self.df[os_column_to_parse] = self.df.apply(self._get_os_value, axis=1)
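For anyone who wants to try the row-wise fallback outside the class, here is a minimal standalone sketch. The column names and sample values are made up for illustration and are not the real export headers, and treating an empty string as "blank" is an assumption on top of the `pd.notna` check above.

```python
import pandas as pd

# Hypothetical column names, chosen only for this sketch.
TOOLS_COL = "OS according to the VMware Tools"
CONFIG_COL = "OS according to the configuration file"


def get_os_value(row: pd.Series) -> str:
    """Return the VMware Tools OS if present, else the config OS, else ''."""
    tools_value = row.get(TOOLS_COL)
    # Assumption: an empty string counts as blank, just like NaN/None.
    if pd.notna(tools_value) and tools_value != "":
        return tools_value
    config_value = row.get(CONFIG_COL)
    if pd.notna(config_value) and config_value != "":
        return config_value
    return ""


df = pd.DataFrame(
    {
        TOOLS_COL: ["Ubuntu Linux (64-bit)", None, None],
        CONFIG_COL: [
            "Other Linux (64-bit)",
            "Microsoft Windows Server 2019 (64-bit)",
            None,
        ],
    }
)

# Build the temporary column row by row, as in the snippet above.
df["temp_os"] = df.apply(get_os_value, axis=1)
print(df["temp_os"].tolist())
```

The first row keeps the VMware Tools value, the second falls back to the config value, and the third ends up as an empty string.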

This refactoring also needs some adjustments elsewhere in add_extra_columns, so that the extraction calls read from the temporary column:

            self.df[["OS Name", "OS Version", "Architecture"]] = self.df[os_column_to_parse].str.extract(exclude_windows_pattern)
            self.df[windows_server_columns] = self.df[os_column_to_parse].str.extract(windows_server_pattern)
            self.df[windows_desktop_columns] = self.df[os_column_to_parse].str.extract(

And then we need to drop the temporary column along with the intermediate Windows columns:

            self.df.drop(
                windows_server_columns + windows_desktop_columns + [os_column_to_parse],
                axis=1,
                inplace=True,
            )

stratus-ss commented 1 month ago

Another option is to use the same where pattern that is already used for the Windows versions:

    self.df["OS Name"] = self.df["Server OS Name"].where(self.df["OS Name"].isnull(), self.df["OS Name"])

stratus-ss commented 1 month ago

@csfreak do you think it is reasonable to change the column_headers in const so that it has "operatingSystemFromVMTools" and "operatingSystemFromVMConfig" in order to handle this? The goal is to check the VMware Tools column first and fall back to the VM config if the VMware Tools value is empty.

The where statement would be similar to:

    self.df[const.EXTRA_COLUMNS_DEST[0]] = self.df[const.COLUMN_HEADERS.operatingSystemFromVMConfig].where(self.df[const.COLUMN_HEADERS.operatingSystemFromVMTools].isnull(), self.df[const.EXTRA_COLUMNS_DEST[0]])

I don't know if this is reasonable, though.
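To make the `where` semantics concrete, here is a small standalone sketch of the "VMware Tools first, VM config as fallback" idea. It is not the exact statement above: it assumes the two proposed header names exist as plain columns, uses made-up sample data, and writes to a hypothetical `os_source` column. `Series.where` keeps the caller's value wherever the condition is True and takes the value from the second argument everywhere else.

```python
import pandas as pd

# Hypothetical header names standing in for the proposed constants.
VMTOOLS_COL = "operatingSystemFromVMTools"
VMCONFIG_COL = "operatingSystemFromVMConfig"

df = pd.DataFrame(
    {
        VMTOOLS_COL: ["Red Hat Enterprise Linux 8 (64-bit)", None, None],
        VMCONFIG_COL: ["Other Linux (64-bit)", "CentOS 7 (64-bit)", None],
    }
)

# Keep the VMware Tools value where it is present;
# otherwise take the value from the VM config column.
df["os_source"] = df[VMTOOLS_COL].where(df[VMTOOLS_COL].notna(), df[VMCONFIG_COL])

print(df["os_source"].tolist())
```

Rows where both columns are empty stay missing, so downstream `str.extract` calls would simply produce NaN for them. For this simple two-column case, `df[VMTOOLS_COL].fillna(df[VMCONFIG_COL])` should give the same values, which may read more directly than `where`.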