ms1450 / CommandConnectorCompatibilityCalculator

Compares a customer list of Camera Models with the Verkada HCL to provide compatible cameras.

Sanitizing the Customer List Before Identifying the Model Column #8

Closed ms1450 closed 3 months ago

ms1450 commented 3 months ago

Sanitize Customer Columns

The sanitize_customer_list function takes a transposed list of customer data and filters out irrelevant or noisy values, so that the data is suitable for identifying the model column and for further analysis.

The function first builds an English word dictionary from NLTK's words corpus. This dictionary is used to filter out common English words that are unlikely to be relevant to the analysis. The function then splits the input customer_list into headers and data: the first row is taken as the headers, and the remaining rows are treated as the data columns.
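A minimal sketch of this setup step, assuming NLTK is installed and the words corpus can be downloaded on first use. The helper names load_english_words and split_headers_and_data are hypothetical and are not taken from the repository:

```python
def load_english_words():
    """Return NLTK's English word list as a lowercase set (hypothetical helper)."""
    # Imported lazily so the header/data split below works without NLTK installed.
    import nltk

    try:
        nltk.data.find("corpora/words")
    except LookupError:
        # Fetch the corpus once if it is not available locally.
        nltk.download("words", quiet=True)

    from nltk.corpus import words

    return {w.lower() for w in words.words()}


def split_headers_and_data(customer_list):
    """Split the input into headers (first row) and the remaining rows."""
    if not customer_list:
        return [], []
    headers = list(customer_list[0])
    data = [list(row) for row in customer_list[1:]]
    return headers, data
```

Keeping the word list in a set makes the later per-token membership checks cheap, which matters because the corpus holds a large number of entries.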

The function then iterates over each column of data, stripping irrelevant keywords and special characters from every value. IP addresses, MAC addresses, and integers are filtered out as well, since they are noise in the context of camera model names. Keywords are removed using both the provided keyword dictionary and the English dictionary.
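The per-value cleaning described above could look roughly like the following. This is a sketch under stated assumptions, not the code from this PR: the name clean_value, the regular expressions, and the choice to keep hyphens (which are common in model names) are all illustrative.

```python
import ipaddress
import re

# Assumed patterns: a colon- or hyphen-separated MAC address, and any character
# that is not a letter, digit, whitespace, or hyphen.
MAC_RE = re.compile(r"^(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}$")
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s-]")


def is_ip(text):
    """Return True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


def clean_value(value, keywords, english_words):
    """Drop address-like or integer values, then strip keywords and symbols.

    Both keywords and english_words are expected to be sets of lowercase strings.
    """
    text = str(value).strip()
    # Whole-value noise: IP addresses, MAC addresses, and plain integers.
    if is_ip(text) or MAC_RE.match(text) or text.isdigit():
        return ""
    # Replace special characters with spaces so tokens stay separated.
    text = SPECIAL_RE.sub(" ", text)
    kept = [
        token
        for token in text.split()
        if token.lower() not in keywords and token.lower() not in english_words
    ]
    return " ".join(kept)
```

For example, with keywords {"axis"} and an English dictionary containing "camera", the value "Axis P3245-LVE camera" is reduced to "P3245-LVE".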

The function also identifies columns that contain IP or MAC addresses and removes them entirely, since they are irrelevant to this analysis.
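One way to express the column-level filter is sketched below. It assumes each inner list is one column with its header as the first element, and that a single address-like value is enough to drop the column; the actual threshold used in this PR is not stated. The names looks_like_address and drop_address_columns are hypothetical.

```python
import ipaddress
import re

# Assumed MAC format: six hex pairs separated by colons or hyphens.
MAC_RE = re.compile(r"^(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}$")


def looks_like_address(value):
    """Return True if value is a MAC address or an IPv4/IPv6 address."""
    text = str(value).strip()
    if MAC_RE.match(text):
        return True
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


def drop_address_columns(columns):
    """Keep only the columns in which no value looks like an IP or MAC address."""
    return [
        column
        for column in columns
        if not any(looks_like_address(value) for value in column)
    ]
```

Checking the raw values before any per-value cleaning is the safer order here, because a column whose addresses have already been blanked out would no longer be recognizable as an address column.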

The sanitized data is returned as a list of lists, where each inner list represents one cleaned column. The headers and the cleaned data are recombined to form the final output.

Additional Functionality

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

Checklist