The sanitize_customer_list function cleans a transposed list of customer data, filtering out irrelevant or noisy values so that the result is suitable for further processing and analysis.
The function first builds an English word dictionary from NLTK's words corpus and uses it to filter out common English words that are unlikely to be relevant to the analysis.
It then splits the input customer_list into headers and data: the first row supplies the headers, and each remaining row is treated as a data column.
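A minimal sketch of these two setup steps, assuming NLTK is installed and its `words` corpus has been downloaded once with `nltk.download("words")`. The helper names `load_english_words` and `split_headers` are hypothetical, and the layout (first row as headers, each remaining row as one data column) follows the description above rather than the actual implementation.

```python
from typing import List, Sequence, Set, Tuple


def load_english_words() -> Set[str]:
    """Return NLTK's English word list as a lowercase set for fast lookups.

    Hypothetical helper; requires `pip install nltk` and a one-time
    `nltk.download("words")` before the corpus can be read.
    """
    from nltk.corpus import words  # imported here so split_headers works without NLTK

    return {word.lower() for word in words.words()}


def split_headers(
    customer_list: Sequence[Sequence[str]],
) -> Tuple[List[str], List[List[str]]]:
    """Split a transposed customer list into (headers, data columns).

    Assumes the first row holds the headers and every remaining row is one
    data column, as described in the text.
    """
    if not customer_list:
        return [], []
    headers = list(customer_list[0])
    columns = [list(row) for row in customer_list[1:]]
    return headers, columns
```

Storing the English words in a set rather than a list keeps each membership check O(1) on average, which matters because every token of every value is looked up.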
Next, the function iterates over each data column, removing irrelevant keywords and special characters. It also filters out IP addresses, MAC addresses, and integers, since these are noise when the values of interest are camera model names.
Each value is checked against both the provided keyword dictionary and the English dictionary, and any matching words are removed.
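The per-value cleaning could look like the following sketch. The regular expressions, the hypothetical `clean_value` and `is_noise` names, and the choice to keep letters, digits, and hyphens (common in camera model names such as `P3245-LVE`) are assumptions for illustration; both dictionaries are assumed to be lowercase sets.

```python
import re
from typing import Optional, Set

# Hypothetical patterns; each must match the whole token (used with fullmatch)
IPV4_RE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")
MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}")
INT_RE = re.compile(r"[+-]?\d+")
# Anything that is not a letter, digit, whitespace, or hyphen counts as special
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s-]")


def is_noise(token: str) -> bool:
    """True for IPv4 addresses, MAC addresses, and bare integers."""
    return bool(IPV4_RE.fullmatch(token) or MAC_RE.fullmatch(token) or INT_RE.fullmatch(token))


def clean_value(value: str, keywords: Set[str], english_words: Set[str]) -> Optional[str]:
    """Return the cleaned value, or None if nothing useful remains."""
    value = value.strip()
    if not value or is_noise(value):
        return None
    # Replace special characters with spaces so neighbouring tokens stay separate
    stripped = SPECIAL_RE.sub(" ", value)
    kept = [
        token
        for token in stripped.split()
        if token.lower() not in keywords
        and token.lower() not in english_words
        and not INT_RE.fullmatch(token)
    ]
    return " ".join(kept) or None
```

For example, with `keywords={"axis"}` and an English dictionary containing `dome` and `camera`, `clean_value("Axis Dome Camera P3245-LVE", ...)` keeps only `P3245-LVE`. The whole-value address check runs before special characters are stripped, because removing the colons from a MAC address such as `00:1A:2B:3C:4D:5E` would otherwise leave fragments like `1A` that pass the remaining filters.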
Columns that contain IP or MAC addresses are then removed entirely, because they are irrelevant to the current analysis.
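One way to implement the column-level check is to drop any column in which at least one value is an IPv4 or MAC address, keeping the headers aligned with the surviving columns. The "any value" rule, the helper names `is_address` and `drop_address_columns`, and the assumption that `headers[i]` labels `columns[i]` are hypothetical; a real implementation might instead use a threshold such as a majority of values.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical patterns for a whole value that is an IPv4 or MAC address
IPV4_RE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")
MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}")


def is_address(value: str) -> bool:
    """True if the whole (stripped) value looks like an IPv4 or MAC address."""
    v = value.strip()
    return bool(IPV4_RE.fullmatch(v) or MAC_RE.fullmatch(v))


def drop_address_columns(
    headers: Sequence[str], columns: Sequence[Sequence[str]]
) -> Tuple[List[str], List[List[str]]]:
    """Remove every column, and its header, that contains at least one address."""
    kept_headers: List[str] = []
    kept_columns: List[List[str]] = []
    for header, column in zip(headers, columns):
        if any(is_address(value) for value in column):
            continue  # the column holds network identifiers, not model names
        kept_headers.append(header)
        kept_columns.append(list(column))
    return kept_headers, kept_columns
```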
The sanitized data is returned as a list of lists, in which each inner list is one cleaned data column; the headers are combined with this data to form the final output.
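A possible final step, assuming the output mirrors the input layout (headers as the first row, followed by one row per cleaned column) and that values emptied during cleaning are represented as `None` or empty strings. The `combine_output` name and both of these conventions are hypothetical.

```python
from typing import List, Optional, Sequence


def combine_output(
    headers: Sequence[str], columns: Sequence[Sequence[Optional[str]]]
) -> List[List[str]]:
    """Prepend the headers row and drop values that were emptied during cleaning."""
    # `if value` discards both None and "" left behind by earlier steps
    cleaned = [[value for value in column if value] for column in columns]
    return [list(headers)] + cleaned
```

Dropping empty values shortens columns unevenly; if downstream code expects aligned rows, keeping `None` placeholders instead would preserve positions.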
Additional Functionality
[x] New feature (non-breaking change which adds functionality)
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Checklist