Currently, the input file for incremental clustering is a CSV file.
Instead of being loaded into a pandas DataFrame, the CSV file is read line by line, and its columns are accessed by integer position.
This prevents vectorization and introduces additional nested loops, which hurts performance. It also makes the code cluttered and difficult to follow.
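To illustrate the difference, here is a minimal sketch contrasting the two styles on a toy computation (the Euclidean norm of each row). The column names `id`, `x`, `y`, the sample data, and both helper functions are hypothetical; the real CSV layout used by incremental clustering is not described here.

```python
import csv
import io

import numpy as np
import pandas as pd

# Hypothetical sample input; the actual column layout of the clustering CSV is an assumption.
CSV_TEXT = "id,x,y\n1,0.0,0.0\n2,3.0,4.0\n3,1.0,1.0\n"


def norms_row_by_row(text):
    """Current style (hypothetical): read each line, access columns by integer position."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    result = []
    for row in reader:
        x, y = float(row[1]), float(row[2])  # positional indexing breaks if columns move
        result.append((x * x + y * y) ** 0.5)
    return result


def norms_vectorized(text):
    """Proposed style: load once into a DataFrame and operate on whole named columns."""
    df = pd.read_csv(io.StringIO(text))
    return np.hypot(df["x"], df["y"]).tolist()


print(norms_row_by_row(CSV_TEXT))
print(norms_vectorized(CSV_TEXT))
```

Both functions return the same values, but the vectorized version has no explicit Python loop and refers to columns by name rather than by position.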
The tasks are:
[x] Update the code to use a pandas DataFrame as the standardized data format.
[x] Use vectorized DataFrame operations to improve performance.
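As a rough sketch of what a vectorized step could look like once the data lives in a DataFrame, the example below assigns a batch of new points to their nearest existing centroid using NumPy broadcasting instead of nested loops. The function `assign_to_nearest`, the feature columns `x`/`y`, and the nearest-centroid rule itself are assumptions for illustration; they are not taken from the existing incremental clustering code.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical batch of new points and existing centroids; column names are assumptions.
POINTS_CSV = "x,y\n0.1,0.2\n4.9,5.1\n0.0,-0.3\n"
centroids = pd.DataFrame({"x": [0.0, 5.0], "y": [0.0, 5.0]})


def assign_to_nearest(points, centroids, features=("x", "y")):
    """Hypothetical helper: label each point with its nearest centroid, no Python loops."""
    cols = list(features)
    p = points[cols].to_numpy(dtype=float)     # shape (n_points, n_features)
    c = centroids[cols].to_numpy(dtype=float)  # shape (n_centroids, n_features)
    # Broadcast to (n_points, n_centroids, n_features), then reduce over the feature axis.
    dist = np.linalg.norm(p[:, None, :] - c[None, :, :], axis=2)
    out = points.copy()
    out["cluster"] = dist.argmin(axis=1)  # index of the closest centroid
    out["distance"] = dist.min(axis=1)    # distance to that centroid
    return out


points = pd.read_csv(io.StringIO(POINTS_CSV))
assigned = assign_to_nearest(points, centroids)
print(assigned)
```

Broadcasting builds the full point-to-centroid distance matrix in memory, which is fast for moderate batch sizes; very large batches may need to be processed in chunks.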