uw-ssec / MAWpy

Mobility Analysis Workflow in Python

https://mawpy.readthedocs.io/

BSD 3-Clause "New" or "Revised" License

perf: update Incremental Clustering to use pandas dataframe for data processing. #26

Open anujsinha3 opened 2 weeks ago

anujsinha3 commented 2 weeks ago

Currently, the input file format for incremental clustering is a CSV file.

Instead of being read into a pandas DataFrame, each line of the CSV file is read separately, and the columns are accessed by integer position.

This leads to non-vectorizable code and additional nested loops that impact performance. Additionally, this makes the code cluttered and difficult to comprehend.
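To illustrate the difference, here is a minimal sketch, not MAWpy's actual code. The column names (`user_id`, `timestamp`, `lat`, `lon`), the sample data, and both function names are hypothetical. It computes a per-user mean latitude, first row by row with integer indexing and then with a single vectorized pandas call.

```python
import csv
import io

import pandas as pd

# Hypothetical sample input; the column names are assumptions,
# not MAWpy's actual schema.
CSV_TEXT = """user_id,timestamp,lat,lon
u1,100,47.60,-122.33
u1,160,47.61,-122.33
u2,100,47.65,-122.30
"""


def mean_lat_rowwise(text: str) -> dict[str, float]:
    """Row-by-row style: each column is reached by integer position."""
    sums = {}
    counts = {}
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    for row in reader:
        user = row[0]  # the meaning of index 0 is not visible here
        sums[user] = sums.get(user, 0.0) + float(row[2])
        counts[user] = counts.get(user, 0) + 1
    return {user: sums[user] / counts[user] for user in sums}


def mean_lat_vectorized(text: str) -> dict[str, float]:
    """DataFrame style: named columns and one grouped aggregation."""
    df = pd.read_csv(io.StringIO(text))
    return df.groupby("user_id")["lat"].mean().to_dict()


rowwise = mean_lat_rowwise(CSV_TEXT)
vectorized = mean_lat_vectorized(CSV_TEXT)
```

Both functions should return the same per-user values (up to floating-point rounding). The second needs no explicit loop and refers to columns by name.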

The task is:

[x] update the code to use a pandas DataFrame as the standardized data format
[x] allow vectorized operations on the DataFrame for improved performance
[ ] add type hints for enhanced readability
[ ] add user-based indexing for faster operations
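One possible reading of the two open items, sketched under assumptions: "user-based indexing" is taken here to mean using the user ID column as a sorted DataFrame index, which may not be what is intended. The column names and the helper names `index_by_user` and `traces_for_user` are hypothetical.

```python
import pandas as pd


def index_by_user(df: pd.DataFrame, user_col: str = "user_id") -> pd.DataFrame:
    """Return a copy of df indexed by user ID, sorted so label lookups are fast."""
    return df.set_index(user_col).sort_index()


def traces_for_user(indexed: pd.DataFrame, user: str) -> pd.DataFrame:
    """Select all rows for one user; the list form always yields a DataFrame."""
    return indexed.loc[[user]]


# Hypothetical sample data for illustration only.
records = pd.DataFrame(
    {
        "user_id": ["u2", "u1", "u1"],
        "timestamp": [100, 100, 160],
        "lat": [47.65, 47.60, 47.61],
    }
)
indexed = index_by_user(records)
u1_traces = traces_for_user(indexed, "u1")
```

The type hints document what each helper expects and returns. The sorted index lets pandas locate a user's rows without scanning the whole frame.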

anujsinha3 commented 1 week ago

first pass complete