refold-languages / community-projects

[Empyrical] Speed up Hill1t processing with concurrency #5

Open Swiddis opened 2 years ago

Swiddis commented 2 years ago

Currently, Hill1t is quite slow when processing large files or large numbers of files. I'm sure there are algorithmic ways to speed it up, but what I have in mind at the moment is introducing more concurrent processing, most likely with concurrent.futures (docs). For now I'll keep this issue restricted to introducing concurrency, since that's a bit more actionable.
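
A minimal sketch of the concurrent.futures approach, assuming the work splits cleanly into independent per-file tasks. `count_words` is a hypothetical stand-in for whatever Hill1t actually does to each file, and `process_files` is an illustrative name, not something that exists in the project:

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def count_words(path):
    """Hypothetical per-file task: count whitespace-separated tokens."""
    text = Path(path).read_text(encoding="utf-8")
    return len(text.split())


def process_files(paths, worker=count_words,
                  executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run `worker` on every path concurrently and return {path: result}."""
    paths = [str(p) for p in paths]
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs,
        # so they can be zipped back onto the paths safely.
        results = list(pool.map(worker, paths))
    return dict(zip(paths, results))
```

Processes rather than threads are the default here on the assumption that the per-file work is CPU-bound; if it turns out to be mostly I/O, passing `executor_cls=ThreadPoolExecutor` would be the cheaper option. On platforms that start workers with spawn (Windows, macOS), the call to `process_files` needs to sit under an `if __name__ == "__main__":` guard.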

1over137 commented 2 years ago

If I were to do this, I would simply use pandas along with one of the packages that provide a multiprocess apply, like https://pypi.org/project/pandarallel/

Swiddis commented 2 years ago

That sounds promising. If nobody else picks this up, I'll look into using pandas for it. It might also be worth introducing a more formal data science framework for other future projects, since it's pretty heavily data-related.