Swiddis opened this issue 2 years ago
If I were doing this, I would simply use pandas along with one of the multiprocessing-based `apply` modules, such as pandarallel (https://pypi.org/project/pandarallel/)
That sounds promising. If nobody else picks this up, I'll look into using pandas for it. It might also be worth introducing a more formal data science framework for future projects, since this one is pretty heavily data-related.
Currently Hill1t is quite slow when processing large files or large numbers of files. I'm sure there are algorithmic ways to speed this up, but what I have in mind at the moment is introducing more concurrent processing, most likely with `concurrent.futures` (docs). For now I'll keep this issue restricted to introducing concurrency, since that's a bit more actionable.
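A minimal sketch of what file-level concurrency with `concurrent.futures` could look like, assuming Hill1t's per-file work can be wrapped in one function that takes a path and returns a result. `count_lines` and `process_files` (and their parameters) are hypothetical stand-ins for illustration, not part of Hill1t.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional, Type


def count_lines(path: str) -> int:
    """Hypothetical stand-in for Hill1t's real per-file work: count lines in one file."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


def process_files(
    paths: Iterable[str],
    worker: Callable[[str], int] = count_lines,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
    max_workers: Optional[int] = None,
) -> Dict[str, int]:
    """Run `worker` on every path concurrently and return results keyed by path."""
    results: Dict[str, int] = {}
    with executor_cls(max_workers=max_workers) as executor:
        # Submit one task per file; map each future back to the path it handles.
        futures = {executor.submit(worker, p): p for p in paths}
        # Collect results in completion order, not submission order.
        for future in as_completed(futures):
            path = futures[future]
            results[path] = future.result()  # re-raises any exception from the worker
    return results


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python sketch.py file1.txt file2.txt ...
    for path, n in sorted(process_files(sys.argv[1:]).items()):
        print(f"{path}: {n}")
```

`ProcessPoolExecutor` is the default here on the assumption that per-file parsing is CPU-bound, where threads would be held back by the GIL; passing `executor_cls=ThreadPoolExecutor` is the cheaper option if the work turns out to be mostly I/O. With processes, the worker has to be a picklable top-level function and the entry point has to sit behind the `if __name__ == "__main__":` guard.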