Closed: biocyberman closed this issue 5 years ago
I actually didn't know about numba until now, but considering how I'm seeking around in the data file to read the rows, I really don't expect that parallelizing with it would work properly.
I'd like to allow direct cell addressing by memory-mapping the file and indexing some regions, which might help with performance and multi-threaded scenarios (see the sketch below).
Also, reading into a pandas DataFrame seems like a very obvious use case, and I'd like to look into how I can support that directly in #12.
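A minimal sketch of that memory-mapping idea, using only the stdlib `mmap` module; the file name and offsets are illustrative, and a real implementation would index record boundaries in the XLSB binary rather than hard-code them:

```python
import mmap

# Map the file once, then seek/slice into it without repeated read() calls.
# Read-only random access like this is cheap to share across threads.
with open('data.xlsb', 'rb') as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    magic = mm[:8]        # slice an arbitrary region without copying the file
    mm.seek(1024)         # jump straight to an indexed region (offset is illustrative)
    chunk = mm.read(256)
    mm.close()
```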
Great to hear about the plan for pandas support.
There are several levels of utilizing numba: compiled Python functions -> CPU parallelization -> GPU parallelization. I came to know about numba while I was working with CUDA, and I have only tried it with numbers and matrix manipulation, so I don't know whether it works for other kinds of data. Cython is probably a better fit.
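For context, a minimal sketch of the kind of workload numba's CPU parallelization handles well: a pure numeric loop compiled with `@njit(parallel=True)`. The function and data are illustrative; string-heavy work such as parsing mixed text/number rows is exactly what numba will not compile:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def sum_of_squares(values):
    # A pure numeric reduction: the kind of code numba compiles and parallelizes.
    total = 0.0
    for i in prange(values.shape[0]):
        total += values[i] * values[i]
    return total

print(sum_of_squares(np.arange(1_000_000, dtype=np.float64)))
```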
Closing this as a duplicate of #12
I am converting an XLSB file of about 150 MB. It takes more than 20 minutes to complete, which is too long for me. How do I speed this up? I tried numba, but it did not work, probably due to the mixture of text and numbers in my file. Is pyxlsb known to work with numba during the reading of Excel rows?
What I am after is a fast way to read an XLSB file into a pandas DataFrame.
Here is my current code.
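(The snippet is not preserved in this thread; the usual pattern for this task, using pyxlsb's documented `open_workbook`/`rows` API, looks roughly like the following. The file name, sheet index, and header handling are assumptions.)

```python
import pandas as pd
from pyxlsb import open_workbook

# Read every row of the first sheet and build a DataFrame from the cell values.
with open_workbook('data.xlsb') as wb:
    with wb.get_sheet(1) as sheet:
        rows = ([cell.v for cell in row] for row in sheet.rows())
        header = next(rows)  # assume the first row holds column names
        df = pd.DataFrame(rows, columns=header)

print(df.shape)
```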