willtrnr / pyxlsb

Excel 2007+ Binary Workbook (xlsb) reader for Python
GNU Lesser General Public License v3.0

Can the worksheet be converted into CSV without reading row by row? #15

Closed: lbhtran closed this issue 5 years ago

lbhtran commented 5 years ago

Hi, I work with xlsb files that have a large number of rows, so I wonder if there is a way to avoid reading the data row by row, as it's taking a long time.

willtrnr commented 5 years ago

Given how the data is laid out in the xlsb file, reading row by row, one way or another, is the only option.

I haven't had much time to work on it, but I'm trying to improve performance in general. After all, reading huge files was the original motivation for writing this.
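Since the sheet has to be walked row by row anyway, one option is to stream each row straight into a CSV file instead of collecting everything in memory first. The following is a minimal sketch, assuming pyxlsb's `open_workbook` / `get_sheet` / `rows()` interface, where each row is a list of cells exposing their value as `.v`. The helper names `rows_to_csv` and `xlsb_to_csv` are hypothetical. This does not make the parsing itself faster; it only avoids holding the whole sheet in memory.

```python
import csv
from typing import IO, Iterable, Sequence


def rows_to_csv(rows: Iterable[Sequence], out: IO[str]) -> int:
    """Write each row of cells to `out` as CSV and return the number of rows written.

    Each cell is assumed to expose its value as `.v` (as pyxlsb cells do).
    Rows are written one at a time, so the whole sheet is never held in memory.
    """
    writer = csv.writer(out)
    count = 0
    for row in rows:
        # csv.writer writes None (an empty cell) as an empty string.
        writer.writerow([cell.v for cell in row])
        count += 1
    return count


def xlsb_to_csv(xlsb_path: str, csv_path: str, sheet=1) -> int:
    """Convert one worksheet of an .xlsb file to CSV (hypothetical helper)."""
    # Imported lazily so rows_to_csv can be used without pyxlsb installed.
    from pyxlsb import open_workbook

    with open_workbook(xlsb_path) as wb:
        with wb.get_sheet(sheet) as ws, open(
            csv_path, "w", newline="", encoding="utf-8"
        ) as f:
            return rows_to_csv(ws.rows(), f)
```

Usage would look like `xlsb_to_csv("Book1.xlsb", "Book1.csv", sheet=1)`. Keeping the CSV writing separate from the pyxlsb calls makes the row-to-CSV step easy to reuse with any iterator of rows.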

lbhtran commented 5 years ago

I found a workaround: I use PowerShell to convert the xlsb file to CSV, then read the CSV into a pandas DataFrame. It works for now, but it would be nice to have better performance when reading files with pyxlsb.
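The actual PowerShell script isn't shown in the thread. The following is one plausible version of this kind of workaround, driven from Python. It assumes Windows with Excel installed and uses Excel's COM automation (`Workbooks.Open`, then `SaveAs` with file format `6`, i.e. `xlCSV`). The helper names `build_excel_csv_script` and `convert_with_excel` are hypothetical. Excel generally expects absolute paths here, and a CSV save only writes the active sheet.

```python
import subprocess


def build_excel_csv_script(xlsb_path: str, csv_path: str) -> str:
    """Return a PowerShell one-liner that asks Excel (via COM) to save a workbook as CSV.

    Assumes Windows with Excel installed; 6 is Excel's xlCSV file format.
    Only the active sheet ends up in the CSV.
    """

    def ps_quote(s: str) -> str:
        # Inside PowerShell single-quoted strings, a ' is escaped by doubling it.
        return "'" + s.replace("'", "''") + "'"

    return "; ".join([
        "$excel = New-Object -ComObject Excel.Application",
        "$excel.DisplayAlerts = $false",  # suppress the overwrite/format prompts
        f"$wb = $excel.Workbooks.Open({ps_quote(xlsb_path)})",
        f"$wb.SaveAs({ps_quote(csv_path)}, 6)",
        "$wb.Close($false)",
        "$excel.Quit()",
    ])


def convert_with_excel(xlsb_path: str, csv_path: str) -> None:
    """Run the conversion through powershell.exe (hypothetical helper, Windows only)."""
    script = build_excel_csv_script(xlsb_path, csv_path)
    subprocess.run(["powershell", "-NoProfile", "-Command", script], check=True)
```

After `convert_with_excel(r"C:\data\Book1.xlsb", r"C:\data\Book1.csv")`, the file can be loaded with `pandas.read_csv(r"C:\data\Book1.csv")`.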

chfw commented 5 years ago

pyexcel with the pyexcel-xlsb plugin (which uses pyxlsb) can help with the conversion to CSV.

I suppose you would then use pandas to read the CSV chunk by chunk.
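In pandas, chunked reading is done by passing `chunksize` to `pandas.read_csv`, which then returns an iterator of DataFrames instead of one large frame. The following sketch shows the same idea with only the standard library (`csv` plus `itertools.islice`), so it runs without pandas. The name `read_csv_in_chunks` is hypothetical.

```python
import csv
from itertools import islice
from typing import IO, Iterator, List


def read_csv_in_chunks(f: IO[str], chunk_size: int) -> Iterator[List[List[str]]]:
    """Yield lists of at most `chunk_size` parsed rows from an open CSV file.

    Only one chunk is materialized at a time; the reader resumes where the
    previous chunk stopped.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    reader = csv.reader(f)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return  # end of file
        yield chunk
```

The pandas equivalent is roughly `for df in pandas.read_csv("Book1.csv", chunksize=100_000): ...`, where each `df` holds at most 100,000 rows.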

willtrnr commented 5 years ago

I'll close this in favor of a general performance issue I've opened, #16.