pythonicrubyist / creek

Ruby library for parsing large Excel files.
http://rubygems.org/gems/creek
MIT License
388 stars 109 forks source link

Can't read an xlsx file created programmatically with Axlsx #59

Closed myfitment closed 6 years ago

myfitment commented 6 years ago

Any files that are created with the Axlsx gem can not be read. The data in the first worksheet appears to be fictional.

I have a spreadsheet filled with data, and a header in the first row going out to column JN2. However, creek reads this row as:

{"A2"=>nil, "B2"=>nil, "C2"=>nil, "D2"=>nil, "E2"=>11000.0, "F2"=>1.0, "G2"=>0.0, "H2"=>490.0, "I2"=>nil, "J2"=>nil, "K2"=>nil, "L2"=>nil, "M2"=>nil, "N2"=>1.0, "O2"=>4.0, "P2"=>nil, "Q2"=>nil, "R2"=>nil, "S2"=>nil, "T2"=>nil, "U2"=>nil, "V2"=>nil, "W2"=>4.0, "X2"=>nil, "Y2"=>nil, "Z2"=>8.0, "AA2"=>8.0, "AB2"=>8.0, "AC2"=>1.0, "AD2"=>nil, "AE2"=>nil, "AF2"=>33597.0, "AG2"=>nil, "AH2"=>588.0, "AI2"=>nil, "AJ2"=>32257.0, "AK2"=>nil, "AL2"=>nil, "AM2"=>nil, "AN2"=>nil, "AO2"=>nil, "AP2"=>0.0, "AQ2"=>nil, "AR2"=>nil, "AS2"=>nil, "AT2"=>nil, "AU2"=>nil, "AV2"=>nil, "AW2"=>nil, "AX2"=>nil, "AY2"=>nil, "AZ2"=>nil, "BA2"=>nil, "BB2"=>nil, "BC2"=>nil, "BD2"=>nil, "BE2"=>nil, "BF2"=>nil, "BG2"=>nil, "BH2"=>nil, "BI2"=>nil, "BJ2"=>nil, "BK2"=>nil, "BL2"=>nil, "BM2"=>nil, "BN2"=>nil, "BO2"=>nil, "BP2"=>nil, "BQ2"=>nil, "BR2"=>nil, "BS2"=>nil, "BT2"=>nil, "BU2"=>nil, "BV2"=>nil, "BW2"=>nil, "BX2"=>nil, "BY2"=>nil, "BZ2"=>nil, "CA2"=>nil, "CB2"=>nil, "CC2"=>nil, "CD2"=>nil, "CE2"=>nil, "CF2"=>nil, "CG2"=>nil, "CH2"=>nil, "CI2"=>nil, "CJ2"=>nil, "CK2"=>nil, "CL2"=>nil, "CM2"=>nil, "CN2"=>nil, "CO2"=>nil, "CP2"=>nil, "CQ2"=>nil, "CR2"=>nil, "CS2"=>nil, "CT2"=>nil, "CU2"=>nil, "CV2"=>nil, "CW2"=>nil, "CX2"=>nil, "CY2"=>nil, "CZ2"=>nil, "DA2"=>nil, "DB2"=>nil, "DC2"=>nil, "DD2"=>nil, "DE2"=>nil, "DF2"=>nil, "DG2"=>nil, "DH2"=>nil, "DI2"=>nil, "DJ2"=>nil, "DK2"=>nil, "DL2"=>nil, "DM2"=>nil, "DN2"=>nil, "DO2"=>nil, "DP2"=>nil, "DQ2"=>nil, "DR2"=>nil, "DS2"=>nil, "DT2"=>nil, "DU2"=>nil, "DV2"=>nil, "DW2"=>nil, "DX2"=>nil, "DY2"=>nil, "DZ2"=>nil, "EA2"=>nil, "EB2"=>nil, "EC2"=>nil, "ED2"=>nil, "EE2"=>nil, "EF2"=>nil, "EG2"=>nil, "EH2"=>nil, "EI2"=>nil, "EJ2"=>nil, "EK2"=>nil, "EL2"=>nil, "EM2"=>nil, "EN2"=>nil, "EO2"=>nil, "EP2"=>nil, "EQ2"=>nil, "ER2"=>nil, "ES2"=>nil, "ET2"=>nil, "EU2"=>nil, "EV2"=>nil, "EW2"=>nil, "EX2"=>nil, "EY2"=>nil, "EZ2"=>nil, "FA2"=>nil, "FB2"=>nil, "FC2"=>nil, "FD2"=>nil, "FE2"=>nil, "FF2"=>nil, "FG2"=>nil, "FH2"=>nil, "FI2"=>nil, "FJ2"=>nil, "FK2"=>nil, "FL2"=>nil, "FM2"=>nil, "FN2"=>nil, "FO2"=>nil, "FP2"=>nil, "FQ2"=>nil, "FR2"=>nil, "FS2"=>nil, "FT2"=>nil, "FU2"=>nil, "FV2"=>nil, "FW2"=>nil, "FX2"=>nil, "FY2"=>nil, "FZ2"=>nil, "GA2"=>nil, "GB2"=>nil, "GC2"=>nil, "GD2"=>nil, "GE2"=>nil, "GF2"=>nil, "GG2"=>nil, "GH2"=>nil, "GI2"=>nil, "GJ2"=>nil, "GK2"=>nil, "GL2"=>nil, "GM2"=>nil, "GN2"=>nil, "GO2"=>nil, "GP2"=>nil, "GQ2"=>nil, "GR2"=>nil, "GS2"=>nil, "GT2"=>nil, "GU2"=>nil, "GV2"=>nil, "GW2"=>nil, "GX2"=>nil, "GY2"=>nil, "GZ2"=>nil, "HA2"=>nil, "HB2"=>nil, "HC2"=>nil, "HD2"=>nil, "HE2"=>nil, "HF2"=>nil, "HG2"=>nil, "HH2"=>nil, "HI2"=>nil, "HJ2"=>nil, "HK2"=>nil, "HL2"=>nil, "HM2"=>nil, "HN2"=>nil, "HO2"=>nil, "HP2"=>nil, "HQ2"=>nil, "HR2"=>nil, "HS2"=>nil, "HT2"=>nil, "HU2"=>nil, "HV2"=>nil, "HW2"=>nil, "HX2"=>nil, "HY2"=>nil, "HZ2"=>nil, "IA2"=>nil, "IB2"=>nil, "IC2"=>nil, "ID2"=>nil, "IE2"=>nil, "IF2"=>nil, "IG2"=>nil, "IH2"=>nil, "II2"=>nil, "IJ2"=>nil, "IK2"=>nil, "IL2"=>nil, "IM2"=>nil, "IN2"=>nil, "IO2"=>nil, "IP2"=>nil, "IQ2"=>nil, "IR2"=>nil, "IS2"=>nil, "IT2"=>nil, "IU2"=>nil, "IV2"=>nil, "IW2"=>nil, "IX2"=>nil, "IY2"=>nil, "IZ2"=>nil, "JA2"=>nil, "JB2"=>nil, "JC2"=>nil, "JD2"=>nil, "JE2"=>nil, "JF2"=>nil, "JG2"=>nil, "JH2"=>nil, "JI2"=>nil, "JJ2"=>nil, "JK2"=>nil, "JL2"=>nil, "JM2"=>nil, "JN2"=>nil}

It's interesting that it does capture the correct number of columns but none of the nil or numeric values above have any basis in reality.

Viewing the file, in Excel, it looks like a normal file and functions normally.

If I first view the file in Excel and then save it, creek reads it successfully.

I can't figure out why creek can read the file after it has been manually opened and saved by Excel, but not before. After all, Excel is able to read it. Obviously there is some kind of state change in the file after Excel reads it, I just can't imagine what it could be.

AsWali commented 6 years ago

@myfitment I am running against the same problem. Any luck ?

myfitment commented 6 years ago

I switched to the Roo stream parser for XLSX and I'm not having the problem any more.

AsWali commented 6 years ago

@myfitment If you change the Zip::File.open(path) to Zip::File.open_buffer(path) You can read the output of the axlsx gem without first saving an excel file.

This would mean you have to overwrite the initialize of the gem but worth a try. Have you tried this by any chance ?

myfitment commented 6 years ago

I have not tried that. I wonder why it reads the file differently.

AsWali commented 6 years ago

@myfitment It works for me. And I believe creek and simple_xlsx_reader were the fastest gems for reading excels. Maybe still worth it to use one of these https://spin.atomicobject.com/2017/03/22/parsing-excel-files-ruby/

It reads the file correctly now for me.

pythonicrubyist commented 6 years ago

@myfitment I am not able to reproduce this issue. if you can share with me, a copy of an excel file that has this issue, I will try to fix it.

pythonicrubyist commented 6 years ago

Closing this issue since it does not seem to be reproducable.