roo-rb / roo

Roo provides an interface to spreadsheets of several sorts.
MIT License
2.79k stars 502 forks source link

Wrong type casting google docs #498

Open rockandruby opened 5 years ago

rockandruby commented 5 years ago

When I upload google docs table in xlsx, I get 1920.0 instead 1920. Tested on latest gem version. Also if I save this document locallyt(without any modification, just ctrl+s) I get correct value X. Any ideas?

rockandruby commented 5 years ago

seems the issue in nokogiri, because when I invoke extract_cells I already have wrong values

chopraanmol1 commented 5 years ago

@rockandruby nokogiri doesn't typecast the value. Roo tries to typecast in following method: https://github.com/roo-rb/roo/blob/a9026fdceebd06ee113bc2bb02f6b0542bc159b3/lib/roo/excelx/cell/number.rb#L19-L29

Please share a sample file which reproduce this issue.

rockandruby commented 5 years ago

@chopraanmol1 Yes, roo tries to type cast, but it gets data from nokogiri and in that method wrong data is already present(e.g. 1.0) and type casting goes wrong because number.include?('.') == true. In extract_cells method if you type cell_xml.content u'll get float instead int. Sample file

chopraanmol1 commented 5 years ago

@rockandruby I'm unable to download the sample file with the given link. Can you upload as a comment itself?

I don't think it is an issue with nokogiri gem. You should try reading raw XML file It will also probably have float value. A lot of data like integer, float, date, date-time, boolean are natively stored as a number in the raw XML file and respective excel client consider cell format to determine how to display or in our scenario typecast/convert the value.

I think the issue is with roo not recognizing the particular cell format being used in your sample file. So If you can upload a sample file (as a comment) it will be easy to debug this issue further

rockandruby commented 5 years ago

@chopraanmol1 Yes, I investigated a bit more and found out that a raw xml file with wrong type casting is given to nokogiri and nokogiri parses it. So we need to figure out why roo creates wrong xml file with numbers and type casts them to float for google docs. Below I attached that xml file. Pay attention to D and E columns. In my xlsx file they're integers but in xml for some reason they're parsed to float. `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

012345678910111213141516171819202122231.02019.0242122252.02019.0262122273.02019.0282122294.02019.0302122315.02019.0322122336.02019.034`
rockandruby commented 5 years ago

@chopraanmol1 any suggestions?

chopraanmol1 commented 5 years ago

@rockandruby roo doesn't create an XML file by itself rather it extract xml(s) from xlsx file which in this case already have float representation of the integer. Most of Excel client doesn't represent an integer as float while saving the spreadsheet but it seems Google-Spreadsheet is doing that anyway.

It generally still work fines as while reading the xlsx file excel client uses the format of the respective cell to determine how to display the data. In roo's scenario, you can use formatted_value to do the same, which in the current example will return String wrapped Integer value.

We cannot use format to calculate a cell's value as it may result in data loss and will be considered breaking change.

Implementing solution for this problem (converting all N.0 (float) => N (Integer)) is quite simple, but the problem is that other user may expect the current result.

I need to think a bit more on this if there is any safe way to typecast without breaking existing scenario Or whether it won't be a breaking change to begin with.

rockandruby commented 5 years ago

@rockandruby thanks for reply. Already did monkey patching)