turicas / rows

A common, beautiful interface to tabular data, no matter the format
GNU Lesser General Public License v3.0

ODS Plugin #7

Open turicas opened 10 years ago

turicas commented 10 years ago

Since an ODS file is just a zip archive containing an XML file and other metadata files (the spreadsheet data itself lives in the XML), we can use lxml to deal with it, as we're already using it in the HTML plugin.
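A minimal sketch of that idea, not the plugin itself: open the archive with `zipfile`, parse `content.xml`, and walk the table elements. The names `read_ods_rows` and `_tag` are hypothetical, and the sketch assumes the standard OpenDocument `table` and `text` namespaces. It falls back to the standard library parser when lxml is not installed, since only calls common to both are used.

```python
import zipfile

try:
    from lxml import etree  # the parser proposed in this thread
except ImportError:
    # Fallback: the stdlib parser supports every call used below.
    import xml.etree.ElementTree as etree

# Standard OpenDocument namespaces used inside content.xml.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def _tag(namespace, name):
    """Build a fully qualified tag name (hypothetical helper)."""
    return "{%s}%s" % (namespace, name)


def read_ods_rows(source):
    """Yield (sheet_name, [cell_text, ...]) for every row in an ODS file.

    `source` is a path or a binary file-like object. Hypothetical helper:
    it ignores cell types, repeated rows and covered (merged) cells.
    """
    with zipfile.ZipFile(source) as archive:
        root = etree.fromstring(archive.read("content.xml"))

    for table in root.iter(_tag(TABLE_NS, "table")):
        sheet_name = table.get(_tag(TABLE_NS, "name"))
        for row in table.iter(_tag(TABLE_NS, "table-row")):
            cells = []
            for cell in row.findall(_tag(TABLE_NS, "table-cell")):
                # A cell holds one <text:p> per line of text.
                paragraphs = cell.findall(_tag(TEXT_NS, "p"))
                text = "\n".join("".join(p.itertext()) for p in paragraphs)
                # Identical adjacent cells are stored once with a repeat count.
                repeat = int(cell.get(_tag(TABLE_NS, "number-columns-repeated"), 1))
                cells.extend([text] * repeat)
            yield sheet_name, cells
```

Something like `for sheet, cells in read_ods_rows("example.ods"): print(sheet, cells)` would then list the rows. Real files often pad rows with a large run of repeated empty cells, which this sketch does not trim.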

There are two approaches, actually:

1. Use lxml (maybe slower, but more accurate and easier to maintain)
2. Use regular expressions (maybe faster, but less accurate and harder to maintain)
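To illustrate the accuracy trade-off between the two approaches, here is a small sketch run against a made-up cell fragment. The pattern, the fragment and the function names are all assumptions for illustration, and the parser side uses the standard library's `xml.etree.ElementTree` (lxml.etree offers the same calls).

```python
import re
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Made-up paragraph with an escaped entity and a nested span.
FRAGMENT = (
    '<text:p xmlns:text="%s">R&amp;D <text:span>budget</text:span></text:p>'
    % TEXT_NS
)

# Approach 2: a naive pattern that grabs whatever sits between the tags.
PARAGRAPH_RE = re.compile(r"<text:p[^>]*>(.*?)</text:p>", re.DOTALL)


def cell_text_with_regex(xml):
    """Return the raw markup between <text:p> tags (entities left escaped)."""
    return PARAGRAPH_RE.findall(xml)


def cell_text_with_parser(xml):
    """Return the decoded text of the paragraph, nested elements included."""
    root = ET.fromstring(xml)
    return ["".join(root.itertext())]


print(cell_text_with_regex(FRAGMENT))   # ['R&amp;D <text:span>budget</text:span>']
print(cell_text_with_parser(FRAGMENT))  # ['R&D budget']
```

The regular expression needs extra passes to strip nested tags and unescape entities, which is where the added complexity comes from.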

rossjones commented 9 years ago

There's a pretty messy implementation at https://github.com/okfn/messytables/blob/master/messytables/ods.py that might help as a starting point. It manages to process the ODS files in https://github.com/okfn/messytables/tree/master/horror, so it works to some extent (even on large.ods, which extracts to ~98 MB).

turicas commented 9 years ago

@rossjones, thanks! I'm thinking of not using regular expressions the way that implementation does (it actually uses lxml + regexps), since that can lead to problems and more complexity (although it would probably be faster). I'll try to reuse the "horror" files in some tests.

randomstuff commented 9 years ago

It might be useful to handle .lods as well (plain XML files, not wrapped in a .zip archive).

turicas commented 8 years ago

@randomstuff, are .lods files identical to the content.xml inside the .zip archive?