Open turicas opened 10 years ago
There's a pretty messy implementation at https://github.com/okfn/messytables/blob/master/messytables/ods.py that might help as a starting point. It manages to process the ODS files in https://github.com/okfn/messytables/tree/master/horror, so it works to some extent (even large.ods, which extracts to ~98 MB).
@rossjones, thanks! I'm thinking of not using regular expressions as they do in that implementation (actually they use lxml + regexps), since that can lead to problems and more complexity (although it'd probably be faster). I'll try to reuse the "horror" files in some tests.
It might be useful to handle `.lods` as well (plain XML files without the `.zip` archive).
@randomstuff, are `.lods` files the same as the `content.xml` inside the `.zip` archive?
Since an ODS is just a zip file with an XML file and other metadata files inside (and the spreadsheet data actually lives in the XML), we can use `lxml` (as we're already using it in the HTML plugin) to deal with it.

There are two approaches, actually:

1. Use `lxml` (maybe slower, but easier to maintain and more accurate)
2. Use regular expressions (maybe faster, but less accurate and harder to maintain)
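To make the XML-parser approach concrete, here is a minimal sketch of reading rows out of the `content.xml` inside an ODS archive. It is only an illustration under a few assumptions: it uses the standard library's `xml.etree.ElementTree` instead of `lxml` so it runs without extra dependencies (the calls used here have close equivalents in `lxml.etree`), the names `read_ods_rows`, `make_sample_ods` and `SAMPLE` are hypothetical, and the sample document is a hand-written stand-in rather than a real spreadsheet export.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespaces used by content.xml
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
REPEAT = "{%s}number-columns-repeated" % NS["table"]

# Hand-written stand-in for a real content.xml (assumption, for the demo only)
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body><office:spreadsheet>
    <table:table table:name="Sheet1">
      <table:table-row>
        <table:table-cell><text:p>name</text:p></table:table-cell>
        <table:table-cell><text:p>age</text:p></table:table-cell>
      </table:table-row>
      <table:table-row>
        <table:table-cell><text:p>Ana</text:p></table:table-cell>
        <table:table-cell><text:p>30</text:p></table:table-cell>
      </table:table-row>
      <table:table-row>
        <table:table-cell table:number-columns-repeated="2"/>
      </table:table-row>
    </table:table>
  </office:spreadsheet></office:body>
</office:document-content>"""


def make_sample_ods():
    """Build a tiny in-memory zip that contains only content.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", SAMPLE)
    buffer.seek(0)
    return buffer


def read_ods_rows(fobj):
    """Yield each row of the first sheet as a list of strings."""
    with zipfile.ZipFile(fobj) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    sheet = root.find(".//table:table", NS)
    if sheet is None:
        return
    for row in sheet.findall("table:table-row", NS):
        values = []
        for cell in row.findall("table:table-cell", NS):
            # A cell may hold several <text:p> paragraphs; join them with newlines
            paragraphs = cell.findall("text:p", NS)
            text = "\n".join("".join(p.itertext()) for p in paragraphs)
            # Identical adjacent cells are stored once with a repeat count
            values.extend([text] * int(cell.get(REPEAT, "1")))
        yield values


if __name__ == "__main__":
    for row in read_ods_rows(make_sample_ods()):
        print(row)
```

The sketch returns every cell as text and ignores typed values, multiple sheets and repeated rows. Real files can also pad rows with trailing empty cells that carry a large repeat count, so a fuller implementation would probably want to trim those instead of expanding them.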