thombashi / sqlitebiter

A CLI tool to convert CSV / Excel / HTML / JSON / Jupyter Notebook / LDJSON / LTSV / Markdown / SQLite / SSV / TSV / Google-Sheets to a SQLite database file.
https://sqlitebiter.rtfd.io/
MIT License
850 stars 50 forks source link

XLS import FAILs when a header cell is blank #106

Open JBThiel opened 8 months ago

JBThiel commented 8 months ago

Seeing 2 types of failures, as follows: This error may also apply to XLSX files. However, I'm getting another failure with XLSX, will report separately, which may be shadowing this one, so I can't test this with XLSX exactly. ie. even when I have fixed all the headers and the XLS is working with sqlitebiter, if saved as XLSX the conversion fails, with a different error.

1) A simple table just fails with: [INFO] converted results: source=0, fail=1

Reproduce: Make in Gnumeric a simple table 3 cols, 2 rows B C a 1 2 (note there is no "A" header in topleft cell, it's empty)

SaveAs Excel 95/97 to demo.xls sqlitebinder file -f excel demo.xls FAILS with: [INFO] converted results: source=0, fail=1

If you put an "A" in 1,1 then it works. It also fails if you have A _ C (ie. B header blank) etc. Apparently any empty header cell.

By contrast, sqlitebiter does accept HTML tables with empty headers, and auto-assigns them names A,B,C... It should do the same for XLS.

2) On a more complex table, I am getting a crash, backtrace as below, which appears to relate the same problem. If I fixup the table replacing all the empty headers to "A", then sqlitebiter processes ok.
(also, I found the 2-column table with "Languages supported" seems to work even if the A cell is empty -- confusing) Note Gnumeric names the sheets with html angle-bracket strings it picks up -- I don't think those are the cause, I still see the crash renaming all sheets to "sNN". Reproduce by:

wget https://en.wikipedia.org/wiki/Comparison_of_text_editors
Gnumeric Data.Import as HTML, it loads 12 sheets.
  Most of the sheets have missing column1 header cells.
Gnumeric.SaveAs  Excel 95/97 (XLS), or 2007/2010 (XLSX) --> demo.xls
sqlitebiter file -f excel demo.xls
$ sqlitebiter file -f excel Comparison_of_text_editors-fixed-excel97.xls
[INFO] convert 'Comparison_of_text_editors-fixed-excel97.xls' to 'List_of_text_editors' table
Traceback (most recent call last):
  File "/usr/local/src/sqlitebiter.git/.venv/bin/sqlitebiter", line 8, in <module>
    sys.exit(cmd())
             ^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/sqlitebiter/__main__.py", line 363, in file
    converter.convert(file_path)
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/sqlitebiter/converter/_file.py", line 106, in convert
    self.__convert(fpath, source_info_record_base)
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/sqlitebiter/converter/_file.py", line 140, in __convert
    sqlite_tabledata = self.normalize_table(table_data)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/sqlitebiter/converter/_base.py", line 120, in normalize_table
    ).normalize()
      ^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/tabledata/normalizer.py", line 63, in normalize
    normalize_headers = self._normalize_headers()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/simplesqlite/_sanitizer.py", line 149, in _normalize_headers
    attr_name_list = AttrList.sanitize(super()._normalize_headers())
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/tabledata/normalizer.py", line 167, in _normalize_headers
    header = self._preprocess_header(col_idx, header)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/sqlitebiter.git/.venv/lib/python3.11/site-packages/simplesqlite/_sanitizer.py", line 112, in _preprocess_header
    assert header
AssertionError