Open turicas opened 5 years ago
Hello! I came up with a hacky yet simple way to fix this.
if ignore_colspan:
    max_columns = max(map(len, table_rows))
    table_rows_temp = []
    for row in table_rows:
        table_rows_temp.append(row)
    table_rows = table_rows_temp
meta = {"imported_from": "html", "source": source}
return create_table(table_rows, meta=meta, *args, **kwargs)
Output:
Rows:
['f1', 'f2', 'f3']
['row0 f1', 'row0 f2', 'row0 f3']
['row1 f1', 'row1 f2-3']
Row(f1='row0 f1', f2='row0 f2', f3='row0 f3')
Row(f1='row1 f1', f2='row1 f2-3', f3=None)
I appended the rows to a separate list (table_rows_temp) and then reassigned it to table_rows. Apparently, due to some post-processing done by the rows library, the missing element is automatically set to None. I'm not sure which code block is responsible for assigning None to the missing element, but it does work.
I have tested it with other possible cases, and it works there as well.
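For anyone who would rather not rely on that implicit post-processing, here is a minimal sketch that pads short rows explicitly before they are handed on. Note that pad_rows is a hypothetical helper, not part of rows, and the sample data simply mirrors the output shown above. It also assumes the missing cells belong at the end of the row, which is only true when the colspan cell is the last one.

```python
def pad_rows(table_rows, fill=None):
    """Right-pad every row with `fill` so all rows share the longest row's length.

    Hypothetical helper for illustration; not part of the rows library.
    """
    # default=0 keeps max() from raising on an empty table
    max_columns = max(map(len, table_rows), default=0)
    return [list(row) + [fill] * (max_columns - len(row)) for row in table_rows]


# Sample data mirroring the rows printed earlier in this comment
table_rows = [
    ["f1", "f2", "f3"],
    ["row0 f1", "row0 f2", "row0 f3"],
    ["row1 f1", "row1 f2-3"],
]

padded = pad_rows(table_rows)
for row in padded:
    print(row)
```

With this, the short row keeps its data and gets an explicit None in the last position instead of being dropped.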
If ignore_colspan=True (the default), all rows shorter than the longest row in that table are ignored. This was done so that every row has the same number of fields, but it can lead to data loss. Ideally, the importer would interpret the colspan information and fill the spanned cells with blanks. The test HTML can be this one:
And the code:
The current implementation prints:
The ideal implementation would print: