blcham opened 10 months ago
@blcham Please have a look at the current implementation of #228. Two tests with merged cells in the XLS and XLSX files are failing, because the current implementation also produces statements for the merged cells, for example:
<http://test-file#row-2>
    <http://onto.fel.cvut.cz/data/aa> "merged columns" ;
    <http://onto.fel.cvut.cz/data/bb> "" ;
    <http://onto.fel.cvut.cz/data/cc> "" .
while the expected output is:
<http://test-file#row-2>
    <http://onto.fel.cvut.cz/data/aa> "merged columns" .
This can be fixed with a check that skips blank cells:
if (cell.getCellType() == CellType.BLANK) continue;
However, this would skip not only merged cells but also cells that are empty yet styled, for example filled with a color or drawn with a table border. If that is not acceptable, I believe we would have to store for every cell whether it is merged or not, which would require more memory. Please share your opinion on this.
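A possible middle ground between the two options is to keep only the list of merged regions instead of a per-cell flag. Apache POI already exposes them through `Sheet.getMergedRegions()` (a list of `CellRangeAddress`, which offers `isInRange(row, col)`). The sketch below illustrates the idea without any dependency; the `MergedCellFilter` class, the `Region` record and `shouldEmit` are hypothetical stand-ins, not code from the PR or from POI. A cell is skipped only when it lies inside a merged region but is not that region's top-left cell, so styled-but-empty cells outside merged regions are still visited.

```java
import java.util.List;

public class MergedCellFilter {

    // Hypothetical stand-in for POI's CellRangeAddress (0-based, inclusive bounds).
    record Region(int firstRow, int lastRow, int firstCol, int lastCol) {
        boolean contains(int row, int col) {
            return row >= firstRow && row <= lastRow && col >= firstCol && col <= lastCol;
        }

        boolean isAnchor(int row, int col) {
            return row == firstRow && col == firstCol;
        }
    }

    // Returns false only for cells covered by a merged region other than its top-left cell.
    // Excel merged regions do not overlap, so the first matching region decides.
    static boolean shouldEmit(int row, int col, List<Region> mergedRegions) {
        for (Region r : mergedRegions) {
            if (r.contains(row, col)) {
                return r.isAnchor(row, col);
            }
        }
        return true; // not merged: keep it, even if it is blank but styled
    }

    public static void main(String[] args) {
        // Assumed layout: row index 1 (first data row) has columns 0..2 (aa..cc) merged.
        List<Region> merged = List.of(new Region(1, 1, 0, 2));
        for (int col = 0; col < 3; col++) {
            System.out.println("col " + col + " -> " + shouldEmit(1, col, merged));
        }
    }
}
```

Memory grows with the number of merged regions rather than with the number of cells, which is usually much smaller.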
Adding missing context:
The failing tests are in the methods [Xls | Xlsx]().
The input file: [image]
@rodionnv:
1) I believe the expected output is now correct.
2) I suggest removing bb from the example so that we also have an example of an empty cell. Could you run it on the main branch to find out how it is serialized to CSV?
@blcham On main, it seems impossible to have empty cells. I changed input.xls so that it looks like this [image] and got this error: [image]
With the current implementation in this PR, empty cells are ignored in the same way in both Excel and HTML files, for example:
Output:
<http://test-file#row-3>
    <http://onto.fel.cvut.cz/data/aa> "merged rows" ;
    <http://onto.fel.cvut.cz/data/cc> "ee" .
Yes, that makes sense. For the first non-header row, we should then generate:
<http://test-file#row-2>
    <http://onto.fel.cvut.cz/data/aa> "merged columns" .
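The agreed rule of emitting no statement for an empty cell, whether the source is Excel or HTML, can be sketched as a single serialization step that runs after the cell values have been read. The `RowSerializer` class, the `toTurtle` method and the use of a plain ordered `Map` from column IRI to cell text are hypothetical, not the PR's actual code. Literal escaping is omitted to keep it short, and the sketch assumes that `bb` is the empty cell in row 3.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowSerializer {

    // Builds one Turtle subject block for a row; empty or null cell values produce no
    // statement. Literal escaping is deliberately omitted in this sketch.
    static String toTurtle(String rowIri, Map<String, String> cells) {
        List<String> statements = new ArrayList<>();
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            String value = cell.getValue();
            if (value == null || value.isEmpty()) {
                continue; // same rule for Excel and HTML input: blank means "no triple"
            }
            statements.add("<" + cell.getKey() + "> \"" + value + "\"");
        }
        if (statements.isEmpty()) {
            return ""; // a row with only empty cells yields nothing
        }
        return "<" + rowIri + ">\n    " + String.join(" ;\n    ", statements) + " .";
    }

    public static void main(String[] args) {
        Map<String, String> row3 = new LinkedHashMap<>(); // keeps column order
        row3.put("http://onto.fel.cvut.cz/data/aa", "merged rows");
        row3.put("http://onto.fel.cvut.cz/data/bb", "");
        row3.put("http://onto.fel.cvut.cz/data/cc", "ee");
        // Prints the row-3 output shown above: aa and cc only.
        System.out.println(toTurtle("http://test-file#row-3", row3));
    }
}
```

Fed the first data row with `aa` set to "merged columns" and `bb`, `cc` empty, the same method yields only the `aa` statement, which matches the expected row-2 output.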
There is only one way to continue with this: make small PRs that can be merged immediately without breaking the existing implementation. Each PR should be easy to review (at most 15 minutes for me).
Implements #228