ozlerhakan / poiji

:candy: A library converting XLS and XLSX files to a list of Java objects based on Apache POI
MIT License
467 stars 134 forks source link

Load Data from Excel Table by Name #326

Open frederikb opened 1 week ago

frederikb commented 1 week ago

Hi,

first of all thanks for sharing this library. I've previously used POI directly to load data from XLSX files and appreciate any approach to abstract away the common pain points!

My question (or probably feature suggest): is it possible to load data from a table declared in a XLSX file by name from a sheet?

As in XSSFTables which are accessed via those returned from XSSFSheet#getTables and which provides such methods such Table#getStartRowIndex, Table#getHeadersRowCount and Table#findColumnIndex

The idea being that the code to load data from the sheet would be a bit more robust against the table start row/column changing in the document.

Is this already possible or something you've considered as potentially in scope?

Thanks!

github-actions[bot] commented 1 week ago

Thank you for contributing to Poiji! Feel free to create a PR If you want to contribute directly :)

ozlerhakan commented 1 week ago

Hi @frederikb ,

Thank you for finding Poiji useful! Regarding the question, we do not have support XSSFTable in Poiji. If you have some samples, we could find a way to extend the tool with this feature.

Best!

frederikb commented 6 days ago

Hi, of course!

I've modified the cars.xlsx from your repo by shifting the table slightly down and to the right and turning it into a table (select cells and press Ctrl+ / Cmd+T.

cars.xlsx

In case you have JBang installed you can run this script:

///usr/bin/env jbang "$0" "$@" ; exit $?
//DEPS org.apache.poi:poi-ooxml:5.3.0

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.IOException;

public class cars {

    public static void main(String[] args) throws IOException {
        String filename = "cars.xlsx";
        try (InputStream file = new FileInputStream(filename);
             XSSFWorkbook workbook = new XSSFWorkbook(file)) {

            XSSFTable table = workbook.getTable("Cars");

            XSSFSheet sheet = table.getXSSFSheet();

            int startRow = table.getStartRowIndex();
            int startColumn = table.getStartColIndex();
            int endRow = table.getEndRowIndex();
            int headerRowCount = table.getHeaderRowCount();
            int dataStartRow = startRow + headerRowCount;

            int nameColIndex = table.findColumnIndex("NAME");
            int yearColIndex = table.findColumnIndex("YEAR");
            int seatsColIndex = table.findColumnIndex("Seats"); // Column names are case-insensitive

            // Loop through the data rows
            DataFormatter formatter = new DataFormatter();
            for (int rowNum = dataStartRow; rowNum <= endRow; rowNum++) {
                Row row = sheet.getRow(rowNum);

                String nameValue = formatter.formatCellValue(row.getCell(startColumn + nameColIndex));
                String yearValue = formatter.formatCellValue(row.getCell(startColumn + yearColIndex));
                String seatsValue = formatter.formatCellValue(row.getCell(startColumn + seatsColIndex));

                System.out.println(nameValue + "\t" + yearValue + "\t" + seatsValue);
            }

        }
    }
}

which will print this output:

Honda Civic 2017    4
Chevrolet Corvette  2016    2

From an API perspective something like the following could be perhaps work:

@ExcelTable("Cars") // no need to even specify the sheet
public class Car {

    @ExcelCellName("NAME")
    private String name;

    @ExcelCellName("YEAR")
    private String id;

    @ExcelCellName("Seats") // case-insensitive by default if using the API from `Table`seen above
    private String phone;
}

More advanced features such as @ExcelCellName(value = "", expression = "Surname|Second name")would still require to perform your own look up such as table.getColumns().stream().filter(c -> c.getName().matches(columnNameExpression)).toList(), but you get the idea.

Looking (very briefly) at the code base you seem to be using a different API / approach for processing the workbooks, so I'm not sure how much of a help this is to you.