spring-projects / spring-batch-extensions

Spring Batch Extensions
242 stars 258 forks source link

add skip column feature in excel file readers #64

Closed parikshitdutta closed 1 year ago

parikshitdutta commented 3 years ago

Closes BATCH-EXT-40 [#40]

mdeinum commented 3 years ago

First of all thanks for the contribution. However, as the author of this library, I would suggest to either

  1. Implement this with the RowMapper ignore the column while mapping to the object
  2. Implement this with a RowSet and RowSetFactory and ignore it there.

The readers for Excel are, deliberately, inspired by the the FlatFileItemReader (amongst others) and only allow for skipping rows and not columns. That is better left to either of the aforementioned solutions.

parikshitdutta commented 3 years ago

Thanks @mdeinum for your feedback.

As per my understanding and as you also mentioned in your second suggested solution, that is the best way to implement the functionality is using RowSet and RowSetFactory

If you refer to my submitted PR, the logic for skipping column is defined in Rowset and DefaultRowSet respectively, however I have invoked the same from AbstractExcelItemReader.doRead() to keep the skipping column option available at the readers level.

Do you mean, I rather should abstract it from readers, and keep the implementation details at RowSet level?

In that scenario, I should use RowSet.next() to implement the same.

Your thought @mdeinum ?

mdeinum commented 3 years ago

The logic is a bit scattered, the ItemReader now suddenly needs to know it needs to skip columns. That should be a sole responsibility of the RowSet and should be transparent for the ItemReader and other components using a RowSet. Another thing that bothers me is why just a single column? What if you have 10 columns and want to skip multiple columns (or looking at it from a different side, read only a part of that column).

I'd rather see something like available in the DelimitedLineTokenizer (the RowSet (initial inspiration came from that and the JDBC support in Spring). That will read all columns by default or only the list of included columns. For the remainder of the infrastructure, it is completely transparent that this is happening.

The columns to include could be part of the RowSetMetaData so that it can apply the filtering to the headers as well and it could be obtained/used as well when reading the current row (and filter out the given columns).

mdeinum commented 1 year ago

Closing due to inactivity. If we want to revisit we can re-open or create a new issue for this.