spring-projects / spring-batch

Spring Batch is a framework for writing batch applications using Java and Spring
http://projects.spring.io/spring-batch/
Apache License 2.0
2.73k stars 2.35k forks source link

Refactor RedisItemWriter and RedisItemReader logic #4652

Open suzzingV opened 3 months ago

suzzingV commented 3 months ago

Hello,

I believe there might be a better way to implement the RedisItemReader and RedisItemWriter. Currently, the RedisItemReader stores the Redis data's value in the chunk. The RedisItemWriter then uses an ItemKeyMapper to derive the key from this value and executes the writeKeyValue function with the key and value.

However, I find it quite cumbersome to implement an ItemKeyMapper that derives the key from the value. In a key-value structure, retrieving the key based on the value seems to underutilize the key-value paradigm.

When I was setting up a batch job to delete specific Redis data, I found it inefficient to implement an ItemKeyMapper that derives the key from the value, so I customized the code accordingly. Instead, I implemented the ItemKeyMapper to return the value when given a key.

Therefore, I believe it would be more efficient for the RedisItemReader to store the Redis data's keys in the chunk. If my suggestion is not accepted, I would be very interested in understanding the reasoning behind that decision.

Thank you for considering my suggestion.

fmbenhassine commented 2 months ago

That's a valid concern. I was hit by this limitation recently when I was trying to export a set of key/value pairs from redis to a csv file (no RedisItemWriter use). Since the reader was retruning only values, it was impossible to export the associated keys (without an ugly workaround of retrieving them in order from from a cursor with the same scan options..).

Therefore, I believe it would be more efficient for the RedisItemReader to store the Redis data's keys in the chunk.

I agree. I believe a good return type would be an encapsulation of the key with its associated value, something like Pair<K,V>. Here is the modified version I ended up working with:

RedisItemReader.java ```java /* * Copyright 2023 the original author or authors. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * https://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.springframework.batch.item.redis; import org.springframework.batch.item.ExecutionContext; import org.springframework.batch.item.ItemStreamException; import org.springframework.batch.item.ItemStreamReader; import org.springframework.data.redis.core.Cursor; import org.springframework.data.redis.core.RedisTemplate; import org.springframework.data.redis.core.ScanOptions; import org.springframework.util.Assert; /** * Item reader for Redis based on Spring Data Redis. Uses a {@link RedisTemplate} to query * data. The user should provide a {@link ScanOptions} to specify the set of keys to * query. * *

* The implementation is not thread-safe and not restartable. *

* * @author Mahmoud Ben Hassine * @since 5.1 * @param type of keys * @param type of values */ public class RedisItemReader implements ItemStreamReader> { protected final RedisTemplate redisTemplate; protected final ScanOptions scanOptions; protected Cursor cursor; public RedisItemReader(RedisTemplate redisTemplate, ScanOptions scanOptions) { Assert.notNull(redisTemplate, "redisTemplate must not be null"); Assert.notNull(scanOptions, "scanOptions must no be null"); this.redisTemplate = redisTemplate; this.scanOptions = scanOptions; } @Override public void open(ExecutionContext executionContext) throws ItemStreamException { this.cursor = this.redisTemplate.scan(this.scanOptions); } @Override public Pair read() throws Exception { if (this.cursor.hasNext()) { K nextKey = this.cursor.next(); return new Pair<>(nextKey, this.redisTemplate.opsForValue().get(nextKey)); } else { return null; } } @Override public void close() throws ItemStreamException { this.cursor.close(); } public record Pair(K key, V value) { } } ```

This design is more appropriate for the key-value paradigm and also works nicely for use cases that do not involve the RedisItemWriter. If the redis writer is used, then the itemKeyMapper could be implemented to either return the original key for the given value (available in the pair), or derive a custom key if needed. What do you think?

suzzingV commented 2 months ago

Here's a more polite and formal version of your message in English:

Thank you very much for your response and for sharing your thoughts on this matter. It is an honor to know that we share a similar perspective on the issue.

I personally believe that returning only the keys, rather than both key-value pairs, might be a more efficient approach. Given Redis's characteristic ease in retrieving values by their keys, I think returning just the keys would be sufficient. In cases where only the keys are needed, they can be used directly, and when values are required, they can be fetched as necessary. This approach would provide more flexibility while also minimizing unnecessary data transfer.

Additionally, when using the writer, the ItemKeyMapper could handle such transformations, thereby separating responsibilities more effectively. By managing only the required data based on the context, I believe it would be more efficient to return just the keys during the reading process. I would appreciate hearing your thoughts on this approach.