jakartaee / batch

The Jakarta Batch project produces the Batch Specification and API.
https://projects.eclipse.org/projects/ee4j.batch
Apache License 2.0
13 stars 17 forks source link

typed ItemWriter, ItemReader, ItemProcessor, etc #83

Closed follis closed 4 years ago

follis commented 4 years ago

Originally opened as bug 7295 by struberg

--------------Original Comment History---------------------------- Comment from = struberg on 2015-09-07 08:18:56 +0000

The APIs defined in the spec are currently purely dealing with Object.class. This is so 90s ;)

In Apache BatchEE we additionally introduced subclasses which use Generics for better user experience.

A sample would be public abstract class TypedItemWriter<R, C extends Serializable> implements ItemWriter {

where R is the type of the item to Read and C is the type of the Checkpoint.

You can dig into our impls at https://github.com/apache/incubator-batchee/tree/master/extensions/extras/src/main/java/org/apache/batchee/extras/typed


Comment from = ScottKurz on 2015-09-08 15:31:56 +0000

For 1.0, we'd originally used generics, and then removed them after deciding they weren't adding much value.

Could you give some examples of what use cases and patterns this faciliates?

E.g. one I can think of is defining reader/processor/writer as inner classes of a single outer class. Another would be unit test.

Might be difficult to revisit without breaking the API (e.g. the List on the Writer), but still would like to start by hearing the case in more detail.


Comment from = ScottKurz on 2015-09-08 15:37:47 +0000

Went back to see the earlier discussion seems to have been in Bug 4776, in case it's still interesting ( haven't read through recently).


Comment from = struberg on 2015-09-08 16:12:02 +0000

The main benefit is that it makes the Readers/Processors/Writers much more readable. And you can see at compile time that you got something wrong.

Having to just look at the top of the class and seeing

public class MyReader extends TypedReader<MyCustomerDTO, Long>

immediately tells me what happens: this piece reads MyCustomerDTOs and the checkpoint type is Long.

And if I connect this with a e.g.

public class SomeProcessor extends TypedProcessor

then it's obvious what is wrong, right? It just makes the code SOOO much better to read!


Comment from = ScottKurz on 2016-03-16 19:00:36 +0000

(In reply to struberg from comment #3)

The main benefit is that it makes the Readers/Processors/Writers much more readable. And you can see at compile time that you got something wrong.

Having to just look at the top of the class and seeing

public class MyReader extends TypedReader<MyCustomerDTO, Long>

immediately tells me what happens: this piece reads MyCustomerDTOs and the checkpoint type is Long.

And if I connect this with a e.g.

public class SomeProcessor extends TypedProcessor

Still interested in this one.

OK, I can appreciate the readability gains with respect to R the item type, especially since you need to chain these together with the other artifacts, reader/processor/writer, listeners.

One point I was trying to make though is that I'm not seeing how adding the generic types helps you at all in seeing at compile time if you've chained a reader/processor/writer together wrong. OK, it prevents you from returning the wrong type in readItem() but are you saying it prevents you from chaining reader/processor together with the wrong types? If so, how?

Also I'm not seeing as much readability gain for C, the checkpoint type. If the type is internal to the one class, is it really that helpful? Maybe if you get into creating reader/writer subclasses it becomes more useful but haven't thought it through.

So as far as all the write-related classes and methods, it seems like we should have used just "List" instead of "List". Not seeing how we could get to "List" without breaking 1.0 writers, etc. I wonder if we could have a TypedItemWriterXXXX set of classes extending ItemWriter ? Haven't prototyped this out to see what the issues would be.


Comment from = struberg on 2016-03-16 20:13:09 +0000

but are you saying it prevents you from chaining reader/processor together with the wrong types?

Nope, because there is XML inbetween. BUT tools could add support for that really easily if we introduce a typesafe interface!

Also I'm not seeing as much readability gain for C, the checkpoint type. That's also fairly easy. You don't need any upcast. You don't need any checks in the code. That can all be done in the impl. After all you need to store the checkpoint in the db, etc with the type.

But it's mainly about readability. I'm not quite sure if it was really designed for it, but the practical effect of the batch-jobs/*.xml is a really subtle one.

It's a bit like maven pom.xml files. People now don't need to dig into a tons of source files for hourse to know where to get started. They just go to the batch-jobs XML files and voila. The next step is going to the reader, writer etc. And there is makes a huge difference if the ItemReader returns Object and you need to dig for another hour to know what it does. Or if you see with a single glimpse that it returns a MedicalApplicationDTO and the Checkpoint is a LocalDate.


Comment from = ScottKurz on 2016-03-21 17:51:12 +0000

Thinking about this more, to add a generic type for both item type and checkpoint data, the list would include:

a) core artifacts: ItemReader, ItemProcessor, ItemWriter

b) listeners: ItemReadListener, ItemProcessListener, ItemWriteListener SkipProcessListener, SkipWriteListener RetryProcessListener, RetryWriteListener

(Left off SkipReadListener, RetryReadListener, since we never refer to the item type in those methods. The only point leaving them in would be to add symmetry to the type hierarchy).

Sounds like there's no new proposal for introducing generics to reflect the "user data" types (mentioning since this was in the 1.0 spec, but I'm not proposing anything for user data).


Now, in 1.0 we also provide a few abstract classes with no-op methods:

AbstractItemReader, AbstractItemWriter AbstractItemReadListener, AbstractItemProcessListener, AbstractItemWriteListener

So it seems the most obvious way of moving forward is to create a new interface TypedXXXX for every interface XXX, and a new abstract class TypedAbstractXXXX for every existing abstract class AbstractXXX.

But I suppose there are also options, including adding the new TypedXXX constructs as abstract classes rather than interfaces (which BatchEE does). Possibly use a Java 8 default impl (we've indicated we'll probably require Java 8 but haven't firmly committed just yet).

Thoughts?


Comment from = struberg on 2016-03-21 17:54:59 +0000

I personally think it would be sensitive to use Java8 and might look if we can get away with default methods?


Comment from = ScottKurz on 2016-03-21 18:07:38 +0000

(In reply to struberg from comment #7)

I personally think it would be sensitive to use Java8 and might look if we can get away with default methods?

Default methods are added in Java 8. I'm not clear if you are saying it is a good idea or bad idea to assume/require Java 8 for Batch 1.1.


Comment from = struberg on 2016-03-21 19:07:54 +0000

+1 for Java8 as we are targeting Java EE8. Java7 will also be EOL soon.


Comment from = BrentDouglas on 2016-03-25 16:43:42 +0000

I've read this thread a few times now and I still can't work out what benefits we get from this. Due to the XML between method calls the types we see at the moment are accurate as they are the Runtime type observed by the impl. As far a readability goes, I think the checkpoint example will provide no benefits. It is prefectly valid to return a narrower type than declared on the interface so in the example we can go ahead and return a LocalDate with the spec as it is now. The only thing I can see us getting out of adding types is being able to narrow the types if the params however this basically means casting Object to T in the runtime as were still observing the runtime type as Object. The way I see it, this means either adding extensive type checking to batch artifacts, which i believe to be fairly infesible, or throwing a CCE in the runtime rather than in consumer code.

Imo these classes are not actually type safe, they should not be declared as if they are.

Am I missing something?


Comment from = rmannibucau on 2016-03-25 17:19:14 +0000

@Brent: technically you are right: by design it is not typed but as a user having to cast by design is not satisfying today. Current API enforces casting in a lot of cases where you just want to implement smoothly in->out types. This issue means if you don't care (as today) you could return Object but if you care you can enforce it.

It is also insanely important for documentation since Object process(Object) doesn't document anything but Foo process(Bar) does. It also helps a lot when generating data and just maintaining the code.

This is really about make the API smooth for end users more than bringing anything technically - see how many users complain or just other abstractions cause the spec one is not matching a smooth and modern coding style.


Comment from = ScottKurz on 2016-03-25 19:17:33 +0000

I've come around to thinking we made the wrong decision in 1.0, and it's just a matter of where to go from here.

The discussion I recall was limited to having to cast (or not) and the fact that generics weren't really providing any compile-time checking. True, but the potential for introspection, gains in readibility, and ease in reverting to Object or raw types if you didn't care for this should have led us to use generics more.

One can always use Object as the type argument or just use raw types if you don't care to specify a specific type. (Or at least one could have done that if we didn't already have a List for ItemWriter.writeItems() and in another listener method or two, which leaves us a complication).

That said, I don't rank this quite as high a priority as Mark and Romain, so the more you guys can help with the details of a complete proposal which addresses backwards compatibility, the sooner we can resolve this for 1.1.

scottkurz commented 4 years ago

Closing this as a duplicate of : https://github.com/eclipse-ee4j/batch-api/issues/13 which holds the more recent discussion.