jakartaee / batch

The Jakarta Batch project produces the Batch Specification and API.
https://projects.eclipse.org/projects/ee4j.batch
Apache License 2.0

Add a progress API #98

Open follis opened 4 years ago

follis commented 4 years ago

Originally opened as bug 7656 by struberg

--------------Original Comment History----------------------------

Comment from = struberg on 2016-05-02 10:12:07 +0000

Got an interesting feature request from our ops team.

They would like to get some feedback about the 'progress' of a current batch run. E.g. "32500 of 248494 items processed"

While I can give them the current Metrics, we still don't know what the total sum will be. Of course it's not always possible to provide the whole sum upfront, but sometimes it is.


Comment from = ScottKurz on 2016-05-02 19:54:14 +0000

So would you net this out into a proposal for a new metric: expectedCount?

We'd also have to invent an API for the app to notify the container of the value (and constrain it with any rules we'd come up with).

Kind of an odd fit though with all the other metrics set by the container.

Maybe a better fit would be to extend metrics with a set of app-provided key/value pairs? Part of the rationale would be that it's a place to stuff things other than the persistent user data so you don't need to be able to deserialize the user data class to read it.

Probably should consider the behavior under rollback...
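
To make the key/value idea and the rollback question concrete, here is a minimal sketch in plain Java. Nothing in it is part of the Batch API: `UserMetrics`, `put`, `commit`, `rollback` and `snapshot` are hypothetical names, and the rule that values only become visible at chunk commit is just one possible answer to the rollback question, assumed here for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical type, not part of the Batch API: app-provided key/value
// metrics that only become visible once the surrounding chunk commits.
final class UserMetrics {
    private final Map<String, Long> committed = new HashMap<>();
    private final Map<String, Long> pending = new HashMap<>();

    // Called by the application while a chunk is in flight.
    void put(String key, long value) {
        pending.put(key, value);
    }

    // Would be called by the container when the chunk commits.
    void commit() {
        committed.putAll(pending);
        pending.clear();
    }

    // Would be called by the container when the chunk rolls back.
    void rollback() {
        pending.clear();
    }

    // Read-only copy of the committed values, e.g. for an ops console.
    Map<String, Long> snapshot() {
        return Collections.unmodifiableMap(new HashMap<>(committed));
    }
}

class UserMetricsDemo {
    public static void main(String[] args) {
        UserMetrics metrics = new UserMetrics();
        metrics.put("expectedCount", 248494L);
        metrics.commit();                    // value survives
        metrics.put("expectedCount", 1L);
        metrics.rollback();                  // staged value is discarded
        System.out.println(metrics.snapshot());
    }
}
```

Because the values are plain longs keyed by strings, a reader of the metrics would not need to deserialize the application's persistent user data class.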


Comment from = cf126330 on 2016-05-03 02:30:31 +0000

Some data sources do not support the notion of total number of items. For instance, an item reader pulling messages from a destination, where there may always be new messages coming in. In other data sources, it's possible to get a total (for example, a CSV file), but it means the batch runtime will do extra loading-sensing first, which doesn't fit well with the current read-till-end approach.

Another way is to address it at the application level. The app can get the total number of items by other means, even with a batchlet.
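
As a rough illustration of the application-level approach for the CSV case, the counting could be done in a separate pass before the chunk step runs, for example from a batchlet. The sketch below is plain Java with no Batch API types; `CsvItemCounter` and `countRecords` are hypothetical names, and it assumes one record per line with a single header line (so quoted multi-line fields would be miscounted).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper: counts the records of a CSV file in a separate pass.
final class CsvItemCounter {

    // Assumes one record per line and exactly one header line.
    static long countRecords(Path csv) {
        try (Stream<String> lines = Files.lines(csv)) {
            return Math.max(0, lines.count() - 1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            System.out.println(countRecords(Paths.get(args[0])) + " items expected");
        }
    }
}
```

This is exactly the extra pass over the data mentioned above, so it only pays off where reading the source twice is cheap.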


Comment from = struberg on 2016-05-03 06:21:24 +0000

"Some data sources do not support the notion of total number of items."

Yes, of course. But in some cases it's very doable.

I hope the goal is clear now. While it is not technically possible to know the total count in all cases (e.g. streaming), it can be determined pretty often, and it is important information for Ops. They just need an idea of whether they should let a batch run for another 5 minutes (because it is almost done), or whether they should suspend it because it's nowhere near finished.

I didn't want to add my initial idea to the problem description because I'm a big fan of separating the 'problem description' from a 'possible solution'. Most of the time these get mixed up by analysts and you end up not knowing what the real goal is (and imo that separates the good from the bad ones), but anyway - different topic ;)

My first idea was to add an additional interface which could be implemented by either a Batchlet or an ItemReader. Something like

interface FiniteProcessing { long getSummaryCount(); }

Of course we need to define what number we return for a suspended or restarted step. The whole sum or just the sum at the step (re)start?
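
A minimal sketch of how such an interface might be consumed, in plain Java so it stays self-contained. Only `FiniteProcessing` and `getSummaryCount()` come from the proposal above; `ListReader`, `Progress`, `processedSoFar` and the `startIndex` parameter are hypothetical, and a real reader would implement the Batch ItemReader interface. The sketch arbitrarily picks the "whole sum" answer to the restart question: the total does not change when the reader resumes from a checkpoint.

```java
import java.util.Arrays;
import java.util.List;

// The interface proposed in this comment.
interface FiniteProcessing {
    long getSummaryCount();
}

// Hypothetical reader over an in-memory list. The Batch ItemReader
// interface is deliberately left out to keep the example dependency-free.
final class ListReader implements FiniteProcessing {
    private final List<String> items;
    private int position;

    // startIndex stands in for a checkpoint restored on restart.
    ListReader(List<String> items, int startIndex) {
        this.items = items;
        this.position = startIndex;
    }

    // Returns the next item, or null when the input is exhausted.
    String readItem() {
        return position < items.size() ? items.get(position++) : null;
    }

    // Items consumed overall, including those from earlier executions.
    int processedSoFar() {
        return position;
    }

    // Assumption: report the whole sum, regardless of where this run started.
    @Override
    public long getSummaryCount() {
        return items.size();
    }
}

final class Progress {
    static String describe(long processed, long total) {
        return processed + " of " + total + " items processed";
    }

    public static void main(String[] args) {
        // Simulate a restart after two items were already processed.
        ListReader reader = new ListReader(Arrays.asList("a", "b", "c", "d", "e"), 2);
        reader.readItem();
        // prints "3 of 5 items processed"
        System.out.println(describe(reader.processedSoFar(), reader.getSummaryCount()));
    }
}
```

With the other option (sum since the step restart), the same run would report 1 of 3 instead, which is why the semantics would need to be pinned down.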

Of course only a rough idea up for discussion...