spring-projects / spring-batch

Spring Batch is a framework for writing batch applications using Java and Spring
http://projects.spring.io/spring-batch/
Apache License 2.0
2.69k stars 2.33k forks source link

Provide capability to transform a bean into fixed-length record [BATCH-379] #3201

Closed spring-projects-issues closed 16 years ago

spring-projects-issues commented 16 years ago

Kyrill Alyoshin opened BATCH-379 and commented

I am creating this issue as per our discussion on the forum: http://forum.springframework.org/showthread.php?t=50182

I will create a patch with the proposed implementation and attach it to this issue once it is done.


Affects: 1.0.0.m4

Attachments:

spring-projects-issues commented 16 years ago

Dave Syer commented

N.B. BATCH-371 has some very similar points. My guess is that once that is resolved most of the points in the forum will be covered adequately for 1.0.

spring-projects-issues commented 16 years ago

Kyrill Alyoshin commented

I've created a patch that implements the functionality discussed on the forums. Test coverage is 99%. I'd be delighted to see it incorporated. (I will also submit it to BATCH-371).

spring-projects-issues commented 16 years ago

Kyrill Alyoshin commented

I've noticed that 371 is closed, so I will not create a duplicate submission there. :-)

Evaluating 371...

spring-projects-issues commented 16 years ago

Kyrill Alyoshin commented

Hmm... 371...

I see where you're going. I am afraid that FieldSet + LineAggregator combination does not provide the flexibility we need to generate real mainframe files. The key point is that alignment, padding (and maybe even text conversion strategies) need to be specified on a per element, or field, basis. The best I could do without having to break the existing public APIs is to tweak the Range class to provide this abstraction.

Unmapping the object into a FieldSet is an interesting idea, the problem is that each Field in a FieldSet must contain meta-data that describes how it is to be persisted in the flat file. And that is currently missing.

FieldSet, as an abstraction, borrows heavily from JDBC's ResultSet. ResultSet has been designed for 'read' operations. We do not use ResultSets to write into the DB. My gut feeling is that FieldSet is not the right abstraction for 'write' operations.

Please take a look at the patch to see what I mean in detail. Once again, my approach where the Range class acts as field descriptor solves this problem.

spring-projects-issues commented 16 years ago

Dave Syer commented

I hear what you are saying about meta data. (But note that nothing you say has anything specifically to do with mainframes, or COBOL.) I'm not sure it is necessary in the FieldSet API - FieldSet might be inspired by ResultSet, but that doesn't mean we have to copy it all.

If all you need is to format with different alignments in each field, I'm sure there is an easier way. And I agree that FieldSet is not an output-oriented construct. But LineAggregator is, so I would prefer to do it there, along the lines of your suggestion with Range.

spring-projects-issues commented 16 years ago

Lucas Ward commented

Kyrill,

It looks like your patch was created against m4, because of changes made in M5-SNAPSHOT, it won't compile without adapting it. Do you mind creating the patch against trunk?

spring-projects-issues commented 16 years ago

Kyrill Alyoshin commented

Lucas,

Nope. Against the trunk. Just updated and rebuilt everything. Everything works. Even unit tests pass. What compiler errors do you get? You can send email with errors to me directly -kyrill007 at gmail-.

Kyrill

spring-projects-issues commented 16 years ago

Kyrill Alyoshin commented

Dave,

But LineAggregator has been changed to operate on a FieldSet: #aggregate(FieldSet). Presumably we'd have to write a whole new FieldSet implementation, which would allow for individual field configurations. Today this is just names and values. But what we need is a FieldSet that has a SortedSet of some elements (Ranges?) that allow for the following property configurations:

  1. Mandatory: starting position of the field, property binding, alignment, padding.
  2. People will ask in the future: propertyEditor field to provide per field text conversion (rather than the default per class type conversion).

It'd be interesting to know how this can be done with FieldSet... I'd suggest that refactor the #aggregate(FieldSet) method inot

aggregate(Object). If some one want to do a FieldSet, it's fine, if someone want to aggregate a Java bean this is fine too. Then my BeanWrapperItemTransformer become BeanWrapperLineAggregator that a Transformer can delegate to.

What do you think?

spring-projects-issues commented 16 years ago

Douglas C. Kaminsky commented

http://jira.springframework.org/browse/BATCH-278

Any questions? :)

spring-projects-issues commented 16 years ago

Douglas C. Kaminsky commented

Addressing your comments directly -- a fieldset is EXACTLY a string array plus some extra functionality (e.g. the ability to name each column). There is no reason you wouldn't want to use a fieldset that I can think of. Your FieldSetUnmapper (as it's called today - I asked Dave to change the name but we'll see what happens) can do any sorting/etc (i.e. change your domain-specific object into a string array+) and your aggregator can handle formatting (i.e. transform each string from the string array+ into your desired output). Please explain exactly why you think the aggregator that solves the above-mentioned issue (BATCH-278) can't fulfill your use case.

spring-projects-issues commented 16 years ago

Douglas C. Kaminsky commented

Further expanding on your question about needing to create a custom FieldSet, I'd also like to mention that I am totally opposed to FieldSet being an interface as opposed to a concrete implementation - the FieldSetMapper/Unmapper and LineTokenizer/Aggregator COMPLETELY OBVIATES the need to have different FieldSet implementations. There is no reason to use anything other than the DefaultFieldSet.

(The argument that FIeldSet parallels ResultSet is erroneous --- they represent very different concepts.)

spring-projects-issues commented 16 years ago

Kyrill Alyoshin commented

Doug, what do you mean 'erroneous'?

This is straight from reference documentation:

"A FieldSet is Spring Batch's abstraction for typing fields from a flat file data source. It allows developers to work with file input in much the same way as they would work with database input. A FieldSet is conceptually very similar to a Jdbc Result Set."

What I am trying to find is where to store per field configuration properties?

They are once again: field position in a record, its property binding (can be literal), alignment, padding, and optional property editor.

You realize that each field in a FieldSet may need to be formatted individually? So, where are those "fields"? Where is the object that stores this config data?

What you're suggesting is that I preprocess my bean into a String array, then stick this String array into a FieldSet and then I aggregate it into line?

spring-projects-issues commented 16 years ago

Kyrill Alyoshin commented

I suppose if we added a Field class (or renamed Range into Field) and made a FieldSet contain a sorted set of fields, this would be a different story. The it would really be a FIELD set, where a field contains property that describe it. We could then utilize it in a nice way for bi-directional navigation (as in BeanWrapperFieldSetMapper and BeanWrapperLineAggregator).

Hmm... the more I think about it, the more I like it. Of course, it may break existing public APIs. But it does seem elegant.

spring-projects-issues commented 16 years ago

Dave Syer commented

What you're suggesting is that I preprocess my bean into a String array, then stick this String array into a FieldSet and then I aggregate it into line?

What's wrong with that? Actually, it seems to me to show only that FieldSet is already over-engineered for output - String[] would work just as well. I don't want to introduce extra meta data unless it is really necessary.

spring-projects-issues commented 16 years ago

Douglas C. Kaminsky commented

By erroneous, I mean that although that was the original thought on the issue, it has since become increasingly clear to me that the similarities end at the naming. It is not a productive argument to argue that field set should contain some functionality or form for the sake of symmetry with result set.

Kyrill, you're trying to over-engineer things here. Your needs can be satisfied if you create a FieldSetUnmapper to map your object into a field set, then write a LineAggregator that contains a map from either field name (from the field set metadata) to format or from index number to format. That's the crux of BATCH-278 and the reason it has not yet been resolved is because I was waiting for Dave to fix BATCH-371. The solution to BATCH-278 is a LineAggregator that does exactly what you want - lets you specify width, alignment and padding for each field in a field set.

Dave, I think keeping field set is important at the very least for the naming of the fields - although I'm not thrilled with the idea, you could create an implementation that contains only the string array and naming aspects and use that for writing by default, while refactoring the current DefaultFieldSet to extend this new class and be used for reading.

spring-projects-issues commented 16 years ago

Dave Syer commented

Doug: I was finished with BATCH-371. Did you want more there? (FieldSetCreator is the new name for the unmapper.)

Agree with everything else in your last comment, but I don't think I'll bother splitting the read aspects of FieldSet into a separate interface.

spring-projects-issues commented 16 years ago

Kyrill Alyoshin commented

OK. Let's attempt to really think it through.

  1. Obviously, everything begins with a Tasklet. For each execution, the tasklet will create a bean populated with relevant data (to be transformed in a line later on).

  2. The tasklet will use FlatFileItemWriter to write each individual line to a file.

  3. The FlatFileItemWriter will delegate to the ItemTransformer to do the actual transformation of a bean into a line.

This is where troubles begin. The patch includes a BeanWrapperItemTransformer that transforms the bean into such a line and this is the end of story. But you say this is not the ideal solution. OK.

Now, I guess we have to figure out how to make FieldSetCreator + LineAggregator + some ItemTransform play all nice together. Speaking of the complexity...

  1. I guess the first thing is to figure out what my BeanWrapperItemTransformer needs to refactored into? Presumably into BeanWrapperFieldSetCreator.

  2. Where is the actual transformation of bean properties into padded String[] occurs? BeanWrapperFieldSetCreator? The point is it has to occur where the line configuration/metadata is stored (i.e where Range object are, since they describe the per field formatting needs).

  3. If so, how does BeanWrapperFieldSetCreator get hooked into the transformation cycle (again the assumption is that we're going to use FlatFileItemWriter that delegates to a Transformer).

I'd love you, guys, to take a closer look at it. All of my needs are expressed in the unit tests in the patch.

spring-projects-issues commented 16 years ago

Dave Syer commented

OK, here are the collaborations:

  1. Forget about Tasklet, and ItemTransformer

  2. FlatFileItemWriter delegates to a FieldSetCreator and then a LineAggregator

Observations:

spring-projects-issues commented 16 years ago

Kyrill Alyoshin commented

Dave,

First, thanks for stepping up and clarifying things. I promise I will buy you a beer at Java ONE. :-)

Second, yep, it was I who logged that issue for Spring Binding. As you can see, I like bindings. :-)

Third, I suppose if BeanWrapperFieldSetCreator takes a sorted range as its state, the problem is solved. Existing BeanWrapperItemTransformer can be refactored into BeanWrapperFieldSetCreator.

The only thing that I think we have not nailed is where field conversion (i.e. padding, alignment) is performed. I mean if BWFSC takes a set of ranges, it probably will have to apply the "padding and co" to form the resulting String[] -> FieldSet. Then the line aggregator is almost not needed (i.e. StringUtils.arrayToDelimitedString(String[], ""). However if LineAggregator is the one to know about Ranges (i.e. per field formatting) then the logical flow is not clear.

spring-projects-issues commented 16 years ago

Douglas C. Kaminsky commented

Posted a patch for http://jira.springframework.org/browse/BATCH-278 - this provides some of the functionality you discussed. The other piece is the FieldSetCreator interface which you can use to map any bean to a field set in a custom manner. Hope this helps.

spring-projects-issues commented 16 years ago

Kyrill Alyoshin commented

I think Dave needs to step in and sort this whole thing out.

Dave, any plans to come back to this issue?

spring-projects-issues commented 16 years ago

Mike Grudkowski commented

Hi,

My team is considering including this patch for our project, but are hesitant to go ahead before resolution has been made regarding approach. It appears there is an alternative approach taken in the patch to BATCH-278. To the point, has any resolution been made?

Mike

spring-projects-issues commented 16 years ago

Dave Syer commented

The patch attached to this issue is sort of obsolete. The recommended approach would be to implement a FieldSetCreator. Which features of the patch were you interested in?

spring-projects-issues commented 16 years ago

Mike Grudkowski commented

Thanks for the quick reply.

Well, we like Range enhancements and the RangePropertyEditor which lets us specify how each domain property we are interested in writing should be padded and aligned. We are currently taking the recommended approach, delegating to a tokenizer inside the FieldSetCreator implementation to create an array of tokens from the domain object. The tokenizer contains all the padding and alignment logic and is essentially hidden, i.e. not externalized.

So we are interested in the externalization features. Why is the approach considered obsolete?

spring-projects-issues commented 16 years ago

Dave Syer commented

The obsolete thing is the use of an ItemTransformer where FiledSetCreator is more appropriate (it didn't exist when this issue was created). To track the externalisation features go to BATCH-278. I think we might close this one, unless anyone disagrees.

spring-projects-issues commented 16 years ago

Mike Grudkowski commented

Quick question Dave...Is the plan for BATCH-278 to include support for something similar to the logback formatting scheme?

I like the idea of specifying both padding, alignment, and replacement chars all in one conversion string instead of in several maps. That said, both approaches would help us out over what currently exists.

spring-projects-issues commented 16 years ago

Dave Syer commented

Let's keep the discussion of BATCH-278 in the proper place. I'll close this one for now.