spring-cloud / stream-applications

Functions and Spring Cloud Stream Applications for data driven microservices
https://spring.io/projects/spring-cloud-stream-applications

s3-sink java.io.File casting #451

Open borg1310 opened 1 year ago

borg1310 commented 1 year ago

Hi, we want to use the s3-sink application to write files to S3 storage. For this we use the File-Source and S3-Sink applications. In the File-Source, the mode is set to "ref" (a java.io.File should be returned). When writing to the S3-Sink, error [0] occurs. While debugging, we noticed that what arrives in S3MessageHandler (method upload, line 306) is not a File but a byte array containing the path to the file. IMHO, the problem is that the path is not converted back into a java.io.File object. Am I doing something wrong, or is there an additional setting for this (especially for the keyExpression property)?

Thanks in advance. Best regards, Juergen

[0] Caused by: java.lang.IllegalStateException: Specify a 'keyExpression' for non-java.io.File payloads
    at org.springframework.integration.aws.outbound.S3MessageHandler.upload(S3MessageHandler.java:390)
    at org.springframework.integration.aws.outbound.S3MessageHandler.handleRequestMessage(S3MessageHandler.java:277)
    at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:136)
    at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:56)
    ... 39 more
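The exception message states the rule directly: only a java.io.File payload can do without a keyExpression. The following is a simplified, hypothetical model of that rule (it is not the S3MessageHandler source). It assumes that a File payload falls back to its own file name as the S3 key, while a byte[] carries no name to fall back on.

```java
import java.io.File;
import java.util.function.Function;

// Hypothetical, simplified model of the key-resolution rule implied by the error message.
// NOT the actual S3MessageHandler code; class and method names are illustrative only.
class S3KeyResolver {

    static String resolveKey(Object payload, Function<Object, String> keyExpression) {
        if (keyExpression != null) {
            // An explicit key expression always wins.
            return keyExpression.apply(payload);
        }
        if (payload instanceof File) {
            // Assumption: a File payload can fall back to its own name as the key.
            return ((File) payload).getName();
        }
        // A byte[] (for example, a serialized path) has no name to derive a key from.
        throw new IllegalStateException("Specify a 'keyExpression' for non-java.io.File payloads");
    }
}
```

With the File-Source in "ref" mode and a binder in between, the sink receives the path as a byte[], so this model takes the exception branch, matching the stack trace above.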

artembilan commented 1 year ago

java.io.File is not an appropriate abstraction to transfer over the network. Even if we convert it into a file path and serialize that string properly when sending to the binder, that does not mean that on the consumer side, even after we deserialize the path back into a java.io.File, such a file is actually present on the target file system to pull data from for S3.
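A minimal sketch of the point above, assuming the only thing that can cross the binder for a File is its path string. The class and method names are hypothetical. Deserializing the path on the consumer always succeeds, but the resulting File is only usable if that path also exists on the consumer's file system.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of what "sending a java.io.File" through a binder amounts to.
class FileRefRoundTrip {

    // Producer side: only the path string is serialized, not the file contents.
    static byte[] serializeRef(File file) {
        return file.getAbsolutePath().getBytes(StandardCharsets.UTF_8);
    }

    // Consumer side: the bytes can always be turned back into a File object...
    static File deserializeRef(byte[] payload) {
        return new File(new String(payload, StandardCharsets.UTF_8));
    }

    // ...but that object is only useful if the path exists and is readable on this host.
    static boolean usableOnConsumer(byte[] payload) {
        File file = deserializeRef(payload);
        return file.isFile() && file.canRead();
    }
}
```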

For now, it is best to transfer the file content as a byte[] from the File-Source.

We could think about something like a payload-to-file=true|false option for the S3-Sink, for when the end user is sure that both apps operate against the same file system. But then why would one place a binder in between?
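To make the two options concrete, here is a sketch under stated assumptions: `readContents` shows the recommended "send the contents" approach, and `toUploadPayload` shows how the *hypothetical* payload-to-file flag might interpret an incoming byte[]. Neither the flag nor these names exist in the S3-Sink; they are illustrative only.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Sketch of the hypothetical payload-to-file=true|false idea; not an existing S3-Sink API.
class PayloadToFileSketch {

    // Producer side (recommended today): send the file contents, not a reference.
    static byte[] readContents(File file) throws IOException {
        return Files.readAllBytes(file.toPath());
    }

    // Consumer side: with payloadToFile=true, the bytes are treated as a path that must
    // exist locally (shared file system); otherwise they are uploaded as raw content.
    static Object toUploadPayload(byte[] payload, boolean payloadToFile) {
        if (!payloadToFile) {
            return payload;
        }
        File file = new File(new String(payload, StandardCharsets.UTF_8));
        if (!file.isFile()) {
            throw new IllegalArgumentException("Not present on this file system: " + file);
        }
        return file;
    }
}
```

The flag only helps when both apps see the same file system, which is exactly the case where a binder in between adds little.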

I'm not fully familiar with SCDF, but I believe that there has to be an option to co-locate apps with in-memory interaction.

/CC @corneil, @onobc

onobc commented 1 year ago

but I believe that there has to be an option to co-locate apps with in-memory interaction.

Apps can be "co-located" via Function Composition.

artembilan commented 1 year ago

Thanks, Chris, but it doesn't look like that doc shows how to do that. It talks about functions, but what we have here are apps, which are things in themselves and, yeah, are tied to a specific binder according to their packaging. Plus, I doubt users are interested in programming-style composition for out-of-the-box apps. Moreover, it is not clear whether it is possible to compose a Source with a Sink. I guess we can brainstorm another day.

onobc commented 1 year ago

Good points, @artembilan.

Yes, the user would have to create a custom stream application that chains the functions together into a single app. That is the only way I know of to do that in SCDF. Using that technique, I do think it would be possible to chain the file source to the s3 sink (e.g. spring.cloud.function.definition=file|s3). But it would require the user to create a custom stream app.
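Why composition sidesteps the problem can be shown with a plain-Java analogy (not Spring Cloud Function's actual composition machinery): when a supplier and a consumer run in the same JVM, the java.io.File reference is handed over in memory and never serialized by a binder. `ComposedPipeline` and `compose` are hypothetical names.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Plain-Java analogy for spring.cloud.function.definition=file|s3. The names are
// hypothetical stand-ins, not the real beans of the file source or the S3 sink.
class ComposedPipeline {

    // Chain a source (Supplier) into a sink (Consumer) as a single in-process "app".
    // The object produced by the source reaches the sink as the same instance.
    static <T> Runnable compose(Supplier<T> source, Consumer<T> sink) {
        return () -> sink.accept(source.get());
    }
}
```

For example, `ComposedPipeline.compose(() -> new File("/shared/in/report.csv"), file -> upload(file))` would hand the sink a real File, with `upload` standing in for the S3 upload step.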

onobc commented 1 year ago

Oleg: We could use the extra parameters of the application content type (~sub-types) to include additional info about the payload (byte[]), e.g. classname, filepath, etc.

We can leverage Spring's MimeType to help with this.
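A naive sketch of the content-type idea, assuming hypothetical parameter names `classname` and `filepath`, e.g. `application/octet-stream;classname="java.io.File";filepath="/data/in/report.csv"`. Values are quoted because characters such as '/' are not allowed in unquoted MIME parameter values. Spring's `org.springframework.util.MimeType` models the same structure (type, subtype, parameters, `getParameter(String)`); this dependency-free version is only an illustration and does not handle ';' or '"' inside values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: carry payload metadata as content-type parameters.
class ContentTypeParams {

    // Append each parameter as ;name="value" to the base type.
    static String build(String baseType, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseType);
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(';').append(e.getKey()).append("=\"").append(e.getValue()).append('"');
        }
        return sb.toString();
    }

    // Naive parse: split on ';', skip the base type, strip surrounding quotes from values.
    static Map<String, String> parse(String contentType) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].trim().split("=", 2);
            if (kv.length == 2) {
                String value = kv[1];
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                params.put(kv[0], value);
            }
        }
        return params;
    }
}
```

On the sink side, a `classname` of java.io.File together with a `filepath` would tell the consumer how to interpret the byte[], without changing the payload itself.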

onobc commented 5 months ago

Moving out to 2024.1.x as we did not have cycles to get to this in the 2024.0.0 timeline.