[Closed] CEDDM closed this issue 2 years ago
The `S3MessageHandler` has never supported `String` for a payload. I think the logic in between has been tightened at the Spring Cloud Stream level with regard to the `contentType` value. And since you produce a `String` from the processor, it is converted back to a `String` on the sink side. That probably was not done before, and the data from the wire was left as a `byte[]` for that sink. But since `S3MessageHandler` does not support `String`, it is wrong to assume some automatic conversion back to `byte[]`. It is better to write the result of your processor as a `byte[]`, not a `String`. The problem with `String` in the `S3MessageHandler` is that it would not be clear whether you are providing a file name or the content of the file. So we left the `String` type out of the `S3MessageHandler` logic. Does that make sense?
Thanks @artembilan for your explanation. Since the file name is given by the `key-expression`, we could assume that the payload is always the content of the file, no matter what the type is. But in my opinion, the main problem is code readability and maintainability for the company business logic: `Function<InDTO, OutDTO> process()` is way more understandable than something like `Function<InDTO, InputStream> process()` or `Function<InDTO, byte[]> process()`. Unless there is a simple way of doing a "double serialization" POJO -> JSON String -> byte[]?
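As a sketch of that "double serialization" (my own illustration, not code from the starters: `OutDTO` and its hand-rolled `toJson()` are stand-ins, and in a real app Jackson's `ObjectMapper.writeValueAsBytes(dto)` would replace the helper):

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class ProcessSketch {

    // Hypothetical DTO standing in for the OutDTO from the discussion.
    record OutDTO(String name, int value) {
        // Minimal hand-rolled JSON for the sketch; a real app would use
        // Jackson's ObjectMapper.writeValueAsBytes(dto) instead.
        String toJson() {
            return "{\"name\":\"" + name + "\",\"value\":" + value + "}";
        }
    }

    // The processor returns byte[], one of the payload types S3MessageHandler accepts.
    static Function<OutDTO, byte[]> process() {
        return dto -> dto.toJson().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = process().apply(new OutDTO("lot-1", 42));
        System.out.println(new String(payload, StandardCharsets.UTF_8));
    }
}
```

The business-facing `Function<InDTO, OutDTO>` can stay as-is and simply be composed with such a serializing step before the sink.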
Yep! Fully agree: code like `Function<InDTO, OutDTO> process()` is more in the business language than anything that makes programming powerful. In this case it is probably a bit misleading to say that your `OutDTO` is serialized to a JSON String. Well, it is JSON, but definitely not a `String`. So it is probably something on the input side of the sink that makes that conversion from `byte[]` to `String`, just because it sees the payload as JSON and doesn't see any expected type to convert to. This needs some investigation and discussion with the team...
Consider this a workaround on the `s3-sink`: `spring.cloud.stream.bindings.input.consumer.useNativeDecoding = true`. This way no conversion from `byte[]` to `String` is going to happen.
Thank you again for your explanations. And sorry if I mixed things up; many things happen between the apps and it's not always easy to understand where the problem is. I will try the workaround tomorrow.
I've raised an issue in Spring Cloud Stream where such a conversion from `byte[]` to `String` happens: https://github.com/spring-cloud/spring-cloud-stream/issues/2246
I ran a test with the workaround, but unfortunately I still get the same exception.
Maybe your binding name is not `input`.
Docs are here: https://docs.spring.io/spring-cloud-stream/docs/3.1.5/reference/html/spring-cloud-stream.html#_consumer_properties
(for example, `spring.cloud.stream.bindings.input.consumer.concurrency=3`). And then `useNativeDecoding`:

> **useNativeDecoding**
> When set to true, the inbound message is deserialized directly by the client library, which must be configured correspondingly (for example, setting an appropriate Kafka producer value deserializer). When this configuration is being used, the inbound message unmarshalling is not based on the contentType of the binding. When native decoding is used, it is the responsibility of the producer to use an appropriate encoder (for example, the Kafka producer value serializer) to serialize the outbound message. Also, when native encoding and decoding is used, the headerMode=embeddedHeaders property is ignored and headers are not embedded in the message. See the producer property useNativeEncoding.
> Default: false.
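Putting those pieces together, a minimal sketch of the consumer-side configuration for the Kafka binder (both property values appear elsewhere in this thread; the exact binding name depends on your app, so `default` is used here to cover all consumer bindings):

```properties
# Let the binder skip its byte[] -> String conversion for all consumer bindings
spring.cloud.stream.default.consumer.useNativeDecoding=true
# With native decoding, the Kafka client itself must hand the payload through as raw bytes
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer
```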
I tried with the default form to avoid errors with the binding name, but I still get the same `Unsupported payload type: [class java.lang.String]` error.
OK. Thanks. Looking into that in debug mode... Why do you post screenshots? Why not just place plain code snippets, so I would be able to copy-paste them?..
Not correct: that works for me with `spring.cloud.stream.default.consumer.useNativeDecoding = true` and code like:

```java
@Bean
public Consumer<Message<?>> consumeBytes() {
    return System.out::println;
}
```

Console output:

```
GenericMessage [payload=byte[267], headers={skip-input-type-conversion=true, aws_shard=shardId-000000000000, aws_checkpointer=ShardCheckpointer{key='anonymous.01b563e1-a891-43a4-85a9-7e57c8ed12d6:test_stream:shardId-000000000000', lastCheckpointValue='49624079089036418142403452771959552389200412755114328066'}, id=6f654fca-c6d5-90b2-8ad7-f4db8172ceef, sourceData={SequenceNumber: 49624079089036418142403452771948672056823881092541972482,ApproximateArrivalTimestamp: Fri Nov 19 11:09:33 EST 2021,Data: java.nio.HeapByteBuffer[pos=0 lim=267 cap=267],PartitionKey: 1,}, aws_receivedPartitionKey=1, aws_receivedStream=test_stream, aws_receivedSequenceNumber=49624079089036418142403452771948672056823881092541972482, timestamp=1637338179204}]
GenericMessage [payload=byte[267], headers={skip-input-type-conversion=true, aws_shard=shardId-000000000000, aws_checkpointer=ShardCheckpointer{key='anonymous.01b563e1-a891-43a4-85a9-7e57c8ed12d6:test_stream:shardId-000000000000', lastCheckpointValue='49624079089036418142403452771959552389200412755114328066'}, id=882082e3-c5ae-21be-dd08-2352726d443a, sourceData={SequenceNumber: 49624079089036418142403452771949880982643495721716678658,ApproximateArrivalTimestamp: Fri Nov 19 11:09:33 EST 2021,Data: java.nio.HeapByteBuffer[pos=0 lim=267 cap=267],PartitionKey: 1,}, aws_receivedPartitionKey=1, aws_receivedStream=test_stream, aws_receivedSequenceNumber=49624079089036418142403452771949880982643495721716678658, timestamp=1637338179209}]
```
Without `useNativeDecoding` it is:

```
GenericMessage [payload={"@class":"org.springframework.messaging.support.GenericMessage","payload":"Message0","headers":{"@class":"java.util.HashMap","id":["java.util.UUID","c6c98933-7782-bdc8-d21c-4751ef31822c"],"event.eventType":"createEvent","timestamp":["java.lang.Long",1637338302769]}}, headers={aws_shard=shardId-000000000000, aws_checkpointer=ShardCheckpointer{key='anonymous.7ff2f3aa-5972-4e75-bd28-17a62fbcad4b:test_stream:shardId-000000000000', lastCheckpointValue='49624079089036418142403452771971641647396567568076505090'}, id=30de651c-9d54-88ce-6d46-2e1b704d4acf, sourceData={SequenceNumber: 49624079089036418142403452771960761315020035905504149506,ApproximateArrivalTimestamp: Fri Nov 19 11:11:37 EST 2021,Data: java.nio.HeapByteBuffer[pos=0 lim=267 cap=267],PartitionKey: 1,}, contentType=application/json, aws_receivedPartitionKey=1, aws_receivedStream=test_stream, aws_receivedSequenceNumber=49624079089036418142403452771960761315020035905504149506, timestamp=1637338303174}]
GenericMessage [payload={"@class":"org.springframework.messaging.support.GenericMessage","payload":"Message1","headers":{"@class":"java.util.HashMap","id":["java.util.UUID","ad916281-1fb4-ccdc-f937-894083aa90f0"],"event.eventType":"createEvent","timestamp":["java.lang.Long",1637338302786]}}, headers={aws_shard=shardId-000000000000, aws_checkpointer=ShardCheckpointer{key='anonymous.7ff2f3aa-5972-4e75-bd28-17a62fbcad4b:test_stream:shardId-000000000000', lastCheckpointValue='49624079089036418142403452771971641647396567568076505090'}, id=cbf746d3-d319-fc7f-8e7c-91a66cb2945a, sourceData={SequenceNumber: 49624079089036418142403452771961970240839650534678855682,ApproximateArrivalTimestamp: Fri Nov 19 11:11:37 EST 2021,Data: java.nio.HeapByteBuffer[pos=0 lim=267 cap=267],PartitionKey: 1,}, contentType=application/json, aws_receivedPartitionKey=1, aws_receivedStream=test_stream, aws_receivedSequenceNumber=49624079089036418142403452771961970240839650534678855682, timestamp=1637338303176}]
```
So, that confirms that `useNativeDecoding` does the trick for us. Sorry, I'm not familiar with Data Flow and cannot say how to properly configure that sink with this property. All I can tell you is that the `s3-sink` is fully based on:
```java
@Bean
public Consumer<Message<?>> s3Consumer() {
    return amazonS3MessageHandler(null, null, null)::handleMessage;
}
```

which is very close to what I have tested against.
OK, thank you for testing. I will try to understand what I am doing wrong...
Hi again!
It seems to me that there is a difference when using SCDF. I tried to replace the s3-sink with a log sink, with both `useNativeDecoding` properties:

```
--spring.cloud.dataflow.stream.app.label=log
--spring.cloud.stream.default.consumer.useNativeDecoding=true
--spring.cloud.stream.bindings.input.consumer.useNativeDecoding=true
--spring.cloud.stream.bindings.input.destination=test-opex.import-opex
--log.expression=#root
--spring.cloud.stream.bindings.input.group=test-opex
--spring.cloud.dataflow.stream.app.type=sink
```

This looks good, because we can see the other `spring.cloud.stream.bindings.input` properties added by SCDF.
But here is the (truncated) result:

```
log-sink : GenericMessage [payload={"nomLot":"200312.69FA698X300032","matricule":"", (...), nativeHeaders={}, replyChannel=nullChannel, kafka_offset=11, scst_nativeHeadersPresent=true, kafka_consumer=org.apache.kafka.clients.consumer.KafkaConsumer@5c3c4ebb, kafka_receivedPartitionId=0, kafka_groupId=test-opex}]
```

As you can see, I use Kafka. I don't know if it makes any difference.
Well... I created a very simple producer like yours, and this time I can see the difference as you showed... So I guess the SCDF starter apps do have something different which causes `useNativeDecoding` not to work as expected. For my sink, I used:

```xml
<spring-boot.version>2.5.5</spring-boot.version>
<spring-cloud.version>2020.0.4</spring-cloud.version>
```
@CEDDM, thank you for looking into this more!
You know, the `log-sink` is not a good sample to follow. It has this configuration:

```xml
<functionDefinition>byteArrayTextToString|logConsumer</functionDefinition>
```

So, independently of the config, that `byteArrayTextToString` function does the trick for us, letting those `byte[]` payloads be human-friendly in the final logs.
I think we don't need DataFlow for this test. It is enough to have the `s3-sink` bound to a Kafka topic: we just produce data to that topic and see what is going on on the consumer side.
Right now I'm trying to find what would be closest to S3, to be able to test this in isolation...
Hey all,
I believe I'm seeing the same issue described above when using Kafka binders and attempting to write to an S3 sink.
I've attempted a variety of configurations on the sink job, which have included:

```
input.consumer.use-native-decoding|useNativeDecoding = true
input.consumer.content-type|contentType = application/octet-stream
spring.kafka.consumer.value-deserializer = org.apache.kafka.common.serialization.ByteArrayDeserializer
```

all to no good effect.
It seems from what others have said that this is possibly a regression between versions? Not quite sure where to take this, although I see an interesting artifact in Maven Central that seems to indicate there is a Kafka-specific S3 sink (although I can't quite find where the source code for this is... it doesn't seem to be hosted in this project, so I'm curious where it resides/who is publishing it: https://mvnrepository.com/artifact/org.springframework.cloud.stream.app/s3-sink-kafka/3.1.1).
I might try to figure out how to get that into my SCDF installation and see if anything fun comes of it.
Thanks, Andrew
Edit: maybe that code/image on Docker Hub is from this project as well and the source code is just bundled up and suffixed with kafka/rabbit, in which case I'm probably going to see the same issues. :)
Edit: looks like `input.consumer.content-type` is the winner. Odd seeing some camel case and some kebab casing in the documentation. I couldn't get it to work right with Kafka bindings (our cluster is beyond ancient), but Rabbit seems to have done the trick for the work I was trying to do.
The `s3-sink-kafka` is a generated artifact. Essentially it is the result of bundling the `s3-consumer` artifact with the Kafka binder to make a fully blown Spring Cloud Stream application.
The mentioned upstream issue was fixed a while ago: https://github.com/spring-cloud/spring-cloud-function/issues/771.
And we have also released a new version of this sink: https://mvnrepository.com/artifact/org.springframework.cloud.stream.app/s3-sink-kafka/3.2.1
Problem description: I created a stream in SCDF like this: `s3-source | my-processor | s3-sink`. The idea is to take an XML source file from an S3 bucket, transform and enrich the data in my processor, and write out a JSON file to another S3 bucket. It worked fine in SCDF 2.7.1 with the S3 starter apps in version 3.0.0, but it doesn't work anymore in SCDF 2.9.1 with the 3.1.1 S3 apps.
Generally speaking: no matter what the source is, serializing a POJO to a JSON String in my processor no longer works with the S3 sink, because the sink doesn't support String payloads anymore, as the error message says:

```
Unsupported payload type: [class java.lang.String]. The only supported payloads for the upload request are java.io.File, java.io.InputStream, byte[] and PutObjectRequest
```

Why are String payloads no longer supported in the S3 sink? Maybe to avoid encoding problems? What is the correct way to handle this? Adding an extra transform app just to change the String into a byte[] seems a bit painful and an unnecessary consumption of resources.
I saw issue #74 but I think it's not just a media type problem.