phax / phase4

phase4 - AS4 client and server for integration into existing systems. Specific support for Peppol and CEF eDelivery built-in.
Apache License 2.0

IPhase4PeppolIncomingSBDHandlerSPI and encoding of byte [] aSBDBytes #189

Closed iansmirlis closed 1 year ago

iansmirlis commented 1 year ago

Hello, I have a question for implementing IPhase4PeppolIncomingSBDHandlerSPI interface.

I want to process aSBDBytes as a string, but I have no information about the character encoding, so I cannot properly convert the parameter byte [] aSBDBytes.

I know I can write an IAS4ServletMessageProcessorSPI instead to manually process the attachment, but in this case I will have to recreate the PeppolSBDHDocumentReader( ... ) logic.

Maybe I am missing something, but is there an easy way to get the character encoding inside handleIncomingSBD?

Thanks

phax commented 1 year ago

aSBDBytes is a byte array that contains the "raw" decrypted and decompressed payload as received via AS4. As XML is a binary format that can be represented in multiple different encodings, there is no other way than to parse the XML byte array. The XML parser will correctly determine whether it is encoded as UTF-8, ISO-8859-1 or something else. That's why XML is considered a "binary" format and not a "text" format. See https://www.w3.org/TR/xml/#sec-guessing for more details :)
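
As an illustration of the point above, here is a minimal sketch that hands raw XML bytes to the JDK's DOM parser and reads back the encoding named in the XML declaration. It uses only standard JDK classes and is not part of phase4; the class name SbdEncodingProbe and the method declaredEncoding are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of phase4.
public class SbdEncodingProbe {

  /** Returns the encoding named in the XML declaration, or null if the DOM reports none. */
  public static String declaredEncoding(byte[] xmlBytes) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    // The payload comes from a remote party, so enable secure processing.
    factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
    // Pass bytes (not a String or Reader) so the parser does the encoding detection itself.
    Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xmlBytes));
    return doc.getXmlEncoding();
  }

  public static void main(String[] args) throws Exception {
    // Sample document encoded as ISO-8859-1; \u00e9 is a single byte in that encoding.
    byte[] sample = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Caf\u00e9</name>"
        .getBytes(StandardCharsets.ISO_8859_1);
    System.out.println(declaredEncoding(sample));
  }
}
```

The declared encoding is mainly useful for logging or diagnostics. Decoding the bytes by hand with it is still more fragile than letting the parser do the work.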

In case you want a "safe" String representation of these SBD bytes, you need to 1.) parse the SBD bytes to XML and 2.) serialize the in-memory representation (org.w3c.dom.Document) into a String with the desired character set.
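
The two steps can be sketched with plain JDK classes as follows. This is only one possible approach under stated assumptions, not the phase4 implementation: the class name SbdBytesToString and the method toXmlString are hypothetical, and UTF-8 is picked here as the "desired character set".

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper, not part of phase4.
public class SbdBytesToString {

  /** Parses raw XML bytes and re-serializes the DOM into a String that declares UTF-8. */
  public static String toXmlString(byte[] xmlBytes) throws Exception {
    // Step 1: parse the bytes; the parser works out the source encoding.
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
    Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xmlBytes));

    // Step 2: serialize the in-memory document with the desired character set.
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    StringWriter writer = new StringWriter();
    transformer.transform(new DOMSource(doc), new StreamResult(writer));
    return writer.toString();
  }

  public static void main(String[] args) throws Exception {
    // Sample input encoded as ISO-8859-1, to show that the source encoding does not matter.
    byte[] sample = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Caf\u00e9</name>"
        .getBytes(StandardCharsets.ISO_8859_1);
    System.out.println(toXmlString(sample));
  }
}
```

Because the resulting String declares UTF-8 in its XML declaration, it should also be written out as UTF-8 if it is later converted back to bytes.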

iansmirlis commented 1 year ago

Thank you for your prompt reply.

parse SBD bytes to XML

I assume that since I am in the handleIncomingSBD() function, aSBDBytes is a valid XML document, right?

I would just like to avoid trial and error to guess the encoding, since the required info should already exist in the aIncomingAttachment object before the call to handleIncomingSBD(). I think that the XML encoding of the payload should be the same, unless I have misunderstood something.

Do you think it would be a good idea to add a pointer from PeppolSBDHDocument to the related WSS4JAttachment in order to get such info, or am I asking too much?

phax commented 1 year ago

The encoding of the SBDH is not stored in the underlying WSS4JAttachment - it's in the SBDH only. But let me simplify it. You are talking about this signature, right:

  /**
   * Handle the provided incoming StandardBusinessDocument
   *
   * @param aMessageMetadata
   *        Message metadata. Includes data when and from whom it was received.
   *        Never <code>null</code>. Since v0.9.8.
   * @param aHeaders
   *        The (HTTP) headers of the incoming request. Never <code>null</code>.
   * @param aUserMessage
   *        The received EBMS user message. Never <code>null</code>. Since
   *        v0.9.8.
   * @param aSBDBytes
   *        The raw SBD bytes. Never <code>null</code>.
   * @param aSBD
   *        The incoming parsed Standard Business Document that is never
   *        <code>null</code>. This is the pre-parsed SBD bytes.
   * @param aPeppolSBD
   *        The pre-parsed Peppol Standard Business Document. Never
   *        <code>null</code>. Since v0.9.8.
   * @param aState
   *        The message state. Can e.g. be used to retrieve information about
   *        the certificate found in the message. Never <code>null</code>. Since
   *        v0.9.8
   * @throws Phase4PeppolClientException
   *         if this specific exception is thrown, it translates into a
   *         synchronous AS4 error message.
   * @throws Exception
   *         In case it cannot be processed. If
   *         {@link #exceptionTranslatesToAS4Error()} returns <code>true</code>
   *         each Exception is converted into a synchronous AS4 error message.
   */
  void handleIncomingSBD (@Nonnull IAS4IncomingMessageMetadata aMessageMetadata,
                          @Nonnull HttpHeaderMap aHeaders,
                          @Nonnull Ebms3UserMessage aUserMessage,
                          @Nonnull byte [] aSBDBytes,
                          @Nonnull StandardBusinessDocument aSBD,
                          @Nonnull PeppolSBDHDocument aPeppolSBD,
                          @Nonnull IAS4MessageState aState) throws Exception;

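In this method, the same StandardBusinessDocument is provided in 3 different versions:

aSBDBytes: the raw SBD bytes as received
aSBD: the parsed StandardBusinessDocument (the pre-parsed SBD bytes)
aPeppolSBD: the pre-parsed Peppol Standard Business Document

If you need a String, you can serialize the already parsed aSBD with the SBDMarshaller instead of decoding aSBDBytes yourself.

So don't make it too complicated :)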

iansmirlis commented 1 year ago

Great thanks! I didn't know about the SBDMarshaller

phax commented 1 year ago

Does it work as expected? If so, please feel free to close this issue :)

iansmirlis commented 1 year ago

Thanks, it works great. May I only suggest that it would be useful if there was a comment in the SBDHandler examples about the SBDMarshaller, because it's quite easy to miss that :)

Thanks again

phax commented 1 year ago

Sure, will do :)