phax / ph-ubl

Java library for reading and writing UBL 2.0, 2.1, 2.2, 2.3 and 2.4 documents
Apache License 2.0

Error should be thrown/returned if not a valid document based on given schema #33

Closed: kurbhatt closed this issue 3 years ago

kurbhatt commented 3 years ago

Hello @phax, I am trying to learn UBL and used the ph-ubl library to validate an Invoice XML document based on UBL 2.1. Here is the sample code I wrote:

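import com.helger.commons.error.list.IErrorList;
import com.helger.ubl21.UBL21Reader;
import com.helger.ubl21.UBL21Validator;
import oasis.names.specification.ubl.schema.xsd.invoice_21.InvoiceType;
import org.springframework.util.ResourceUtils;

public static void main(String[] args) {
    try {
        // Returns null if the XML is not schema-valid; by default the errors are only logged
        InvoiceType aUBLObject = UBL21Reader.invoice().read(ResourceUtils.getFile("classpath:UBL-Invoice-2.1-Example.xml"));
        if (aUBLObject != null) {
            // Validate the successfully read object against the UBL 2.1 schema
            IErrorList aErrors = UBL21Validator.invoice().validate(aUBLObject);
            System.out.println(aErrors);
        }
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }
}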

I have intentionally deleted a mandatory element from the input XML file (UBL-Invoice-2.1-Example.xml). Because that element is mandatory according to the schema, aUBLObject is null after reading the XML on the first line, and the console prints:

[SAX] cvc-complex-type.2.4.a: Invalid content was found starting with element 'cbc:IssueDate'. One of '{"urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2":CustomizationID, "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2":ProfileID, "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2":ProfileExecutionID, "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2":ID}' is expected. (org.xml.sax.SAXParseException: cvc-complex-type.2.4.a: Invalid content was found starting with element 'cbc:IssueDate'. One of '{"urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2":CustomizationID, "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2":ProfileID, "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2":ProfileExecutionID, "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2":ID}' is expected.)

How can I catch this error? It is only printed to the console. The library should throw an exception or return something so that we know the element is missing from the input. I want to handle/collect any errors in the input document. Please guide/correct me if I am making any mistakes.

phax commented 3 years ago

You need to add a ValidationEventHandler to the chain. By default, a logging event handler is installed. So instead of UBL21Reader.invoice().read(...), use UBL21Reader.invoice().setValidationEventHandler(aEventHandler).read(...) (assuming you use ph-ubl 6.6.0 or later), where aEventHandler is your implementation of the ValidationEventHandler interface. The default logging implementation is the class com.helger.jaxb.validation.LoggingValidationEventHandler. Alternatively, there is the class com.helger.jaxb.validation.CollectingValidationEventHandler, which collects all messages in an ErrorList object that you can then handle manually afterwards.

hth
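
A minimal, JDK-only sketch of the collecting idea, for readers who want to see the mechanism without ph-ubl on the classpath. It does not use ph-ubl or the JAXB ValidationEventHandler; it swaps in the JDK's javax.xml.validation API with an org.xml.sax.ErrorHandler that stores every SAXParseException in a list instead of logging it, which is roughly what CollectingValidationEventHandler does for the JAXB reader. The class names and the toy schema (ID mandatory before IssueDate) are hypothetical stand-ins, not UBL.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class CollectingErrorsDemo {

    // Stores every parser message in a list instead of logging it
    static final class CollectingErrorHandler implements ErrorHandler {
        final List<SAXParseException> errors = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) { errors.add(e); }

        @Override
        public void error(SAXParseException e) { errors.add(e); }

        @Override
        public void fatalError(SAXParseException e) { errors.add(e); }
    }

    // Hypothetical toy schema (not UBL): ID is mandatory and must precede IssueDate
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Invoice'><xs:complexType><xs:sequence>"
        + "<xs:element name='ID' type='xs:string'/>"
        + "<xs:element name='IssueDate' type='xs:date'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Validates the XML string and returns all collected messages (empty list = valid)
    static List<SAXParseException> validate(String xml) {
        Validator validator;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            validator = factory.newSchema(new StreamSource(new StringReader(XSD))).newValidator();
        } catch (SAXException e) {
            throw new IllegalStateException("Toy schema could not be compiled", e);
        }
        CollectingErrorHandler handler = new CollectingErrorHandler();
        validator.setErrorHandler(handler);
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Parsing stops after a fatal (well-formedness) error; the handler has already recorded it
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.errors;
    }

    public static void main(String[] args) {
        // The mandatory ID element is deliberately left out
        for (SAXParseException e : validate("<Invoice><IssueDate>2009-12-15</IssueDate></Invoice>")) {
            System.out.println(e.getMessage());
        }
    }
}
```

With ID left out, the first collected message is a cvc-complex-type.2.4.a error starting at element 'IssueDate', the same kind of message as in this issue, and the caller decides what to do with the list instead of reading the log.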

kurbhatt commented 3 years ago

Hello @phax, thanks for your helping hand.

I have tried:

CollectingValidationEventHandler handler = new CollectingValidationEventHandler();
// s is the XML input
InvoiceType aUBLObject1 = UBL21Reader.invoice().setValidationEventHandler(handler).read(s);
IErrorList errorList = handler.getErrorList();
// Print every collected error (get(0) would fail on an empty list)
for (IError error : errorList) {
    System.out.println(error.getErrorText(Locale.getDefault()));
}

The above prints:

cvc-complex-type.2.4.a: Invalid content was found starting with element 'cbc:IssueDate'. One of '{"urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2":CustomizationID, "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2":ProfileID, "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2":ProfileExecutionID, "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2":ID}' is expected.

This detail is too raw. Is there any way to process/digest this error and derive the field for which the error occurs?

phax commented 3 years ago

Instead of getErrorText I suggest to use getErrorText, as it also gives more context.

I am not aware of any more "structured" way to get this information; this is how I get the errors from the XML parser, sorry. But if you find a smart way to get this in more detail, I am very interested.
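
Since the parser only delivers text, one possible workaround is to pattern-match the message. The sketch below assumes the English wording of the cvc-complex-type.2.4.a message exactly as printed in this issue; the class and field names are hypothetical, it covers only this one error code, and a localized or differently worded message simply yields null.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Cvc24aParser {

    // Hypothetical result type: the element that was found and the ones the schema expected
    static final class UnexpectedElement {
        final String foundElement;
        final List<String> expectedElements;

        UnexpectedElement(String foundElement, List<String> expectedElements) {
            this.foundElement = foundElement;
            this.expectedElements = Collections.unmodifiableList(expectedElements);
        }
    }

    // English text of the cvc-complex-type.2.4.a message as printed in this issue
    private static final Pattern MESSAGE = Pattern.compile(
        "cvc-complex-type\\.2\\.4\\.a: Invalid content was found starting with element '([^']+)'\\."
        + " One of '\\{([^}]*)\\}' is expected\\.");

    // One expected entry, optionally namespace-qualified: "namespace-uri":LocalName
    private static final Pattern ENTRY = Pattern.compile("(?:\"[^\"]*\":)?([A-Za-z_][\\w.\\-]*)");

    // Returns null if the text contains no (English) cvc-complex-type.2.4.a message
    static UnexpectedElement parse(String message) {
        Matcher m = MESSAGE.matcher(message);
        if (!m.find()) {
            return null;
        }
        List<String> expected = new ArrayList<>();
        for (String part : m.group(2).split(",")) {
            Matcher entry = ENTRY.matcher(part.trim());
            if (entry.matches()) {
                expected.add(entry.group(1));
            }
        }
        return new UnexpectedElement(m.group(1), expected);
    }
}
```

For the message in this issue, parse reports cbc:IssueDate as the found element and CustomizationID, ProfileID, ProfileExecutionID and ID as the expected ones, which narrows down where the missing element belongs. It is a heuristic over message text, not a structured API, so it can break whenever the parser's wording changes.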

kurbhatt commented 3 years ago

"Instead of getErrorText I suggest to use getErrorText"!!! I think both are the same. Please correct me if I am wrong.

phax commented 3 years ago

Haha, good catch. Sorry. Please use getAsString instead of getErrorText.

getErrorText only returns the error message, whereas getAsString combines it with the other fields, such as error level, error code and location (if available).
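
To illustrate the difference, here is a JDK-only sketch that formats an org.xml.sax.SAXParseException roughly the way getAsString is described above: level plus location plus message, while getMessage() alone corresponds to getErrorText. It is not the ph-commons implementation; the Level enum and the output layout are hypothetical. The JDK exception has no separate error-code field, so the cvc-* code stays inside the message text.

```java
import org.xml.sax.SAXParseException;

final class ParseErrorFormatter {

    // The three severities an org.xml.sax.ErrorHandler can receive
    enum Level { WARNING, ERROR, FATAL_ERROR }

    // Builds "[LEVEL] systemId(line:column) message", leaving out unknown location parts
    static String format(Level level, SAXParseException e) {
        StringBuilder sb = new StringBuilder().append('[').append(level).append("] ");
        String location = "";
        if (e.getSystemId() != null) {
            location += e.getSystemId();
        }
        // Line and column are -1 when the parser does not know them
        if (e.getLineNumber() > 0) {
            location += "(" + e.getLineNumber();
            if (e.getColumnNumber() > 0) {
                location += ":" + e.getColumnNumber();
            }
            location += ")";
        }
        if (!location.isEmpty()) {
            sb.append(location).append(' ');
        }
        return sb.append(e.getMessage()).toString();
    }
}
```

For an error at line 3, column 14 of invoice.xml this yields a line such as "[ERROR] invoice.xml(3:14) cvc-complex-type.2.4.a: ...", which tells you where in the input to look even though the message itself is still unstructured.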

kurbhatt commented 3 years ago

Alright, thanks for the details. If I find a way to structure the errors, I will surely post it here. I think you can close the issue. :)