phax / ph-schematron

Java Schematron library that supports XSLT and native application
Apache License 2.0
116 stars 36 forks source link

Possibility to cancel / abort validation #119

Closed Michiel-s closed 3 years ago

Michiel-s commented 3 years ago

Hi @phax,

Is there a way to cancel or abort a validation? We've created a wrapper application around ph-schematron that can receive validation requests via a REST API.

To run the actual validation steps we execute a Callable task by a CachedThreadPool ExecutorService. We then try to get the response and wait for max 30 sec before returning a server timeout exception. However the task doesn't seem to act upon a Interruption signal. Is this something you could add to the ph-schematron code, to check for Thread.currentThread().isInterrupted() at regular intervals? Or do you see any other solution that we could implement?

We're currently using v5.6.0 and are preparing to move to v6.x

Relevant parts of our code:

static ExecutorService executor = Executors.newCachedThreadPool();
...
var task = new Callable<ResponseEntity<?>>() {
    public ResponseEntity<?> call() throws InvalidSchematronException, Exception {
        return processRequest(validationRequest); // inside processRequest method we call ph-schematron methods
    }
};
var future = executor.submit(task);
...
try {
    var timeout = Integer.parseInt(System.getenv("VALIDATOR_TIMEOUT") == null ? "30" : System.getenv("VALIDATOR_TIMEOUT"));
    response = future.get(timeout, TimeUnit.SECONDS);
    return response;
} catch (TimeoutException ex) {
    log.error("Server timeout.");
    return new ResponseEntity<String>("Server error (Timeout) occurred.", HttpStatus.INTERNAL_SERVER_ERROR);
}
...
finally {
    future.cancel(true);
}

The future.cancel(true); in the finally block sends the interrupt signal.

I've found this article probably explaining why executed task is not stopped: https://codeahoy.com/java/Cancel-Tasks-In-Executors-Threads/

Kind regards, Michiel

Michiel-s commented 3 years ago

In addition to my post above: the reason why we are having this issue is because we have users that submit very large xml files for validation that can take up very long. Some files are >200MB (uncompressed) and can contain up to 100k of items (e.g. product catalogue items) that need to be validated with 10+ rules.

phax commented 3 years ago

Hi @Michiel-s I added a quick test (see the previous commit) and my output looks like this, which seems to be okay:

5 [main] INFO com.helger.schematron.supplementary.Issue119Test - Spawn task
35 [main] INFO com.helger.schematron.supplementary.Issue119Test - Wait
35 [pool-1-thread-1] INFO com.helger.schematron.supplementary.Issue119Test - Loading Schematron
65 [pool-1-thread-1] INFO com.helger.schematron.supplementary.Issue119Test - Applying Schematron
1048 [main] INFO com.helger.schematron.supplementary.Issue119Test - Cancel
1048 [main] INFO com.helger.schematron.supplementary.Issue119Test - Done

What Schematron implementation are you using???

Michiel-s commented 3 years ago

I haven't run your test yet, but what we see is that the thread from the pool is not aborted. The main tread indeed reports that it executed the future.cancel correctly, which send an interrupt signal to the thread. But the validation process running in that thread isn't stopped.

We use com.helger.schematron.xslt.SchematronResourceSCH and are still on v5.6. Maybe we should first migrate to V6.x and see if that solves the issue already.

phax commented 3 years ago

Ah okay. You are saying because I quit my application, the other thread is stopped by accident, but when I would be waiting in the main thread I would see, that the other thread still runs. Is that what you're saying?

Michiel-s commented 3 years ago

Ah okay. You are saying because I quit my application, the other thread is stopped by accident, but when I would be waiting in the main thread I would see, that the other thread still runs. Is that what you're saying?

I think so yes.

phax commented 3 years ago

Okay yes, I can confirm that:

7 [main] INFO com.helger.schematron.supplementary.Issue119Test - Spawn task
38 [main] INFO com.helger.schematron.supplementary.Issue119Test - Wait
38 [pool-1-thread-1] INFO com.helger.schematron.supplementary.Issue119Test - Loading Schematron
68 [pool-1-thread-1] INFO com.helger.schematron.supplementary.Issue119Test - Applying Schematron
1056 [main] INFO com.helger.schematron.supplementary.Issue119Test - Cancel
2298 [pool-1-thread-1] INFO com.helger.schematron.AbstractSchematronResource - Read XML resource [cpPath=/issues/github119/ubl-tc434-example1.xml; urlResolved=true; URL=file:/C:/dev/git/ph-schematron/ph-schematron-xslt/target/test-classes/issues/github119/ubl-tc434-example1.xml]
2357 [pool-1-thread-1] INFO com.helger.schematron.supplementary.Issue119Test - Finished applying Schematron
4060 [main] INFO com.helger.schematron.supplementary.Issue119Test - Done
phax commented 3 years ago

Effectively that would mean, that more or less all the APIs would need to declare throws InterruptedException which would be a heavy change and would effect all the regular use cases.

What do you think about an UncheckedInterruptedException to be less intrusive?

phax commented 3 years ago

I pushed a 6.2.4-SNAPSHOT version to Maven Central Snapshots. It would be great if you could verify if it works for you as well. The output of the Issue119Test is now:

5 [main] INFO com.helger.schematron.supplementary.Issue119Test - Spawn task
35 [main] INFO com.helger.schematron.supplementary.Issue119Test - Wait
35 [pool-1-thread-1] INFO com.helger.schematron.supplementary.Issue119Test - Loading Schematron
64 [pool-1-thread-1] INFO com.helger.schematron.supplementary.Issue119Test - Applying Schematron
1074 [main] INFO com.helger.schematron.supplementary.Issue119Test - Cancel
1792 [pool-1-thread-1] INFO com.helger.schematron.supplementary.Issue119Test - Applying Schematron was successfully interrupted - Interrupted Schematron compilation: after XSLT step 1
4084 [main] INFO com.helger.schematron.supplementary.Issue119Test - Done
Michiel-s commented 3 years ago

We'll do that. I'll let you know

phax commented 3 years ago

@Michiel-s so how does it look like with your tests? If it is to your needs, I will release 6.2.4 asap

Michiel-s commented 3 years ago

Hi @phax, my colleague implemented and tested the snapshot last Friday and reported back to me that our unit tests indicate that the timeout works. Not sure if that includes the thread killig. He just went on a 2 week holiday break so I can't verify at this moment. But based on his comments, and also on your own observations, I would say let's release. If that is ok with you.

Thx very much for the fix. We really appreciate it. Please let me know if we can return any favor.

phax commented 3 years ago

@Michiel-s Thanks for the swift response - I will just build a 6.2.4 release build, and if you need interruptions on additional places - let me know. There will be a 6.2.5 or 6.3.0 anyway ;-)