phax / ph-schematron

Java Schematron library that supports XSLT and native application
Apache License 2.0
111 stars 36 forks source link

Caching and resourceID problem #118

Closed olavivaino closed 2 years ago

olavivaino commented 3 years ago

Theres a potential bug or unintended use regarding ReadableResourceString.java constructors or an issue with caching logic in SchematronResourceSCHCache.java Details below. ReadableResourceString has constructor ReadableResourceString (@Nonnull final String sString, @Nonnull final Charset aCharset) that calls another constructor ReadableResourceString (@Nullable final String sResourceID, @Nonnull final String sString, @Nonnull final Charset aCharset) with sResourceID set to null. Second constructor sets sResourceID to constant string value if parameter is null. This essentially means that all ReadableResourceString objects have the same sResourceID value if constructed with constructor ReadableResourceString (@Nonnull final String sString, @Nonnull final Charset aCharset).

Now SchematronResourceSCHCache.java is using sResourceID value when calculating sCacheKey for its resources stored in the cache. This means that all resources from ReadableResourceString will have the same sCacheKey value. That essentially breaks caching there. All works fine if theres just one resource (schematron etc) in the cache. However in production use there will be many different. But caching retrieves just the first one as they all have the same sCacheKey value.

We resorted to workaround by using ReadableResourceString (@Nullable final String sResourceID, @Nonnull final String sString, @Nonnull final Charset aCharset) constructor and supplied unique sResourceID (just string hash) and now caching works as expected.

Not sure what is correct fix here. Its unclear what sResourceID's intended purpose is. Is it supposed to have unique value always or not? Is sCacheKey calculation logic correctly assuming that sResourceID's value is always unique or not?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

qligier commented 2 years ago

I've encountered this bug as well with ReadableResourceByteArray. If the resource ID is not supplied in the constructor, it's set to "byte[]" and the Schematron validator will fail for all instances except the first. Specifically, the method SchematronResourceXSLT.fromByteArray() is dangerous to use because it doesn't set the resource ID.

The fix was quite simple but not obvious:

final var resource = new ReadableResourceByteArray(String.valueOf(Arrays.hashCode(byteArray)), byteArray);
final var validator = new SchematronResourceXSLT(resource);
phax commented 2 years ago

Thanks for mentioning this - I will find a suitable remediation soon

qligier commented 2 years ago

Awesome, and a big thank you for your work on ph-schematron!

phax commented 2 years ago

Finally done - I basically took over the version proposed by @qligier with a small extension

Will be part of 6.3.0 release

phax commented 2 years ago

@qligier I changed the version number to 6.2.5 but it should be on Maven central soon

qligier commented 2 years ago

Thank you, I'll update my dependencies!

phax commented 2 years ago

The main change is in ph-commons, so if you explicitly include this as well, make sure it is 10.1.5

jpwijbenga commented 2 years ago

Should this issue also be solved in the FileSystemResource? FileSystemResources are identified by path (according to the getter). But file contents may change.

phax commented 2 years ago

The caching problem was with the byte[] and similar classes (like String). For file system based resources it's up to the caller to decide if caching is useful or not - too many different versions are "right".

If you have a resource, that can potentially change, you just disable caching when you call ph-schematron, ok?

jpwijbenga commented 2 years ago

Sounds fair. :)