mkeskells opened 2 months ago
There are two serialization modes: "all" and "minimal". But "minimal" could/should be generalized to a conditional one, as it currently has a hardcoded condition of the value being a "default one". Ideally there could be a custom expression for when the condition is met.
So what I would look into is introducing a new CONDITIONAL ObjectFormatPolicy, which would also require some evaluator on the property itself for when it would write out the value and when it would not. This could be some link to a "static" method which would accept the json, writer and instance (plus probably the method name for generalization).
Then the generated code for "conditional/minimal" serialization would inject these hardcoded methods for evaluation instead. I'm assuming we would want to allow this custom method per property, which would be used over the object-defined one when defined. Plus a way to make it always serialize without going through the code and checking it.
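A rough sketch (purely illustrative; the class and method names are hypothetical) of the kind of evaluator described above, which the generated conditional code could call instead of the hardcoded default-value comparison:

public final class ConditionalRules {
    // Hypothetical evaluator: receives the json (DslJson), the writer and the instance,
    // plus the property name for generalization, and decides whether to write the value.
    public static boolean shouldWrite(
            com.dslplatform.json.DslJson<?> json,
            com.dslplatform.json.JsonWriter writer,
            Object instance,
            String propertyName) {
        // arbitrary condition goes here; returning true means "serialize this property"
        return true;
    }
}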
Hi, here is a first stab at it: https://github.com/ngs-doo/dsl-json/pull/278
It would need more tests. I'm not sure that I have understood the relationship between the ObjectFormatPolicy and the setting in the DSL, but I think this works for me and allows finer control. Happy to discuss.
There are a few extra methods added to the JsonWriter, but the bulk of the change generates code like below:
private final com.dslplatform.json.PropertyAccessor<com.dslplatform.json.CombinedFormatTest.ImmutableComposite1> propertyAccessor = (instance, field) -> {
    switch (field.getName()) {
        case "x":
            return instance.x;
        case "s":
            return instance.s;
        case "d":
            return instance.d;
        default:
            throw new IllegalArgumentException("Unknown field " + field.getName());
    }
};
private final java.util.List<com.dslplatform.json.PropertyInfo<com.dslplatform.json.CombinedFormatTest.ImmutableComposite1>> propertyInfos = java.util.List.of(
    new com.dslplatform.json.PropertyInfo("x", quoted_x),
    new com.dslplatform.json.PropertyInfo("s", quoted_s),
    new com.dslplatform.json.PropertyInfo("d", quoted_d));

public void writeContentControlled(final com.dslplatform.json.JsonWriter originalWriter, final com.dslplatform.json.CombinedFormatTest.ImmutableComposite1 instance) {
    final com.dslplatform.json.JsonWriter.FilterInfo filterInfo = originalWriter.controlledFilterInfo(instance, com.dslplatform.json.CombinedFormatTest.ImmutableComposite1.class, propertyAccessor, propertyInfos);
    com.dslplatform.json.JsonWriter writer = originalWriter;
    for (com.dslplatform.json.PropertyInfo property : originalWriter.controlledStart(instance, com.dslplatform.json.CombinedFormatTest.ImmutableComposite1.class, propertyAccessor, propertyInfos, filterInfo)) {
        writer = originalWriter.controlledPrepareForProperty(instance, com.dslplatform.json.CombinedFormatTest.ImmutableComposite1.class, propertyAccessor, filterInfo, property, writer);
        if (writer != null) {
            switch (property.getName()) {
                case "x":
                    if (instance.x == null) writer.writeNull();
                    else com.dslplatform.json.NumberConverter.serialize(instance.x, writer);
                    break;
                case "s":
                    if (instance.s == null) writer.writeNull();
                    else writer.serialize(instance.s, writer_s);
                    break;
                case "d":
                    if (instance.d == null) writer.writeNull();
                    else com.dslplatform.json.CombinedFormatTest.DoubleConverter.JSON_WRITER().write(writer, instance.d);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown property " + property.getName());
            }
        }
    }
    originalWriter.controlledFinished(instance, com.dslplatform.json.CombinedFormatTest.ImmutableComposite1.class, propertyAccessor, filterInfo, writer);
}
I'll try to find some more time to review it this or next week, but on a first glance here are some comments: rename writeContentMinimal into writeContentConditional, where instead of the condition being a comparison with the default value, it should be an arbitrary condition. It's even fine to leave the method name as is and just change the implementation of the comparison check. All in all, you seem to be on a good track, but try to stick to these two rules: no allocation allowed by default, and sealed and final should stay as such.
Hi, thanks for the comments. I don't see how we can accomplish this without subclassing JsonWriter. The writer is embedded in so much of the library, and of the generated code, that there isn't a simple way to intercept and vet the data written that I can see. And that's without considering custom serialisers.
I haven't profiled it, but I don't think that there are allocations outside of the static initialiser. I saw a couple of fields (that should have been static) and have made them so. That was the intention anyway.
I was trying to leave the current path unaffected to retain performance. Injecting some property into the JsonWriter to validate/tweak any update slows that path down.
The changes are incomplete and are here: https://github.com/ngs-doo/dsl-json/pull/278/commits/93e4f149d362912c7ceb8b8fa25b1fef10b2fc8d
Adding this functionality without subclassing JsonWriter, and having a policy to allow filtering, is more invasive, and the code only vets the key values, not any writes of the values. Vetting the writes of the values would slow down the write path, and I would have to disambiguate the writes to be vetted from the writes that are 'safe'.
The generated code is now a static block for all of the class data (which would be used in any option, I think):
private final static com.dslplatform.json.ClassInfo<com.dslplatform.json.CombinedFormatTest.ImmutableComposite1> classInfo;
static {
    final com.dslplatform.json.PropertyAccessor<com.dslplatform.json.CombinedFormatTest.ImmutableComposite1> propertyAccessor = new com.dslplatform.json.PropertyAccessor<com.dslplatform.json.CombinedFormatTest.ImmutableComposite1>(com.dslplatform.json.CombinedFormatTest.ImmutableComposite1.class) {
        public Object getField(com.dslplatform.json.CombinedFormatTest.ImmutableComposite1 instance, com.dslplatform.json.PropertyInfo propertyInfo) {
            switch (propertyInfo.getName()) {
                case "x":
                    return instance.x;
                case "s":
                    return instance.s;
                case "d":
                    return instance.d;
                default:
                    throw new IllegalArgumentException("Unknown field " + propertyInfo.getName());
            }
        }
    };
    java.util.List<com.dslplatform.json.PropertyInfo> propertyInfos;
    java.util.ArrayList<com.dslplatform.json.PropertyInfo> _propertyInfos = new java.util.ArrayList<com.dslplatform.json.PropertyInfo>(3);
    _propertyInfos.add(new com.dslplatform.json.PropertyInfo("x", new com.dslplatform.json.JsonAttributeInfo("x", 1, new String[] {}, true, com.dslplatform.json.JsonAttribute.IncludePolicy.NON_DEFAULT)));
    _propertyInfos.add(new com.dslplatform.json.PropertyInfo("s", new com.dslplatform.json.JsonAttributeInfo("s", 2, new String[] {}, true, com.dslplatform.json.JsonAttribute.IncludePolicy.NON_DEFAULT)));
    _propertyInfos.add(new com.dslplatform.json.PropertyInfo("d", new com.dslplatform.json.JsonAttributeInfo("d", 3, new String[] {}, false, com.dslplatform.json.JsonAttribute.IncludePolicy.NON_DEFAULT)));
    propertyInfos = java.util.Collections.unmodifiableList(_propertyInfos);
    classInfo = new com.dslplatform.json.ClassInfo<com.dslplatform.json.CombinedFormatTest.ImmutableComposite1>(com.dslplatform.json.CombinedFormatTest.ImmutableComposite1.class, com.dslplatform.json.CompiledJson.ObjectFormatPolicy.DEFAULT, propertyInfos, propertyAccessor);
};
and the generated controlled writer looks like
@Override public <CONTROL$INFO extends com.dslplatform.json.ControlInfo> boolean writeContentControlled(final com.dslplatform.json.JsonWriter writer, final com.dslplatform.json.CombinedFormatTest.ImmutableComposite1 instance, com.dslplatform.json.JsonControls<CONTROL$INFO> controls) {
    final CONTROL$INFO controlInfo = controls.controlledInfo(instance, classInfo);
    boolean hasWritten = false;
    for (com.dslplatform.json.PropertyInfo propertyInfo: controls.controlledProperties(instance, classInfo, controlInfo)) {
        switch (propertyInfo.getName()) {
            case "x": {
                int[] value = instance.x;
                switch (controls.shouldWrite(instance, classInfo, controlInfo, propertyInfo, writer, value, true, value != null)) {
                    case WRITE_NORMALLY:
                        writer.writeAscii(propertyInfo.getQuotedNameAndColon());
                        long positionBefore = writer.size();
                        if (value == null) writer.writeNull();
                        else com.dslplatform.json.NumberConverter.serialize(value, writer);
                        controls.afterPropertyWrite(instance, classInfo, controlInfo, propertyInfo, writer, positionBefore);
                        // expected fallthrough
                    case WRITTEN_DIRECTLY:
                        writer.writeByte((byte)','); hasWritten = true;
                        // expected fallthrough
                    case IGNORED:
                        break;
                }
                break;
            }
            case "s": {
                java.util.List value = instance.s;
                switch (controls.shouldWrite(instance, classInfo, controlInfo, propertyInfo, writer, value, true, value != null)) {
                    case WRITE_NORMALLY:
                        writer.writeAscii(propertyInfo.getQuotedNameAndColon());
                        long positionBefore = writer.size();
                        if (value == null) writer.writeNull();
                        else writer.serialize(value, writer_s);
                        controls.afterPropertyWrite(instance, classInfo, controlInfo, propertyInfo, writer, positionBefore);
                        // expected fallthrough
                    case WRITTEN_DIRECTLY:
                        writer.writeByte((byte)','); hasWritten = true;
                        // expected fallthrough
                    case IGNORED:
                        break;
                }
                break;
            }
            case "d": {
                java.lang.Double value = instance.d;
                switch (controls.shouldWrite(instance, classInfo, controlInfo, propertyInfo, writer, value, true, value != null)) {
                    case WRITE_NORMALLY:
                        writer.writeAscii(propertyInfo.getQuotedNameAndColon());
                        long positionBefore = writer.size();
                        if (value == null) writer.writeNull();
                        else com.dslplatform.json.CombinedFormatTest.DoubleConverter.JSON_WRITER().write(writer, value);
                        controls.afterPropertyWrite(instance, classInfo, controlInfo, propertyInfo, writer, positionBefore);
                        // expected fallthrough
                    case WRITTEN_DIRECTLY:
                        writer.writeByte((byte)','); hasWritten = true;
                        // expected fallthrough
                    case IGNORED:
                        break;
                }
                break;
            }
        }
    }
    return hasWritten;
}
I would appreciate comments on whether this is the way that you want to proceed. I have left much of the other implementation in there still (I could not see a reason to remove it yet).
@zapov do you have any comments?
Hi, I was a bit busy, but it should be better now.
So... I'm not a fan of introducing various structures into dsl-json which are meant for this logic. People have all kinds of use cases and they should be managing the logic via their own data structures. So ClassInfo and similar are not something I want to have in the library.
But... I think we should go a bit slower (not only because the PR was too large - I would much prefer 2 PRs: one for the rename and another for the logic)... so let's try to cover the basics first.
Can you provide some usage examples of how people would configure this and use it within dsl-json (through pseudo-code, not via a PR)? When we agree on that we can move forward. Because for me... this should work similarly to the current minimal serialization logic (in the sense that some method should be called to decide whether a property should be serialized or not). So for this, similarly to how there are converters and NamingStrategy... I would prefer that external classes are referenced, which can then be used in generated code and will take care of the logic for serialization.
So, first of all.. is the goal of this to have serialization which is:
or something in between?
So for me... currently you have this logic in generated code:

class Instance { public int x; }

void serialize(Instance instance) {
    if (instance.x != 0) { // which is default value
        //serialize x
    }
}
I would prefer an implementation which, in this case, mostly does something along the lines of:

void serialize(Instance instance) {
    if (ExternalClassWithVisibilityRules.shouldSerialize(instance, "x", dsl)) { // which is default value
        //serialize x
    }
}
I'm even ok if there is an option to have that in the class (eg when an external serializer is not defined) - which is aligned with the naming strategy:

interface JsonCanSerialize {
    boolean canSerialize(String propertyName, DslJson dslJson);
}

void serialize(Instance instance) {
    if (instance.canSerialize("x", dsl)) {
        //serialize x
    }
}
In the end... what seems to make sense to me is that you are able to define this rule on the property itself in a similar manner (but we can leave that for later).
Hi, good to have you back!
I only introduced the ClassInfo to make my life easier, and to avoid having to pass lots of parameters in lots of places while we are agreeing the interface changes. I am not wedded to that, but we don't want to have something that takes 5 parameters in lots of places.
So the requirement that I have is that the redaction/filtering of the data must happen at run time (we can do some compile-time filtering as well), but we have to be able to determine that we have to hide some data that is leaking, and remediate that, without having to re-release code.
The data configuring what to look for, and how to go about replacing it, is regex expressions and some constant strings.
We have full control of the DSL and the environment that is used for this writing, and we don't need to make these settings affect all json serialisation on that JVM. It does have to be thread safe, as it is used in logging code.
There are 2 types of data that we want to filter:
- known key names: password, cardNumber, security.*, pin, etc.
- values whose content matches patterns such as BEGIN.*PRIVATE KEY-----.*END.*PRIVATE KEY-----
We generally want to leave a marker to say that the data was redacted, rather than just hiding it; downstream processes then kick in, and we can potentially do the compile-time filtering in the slower cycle, on the next release.
We need to maintain the json structure as valid in the output, clearly.
So in some cases we want to ignore a key/value entirely, and in some cases we want to write a different value, with some/all of the text changed, so it's harder than just canSerialise/shouldSerialise in your examples above. That's why I had the enum return saying what to write, and to control whether the value was already written and you need to add a ','.
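A minimal sketch (not part of the PR; all names are illustrative) of such runtime-configurable rules - regexes over key names and values, plus a decision mirroring the enum return mentioned above:

import java.util.regex.Pattern;

final class RedactionRules {
    enum Decision { WRITE_NORMALLY, WRITE_REPLACEMENT, OMIT }

    // configured at runtime from regex expressions and constant strings
    private final Pattern keyPattern = Pattern.compile("password|cardNumber|security.*|pin");
    private final Pattern valuePattern = Pattern.compile("BEGIN.*PRIVATE KEY-----.*END.*PRIVATE KEY-----", Pattern.DOTALL);
    private final String marker = "***REDACTED***"; // marker left in place of the real data

    Decision decide(String key, CharSequence value) {
        if (key != null && keyPattern.matcher(key).matches()) return Decision.WRITE_REPLACEMENT;
        if (value != null && valuePattern.matcher(value).find()) return Decision.WRITE_REPLACEMENT;
        return Decision.WRITE_NORMALLY;
    }

    String replacement() { return marker; }
}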
Because you don't want to subclass the JsonWriter, I need to retain information about what is checked and what isn't, to cope with serialising a map as a value, where I need to check the values written, or where there is a custom serialiser etc. - i.e. when a value being written is more than a single string, number etc., or when I need to intercept those writes as well, which seems hard to do.
I still think that subclassing JsonWriter would be simpler. You could still seal it so that the only subclasses are one with no controls and one with delegated controls, plus some associated, pluggable visibility rules driving that.
You asked to have this implementation leverage the minimal logic, which meant that the logic needs to determine whether the value is the default (null, 0 etc.) without boxing, which causes some bloating of the interface.
The reordering of the fields that I built isn't a requirement; some of it came for free with the other infrastructure that I built, but we can drop that.
Agreed that it would be good to have this at the property level, but that is a custom writer (as I see it) and it's not the pressing requirement that I have; it's an improvement (for me).
To use something like

void serialize(Instance instance) {
    if (ExternalClassWithVisibilityRules.shouldSerialize(instance, "x", dsl)) { // which is default value
        //serialize x
    }
}
we would have to read instance.x to check if it was something to be handled specially etc., and we can't customise the value written. Are you proposing that the generated code is just checking whether the default value is serialised, not actually whether the field with a non-default value is serialised?
For me those are 2 separate problems. The feature of masking output so you don't leak secrets you can already achieve in multiple ways. E.g. ideally you would have special types for passwords, and you can assign a type converter to that type and mask them on output. There is an example here of how to do it for a time type: https://github.com/ngs-doo/dsl-json/blob/master/examples/MavenJava/src/main/java/com/dslplatform/maven/Example.java#L184 You can even put it on something more generic (like a String)... but it's always better to have more specific types for such cases. Alternatively, you can assign converters to explicit properties (which makes sense for passwords). So I would say masking the output value is covered with existing features.
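For example, a sketch along those lines (assuming a dedicated, hypothetical Password wrapper type; the JSON_READER/JSON_WRITER convention mirrors the DoubleConverter used in the generated code above):

final class Password {
    public final String value;
    public Password(String value) { this.value = value; }
}

@com.dslplatform.json.JsonConverter(target = Password.class)
public abstract class PasswordConverter {
    public static final com.dslplatform.json.JsonReader.ReadObject<Password> JSON_READER =
            reader -> new Password(reader.readString());
    public static final com.dslplatform.json.JsonWriter.WriteObject<Password> JSON_WRITER =
            (writer, value) -> {
                if (value == null) writer.writeNull();
                else writer.writeString("***"); // mask the secret on output
            };
}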
What dsl-json does not have ATM is the logic for when you don't want to put anything out. That's why I was suggesting to only cover that case. Technically, if we extend the API to pass in the writer, you can write something out and then tell dsl-json not to serialize the regular property. I'm kind of ok with it because I think it would work out of the box, although I think it would be better not to allow it for now (as there might come some future need to prevent it).
I think that we are talking at cross purposes.
For specific field types - I am aware that we can use custom converters to mask out passwords and other special types that can be identified as part of the schema definitions, but this is a run-time catch which serves the purpose of catching missed configuration, and of when we need to apply additional rules and cannot just shut the system down, wait for a rebuild and a new release, and then continue. The current facilities don't allow this to be addressed fast enough for our needs; we need to be able to change the rules on demand.
For masking the values - there is no facility within the library to control the output values (on all fields), which is again what we need. This applies to any field written that looks to contain data that we don't want to be in the json form, not just to some known fields. This also has the requirement that it needs to be immediate. We would need to assign a custom writer to every field of every object, and to any keys and values of maps etc., and then after that there are some edge cases with existing custom writers to consider, and collections etc.
You are right that there are 2 separate requirements, but your solution, if I understand it, only addresses one of them, and requires a rebuild, which makes it inappropriate for our needs.
So what that sounds like to me is that we can certainly improve the signatures of DslJson's tryFindReader, tryFindWriter and tryFindBinder to pass in more information (the location of the object, e.g. its class and property name) so you can handle these rules whenever something is about to be written. This way you can instead return a writer which will mask the value, while knowing in which object and for which property this is, to be able to apply the rules - do not leak this property of this object.
The signature I suggested for deciding whether we want to write out the field at all would certainly let you do the same (as you have the instance - so you can take the instance class, it's passing the property name, and you have the DslJson instance which can contain these rules, as you can provide your own DslJson instance).
So - what are you expecting the generated code to look like (roughly)?
Are you asking about the tryFindReader improvement? I would focus on that for now and we can later address the remaining issue.
I would add
public JsonReader.ReadObject<?> tryFindReader(Type manifest, String attribute, Class<?> location)
deprecate the old methods and redirect them to this one. Then generated code should just call them to provide these relevant attributes, eg instead of
this.reader_property = __dsljson.tryFindReader(manifest_property);
have something like
this.reader_property = __dsljson.tryFindReader(manifest_property, "property", com.dslplatform.json.models.UnknownTypeWithConverter.Generic.class);
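A sketch of how the extra (attribute, location) information could be used on the writer side - the names below are hypothetical, and this assumes the proposed overload rather than anything that exists today:

// Rules consulted by a customised tryFindWriter lookup: when the attribute/location
// matches, a masking writer could be returned instead of the regular one.
final class MaskingRules {
    private static final java.util.regex.Pattern SENSITIVE_KEYS =
            java.util.regex.Pattern.compile("password|cardNumber|security.*|pin");

    static final com.dslplatform.json.JsonWriter.WriteObject<String> MASKED_STRING_WRITER =
            (writer, value) -> {
                if (value == null) writer.writeNull();
                else writer.writeString("***REDACTED***");
            };

    static boolean isSensitive(String attribute, Class<?> location) {
        return attribute != null && SENSITIVE_KEYS.matcher(attribute).matches();
    }
}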
btw, you didn't say whether this addresses your need in full (at least around the masking needs)
I presume you meant __dsljson.tryFindWriter. Are you proposing that for every field that is written, we call __dsljson.tryFindWriter(...) and that determines what is written?
This would allow us to control the writing, and allow us to whitelist some types (e.g. for values that are numbers and booleans).
It would mean that we end up duplicating the writing code for arrays, maps etc., as we would need to validate the data prior to writing it, rather than intercepting the write. I don't think that there is a lot of code here though.
It would mean that it would not work with a customised writer as I see it, e.g. one that writes something not generated by the plugin, but hand crafted.
It would also mean that we could not omit a field, or change the property name written.
Looking in writeProperty, there seem to be lots of paths that don't use the writer_<fieldName>, e.g. where there is a convertor, it's a JsonObject, there is a target.convertor, or an optimisedConvertor.
Have I understood what you are proposing? I just want to confirm I have understood, and that we are talking about the same thing, before I spend much effort on thinking & coding.
Yes, while tryFindWriter is the relevant one for your use case, I think this pattern applies to all 3.
I'm mostly trying to confirm that extending the tryFind API in such a way does resolve your use case (your second point in the requirements).
It does not affect converters, but I'm ok if there is an option to not use optimized converters, or even to allow declaration of your own conversion to override the optimized converters.
Explicit converters I think are a different kind of problem, as you should be able to have your own rules around them and e.g. either disallow them, or make sure they go through approved converters only.
I would say JsonObject is the same, as in... if you want to prevent its usage, you can implement some rules around compilation to not allow it, or only whitelist certain ones.
As for the second part, I've looked over the API and it gets a bit more complicated with managing the boolean hasWritten if we allow passing the writer around, so I would avoid that for now. This does prevent you from writing some other name instead, but it seems simpler to reason about and good enough.
Anyway... I would support this by having the CompiledJson annotation with

Class visibilityCheck() default CompiledJson.class;

which would point to a class such as ExternalClassWithVisibilityRules and expect it to have a shouldSerialize method with the relevant signature (a class-compatible instance parameter, the property name and the dsl instance) returning a boolean.
Generated code would look something like

if (ExternalClassWithVisibilityRules.shouldSerialize(instance, "x", dsl)) { // returns boolean
    //serialize x
}
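and the external class targeted by that call could be a plain static method along these lines (the signature is inferred from the description above; the rule body is just an illustration):

public final class ExternalClassWithVisibilityRules {
    // instance parameter compatible with the serialized class, the property name, and the dsl instance
    public static <T> boolean shouldSerialize(T instance, String property, com.dslplatform.json.DslJson<?> dsl) {
        // arbitrary runtime rules; e.g. never write out a property called "password"
        return !"password".equals(property);
    }
}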
The other question is whether we should just leave MINIMAL as a policy or introduce a new one, eg CONDITIONAL. Since this would only affect writeContentMinimal (and let's leave the name as is for now - we can change it later in another PR), the main consequence is what kind of code should be embedded inside this //serialize x part.
If we leave only MINIMAL, I would assume there could be an expectation that the value is always serialized when it passes this outer check (which is probably ok), and it would mean that when visibilityCheck is defined, instead of checking for the default value in there, it should just write the value out.
If there is a need to retain the previous MINIMAL behavior even after this external check, then it does make sense to introduce an additional one: CONDITIONAL. But maybe for simplicity's sake we could start with: when visibilityCheck is defined, always just write it out.
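To illustrate the options (sketches only, reusing the earlier example; this is not generated output):

// Today's MINIMAL: skip the property when it equals its default value
if (instance.x != 0) { /* serialize x */ }

// visibilityCheck defined, MINIMAL left as is (the simpler option above):
// drop the default-value comparison and rely solely on the external rule
if (ExternalClassWithVisibilityRules.shouldSerialize(instance, "x", dsl)) { /* serialize x */ }

// A separate CONDITIONAL policy retaining the old MINIMAL behaviour would combine both checks
if (instance.x != 0 && ExternalClassWithVisibilityRules.shouldSerialize(instance, "x", dsl)) { /* serialize x */ }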
Unrelated, but looking at the code it seems to me that there is a bug in this serialization code: there is a missing check to ensure the size of the buffer. E.g. in front of

writer.writeByte((byte)','); hasWritten = true;

there should be

writer.ensureCapacity(2);

just to make sure that when flushed to a stream we can still go back one place due to the conditional ending.
It's a bit similar to the problem in https://github.com/ngs-doo/dsl-json/issues/266
I have two requirements to not expose sensitive data.
Ideally we would do (2) based on data at runtime.
I could imagine that we could do (1) via some annotation that runs at compile time (but I am new to this library), and currently this is also expected to be a runtime setting. Maybe it could be both.
For (2) I would imagine that the easiest solution would be subclassing JsonWriter (but this is a final class).
Happy to work on this with someone if that helps.
My first thought is that if we could make JsonWriter non-final, I could do (2). And then exposing the field that we are writing (as a compile switch?) could cover (1) as well, but I could probably cope without this.
What are your thoughts?