Open WilliamParker opened 7 years ago
This could potentially be as simple as an implementation that uses pr-str.
I don't really think this is a "simple" or very tractable path. I originally tried to write the session serializer with a pr-str
based default and there were many issues with its preservation of types, metadata, and conflicts with other usages of pr-str
throughout clj, popular REPL middleware, etc.
Also, the pr-str
format is very bloated and very slow serializing non-code-centric data. I really don't think it is an appropriate usage of it.
However, I think a default Fressian implementation for Clojure data structures could be provided fairly easily since there has already been a lot of work done to make Fressian work well for the Clj to Clj SerDe use-case we are going for here.
It's an "optional" dependency, but I still think that would be useful for a consumer that only needed a pure-clj-based serialization impl.
Keep in mind that this SerDe doesn't really lend itself too well to schema evolution of the underlying models though.
Conflicts with other libraries could be a problem. Metadata on facts is dangerous in Clara anyway since Clara uses equality semantics, not identity semantics, but if you guarantee that no two facts exist such that they are equal but their metadata is unequal it is usable and ideally a durability implementation wouldn't break things allowed without durability. Performance might not be such a large concern if this ended up just being a way to demo/prototype durability and any serious usage involved writing a custom memory serializer, but of course a default that both allowed a quick demo and was performant enough for use in real use cases would be better.
Conflicts with other libraries could be a problem.
Fressian is an optional dependency. The consumer must explicitly add it if they wish to use a namespace that loads Fressian. So there is no conflict here.
If you want a reasonable Fressian default impl, you are free to use it. It'll be tested in the :dev
profile of Clara against some (modern) Fressian version.
Metadata on facts is dangerous in Clara anyway ...
This is only true if you consider the metadata to be the distinguishing characteristic between two or more facts. That is certainly not the only use of metadata. Losing all metadata post-serialization can definitely lead to mysteries in the consuming application.
Performance might not be such a large concern if this ended up just being a way to demo/prototype durability and any serious usage involved writing a custom memory serializer, but of course a default that both allowed a quick demo and was performant enough for use in real use cases would be better.
I worked through implementing it with print-dup
(and later print-method
) at length. It was really messy. Many things are not implemented there that are typically needed when trying to use it as a full-blown serialization format. The recommendation from the community was to use print-method
instead of print-dup
for something like this.
print-method
had many nasty edge cases and hierarchical conflicts when trying to apply this to general-purpose Clojure data.
I really think print-method
or print-dup
are really only practical serialization formats for transmitting code-as-data or restricted subset of Clojure data structures.
Also, the performance characteristics are really bad. Especially if you plan to have data types involved like deftype
or defrecord
created Classes. There is a ton of non-cached reflection done by the Clojure Reader when parsing this. It really is atrocious. :)
print-dup
is even worse in terms of reflection in the reader since it attempts (but not very reliably) to preserve data types.
I really just believe this use-case is an abuse of the purpose of print-dup
and print-method
and would be opposed to seeing it be a default maintained and provided by Clara.
The Fressian approach seems much more practical at this point since so many of the custom . handlers that are needed are already available in the codebase. Again, this would be an optional dependency. The d/IWorkingMemorySerializer
impl would be in a namespace like clara.rules.durability.fressian
in the same way the d/ISessionSerializer
currently is.
I'll also note, the only Clara-provided d/ISessionSerializer
impl is in clara.rules.durability.fressian
right now. And it requires the consumer to provide a Fressian impl.
It is mostly undesirable for a consumer to implement d/ISessionSerializer
themselves since it will be fragile and tightly coupled to Clara internal structure changes. This was only done as a protocol so that
(1) Fressian could be optional
(2) Clara could offer alternatives to Fressian impl's of d/ISessionSerializer
for consumers that didn't want Fressian. However, this would be at the discretion of Clara to find appropriate alternatives, like perhaps
Regarding metadata, I think we're in agreement. You'll only have problems if you're inserting facts that are equal but with different metadata and you have expectations about which one you keep if, for example, you retract one.
To clarify, my concern with Fressian was definitely not about needing to bring it in as a dependency. I see the goals here as just to create something that is super easy for someone experimenting with Clara durability to play around with before committing to the effort needed to use durability in a real use case. In many ways an IMemorySerializer that didn't actually serialize anything, but was just an object the user could keep in the REPL, would meet this need. A serializer that was equally simple to use but could be used for real use cases would be preferable though. My main concern with Fressian was that binary would be confusing to users since the serialized facts wouldn't be human-readable. However, on further thought we could just stamp both the session and serialized memory with a UUID and throw an exception if a user tries to deserialize an exception and those don't match. This could be done in a passive way if the durability implementation only did this if the memory serializer implemented a "SerializedFactsVersion" protocol. This sort of error reporting would mitigate most of my concerns there.
So I think I understand where you are coming from. You just don't like the binary format due to its opaqueness? That certainly does make it difficult to troubleshoot, I'd agree. Too bad it isn't like Avro where you can encode as binary or JSON with a switch of encoder/decoder.
However, the data format here is just difficult to express - more difficult than something restrictive like JSON/Avro.
The optimal way to serialize facts is heavily dependent on the exact fact types and structures. Clara's durability implementation, which is discussed at https://github.com/cerner/clara-rules/issues/198, currently both allows and requires users to supply a serializer for their facts in the form of an implementation of IWorkingMemorySerializer. However, it makes sense to me for Clara to provide a default, perhaps naive, IWorkingMemorySerializer for Clojure data types. This could potentially be as simple as an implementation that uses pr-str. Even if serious use ends up requiring a more specialized implementation tailored to a user's specific case a default that could be used in initial experiments and prototyping would lower the barrier to entry of using the durability API.