EmergentBehavior opened this issue 6 years ago (status: Open)
For your first paragraph, yes, editscript is designed to do just that. `(get-edits e)`
returns a vector. These vectors can be concatenated to represent a larger change. BTW, I added a `combine` function.
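A rough sketch of what that looks like (this assumes `diff`, `patch`, `get-edits`, `combine`, and an `edits->script` helper to turn a raw edit vector back into a script are all exposed from `editscript.core`; exact namespaces may differ by version):

```clojure
(require '[editscript.core :as e])

(def a {:x 1 :y [1 2 3]})
(def b {:x 2 :y [1 2 3]})
(def c {:x 2 :y [1 2 3 4]})

(def d1 (e/diff a b))            ; a -> b
(def d2 (e/diff b c))            ; b -> c

(e/get-edits d1)                 ; a plain vector of edits, e.g. [[[:x] :r 2]]

;; concatenate the raw edit vectors and wrap them back into a script,
;; or combine the scripts themselves
(def d-all  (e/edits->script (into (e/get-edits d1) (e/get-edits d2))))
(def d-all' (e/combine d1 d2))

(= c (e/patch a d-all) (e/patch a d-all'))  ; => true
```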
For the second, it is a very interesting question. I have not encountered cases where the patching process takes too long. When such cases do appear, I will think about an optimizer.
On the other hand, editscript is designed with stream processing in mind. An editscript should be conceptualized as a chunk in a potentially endless stream of changes. So it is more meaningful to worry about data integrity, compression, windowing, etc., rather than the sizes of individual editscripts. Optimizers in these contexts are indeed what I am very interested in.
Basically, I consider editscript a part of the data-oriented effort of Clojure, which tries to elevate the level of abstraction of data from the level of characters or bytes to that of maps, sets, vectors, and lists. So instead of talking about byte streams, we can talk about change streams in terms of these data structures.
Do I make sense?
I haven't had a chance to try editscript yet, but I think it will play nicely with Specter. It seems to me they have a similar view of the data.
@huahaiy Thanks for the answer. My latter paragraph was describing a scenario in event streaming where I rebuild the "present" version of an entity by composing all historical mutations over its entire history of existence (if checkpointing or other strategies weren't used).
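Concretely, something like this sketch is what I have in mind (the names are made up; it just assumes `editscript.core/patch` applies one editscript at a time):

```clojure
(require '[editscript.core :as e])

(defn present-state
  "Rebuild the latest version of an entity from its initial snapshot and
  the ordered history of per-step editscripts, applying them oldest first."
  [initial-snapshot history]
  (reduce e/patch initial-snapshot history))

;; e.g. (present-state A-t0 [e-0->1 e-1->2 e-2->3]) should yield A-t3
```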
@EmergentBehavior Your scenario sounds similar to mine.
Given an editscript, there are indeed some opportunities to optimize, e.g. if a sub-tree will later be deleted, all edits that happened inside that sub-tree could be safely removed without affecting the end result.
Such optimization may require the editscript to record some kind of identifiers for internal nodes. I will think about these.
Meanwhile, my current focus is to further improve the diffing speed. I am working on fingerprinting the data to avoid drilling down into sub-trees that have the same content.
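As a rough illustration of the fingerprinting idea (not editscript's actual code; it just uses Clojure's built-in `hash` as the fingerprint):

```clojure
(defn changed-keys
  "Keys of two maps whose sub-trees appear to differ, judged by comparing
  cheap per-sub-tree fingerprints instead of walking the whole sub-trees.
  A real implementation would also guard against hash collisions, since
  equal hashes do not strictly guarantee equal content."
  [a b]
  (->> (into (set (keys a)) (keys b))
       (remove #(= (hash (get a %)) (hash (get b %))))))

;; (changed-keys {:x 1 :y [1 2 3]} {:x 2 :y [1 2 3]}) => (:x)
```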
Implementing some obvious optimizations should be a good starting point.
First, I think this library is pretty interesting. I was wondering about one use case though: let's say you have entity `A_t0` (where `t` is analogous to a time step) and you have an editscript `e_0->1` to describe the transformation needed to get `A_t0` to `A_t1`. If you capture an editscript for the transformation at each time step (if there is a change), you'd have a collection of `e`, right? Then if you want to get the present state of `A` you could just concatenate all those editscripts together (to describe the changes between `t0` and `tN`). Have you tried this use case? I wonder if at some point, though, if the editscript gets large enough, the patching process would slow down and it would be helpful to have some sort of editscript optimizer to reduce to the minimal editscript needed to get from `A_t0` to `A_tN`.