osiewicz closed this issue 10 months ago.
You can use `diff_with_tokens` to provide arbitrary tokens with your own interning (or no interning at all). `Token`s are just `u32` newtype wrappers (with a `pub` inner field), so you can easily provide your own tokens (for example, by converting `char`s to `u32`). However, you will need to allocate vectors, since diffing fundamentally needs random access.
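A minimal sketch of the char-to-token conversion described above. It defines a local `Token(pub u32)` stand-in (an assumption, so the snippet compiles without the crate; with the crate you would use its own `Token` type) and maps each `char` to its code point, so no interner is involved:

```rust
/// Local stand-in for the crate's token type: a `u32` newtype with a
/// public inner field. (Assumption: defined here so the sketch is self-contained.)
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Token(pub u32);

/// Map each `char` directly to a token holding its Unicode scalar value.
/// No interner is involved: equal chars always yield equal tokens.
fn char_tokens(s: &str) -> Vec<Token> {
    s.chars().map(|c| Token(c as u32)).collect()
}

fn main() {
    // Diffing needs random access, so both sides are collected into Vecs.
    let before = char_tokens("kitten");
    let after = char_tokens("sitting");
    println!("{:?} / {} tokens", &before[..2], after.len());
}
```

The two `Vec`s are the allocation mentioned above; the per-char mapping itself is just a cast.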
Note that the histogram algorithm fundamentally requires interning (even for char diffs, since it needs a low-cardinality input set), but for char diffs I would expect Myers to produce more appropriate results anyway. If you only use the Myers diff, then the value of `num_tokens` doesn't matter (it's not used), but the correct value would probably be `u32::MAX` (or the maximum Unicode code point plus 1).
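To illustrate why `num_tokens` is irrelevant for Myers, here is a textbook Myers forward pass (the greedy O((N+M)D) variant) that computes only the length of the shortest edit script over char tokens. It is a sketch, not the crate's implementation; `Token`, `tokens`, and `myers_distance` are local, hypothetical names. The algorithm only ever asks whether two tokens are equal, so it never needs to know how many distinct tokens exist, and it indexes both inputs at arbitrary positions, which is why they must be slices rather than iterators:

```rust
/// Hypothetical local token type: a `u32` newtype, as described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Token(pub u32);

/// Convert a string to char tokens (code point per char, no interning).
fn tokens(s: &str) -> Vec<Token> {
    s.chars().map(|c| Token(c as u32)).collect()
}

/// Number of insertions + deletions in a shortest edit script from `a` to `b`,
/// using Myers' greedy forward search. Only token equality is used.
fn myers_distance(a: &[Token], b: &[Token]) -> usize {
    let n = a.len() as isize;
    let m = b.len() as isize;
    let max = (n + m) as usize;
    let offset = max as isize;
    // v[k + offset] = furthest x reached so far on diagonal k = x - y.
    let mut v = vec![0isize; 2 * max + 2];
    for d in 0..=(max as isize) {
        let mut k = -d;
        while k <= d {
            let idx = (k + offset) as usize;
            // Step down from diagonal k+1 (insertion) or right from k-1 (deletion).
            let mut x = if k == -d || (k != d && v[idx - 1] < v[idx + 1]) {
                v[idx + 1]
            } else {
                v[idx - 1] + 1
            };
            let mut y = x - k;
            // Follow the "snake" of matching tokens for free (random access).
            while x < n && y < m && a[x as usize] == b[y as usize] {
                x += 1;
                y += 1;
            }
            v[idx] = x;
            if x >= n && y >= m {
                return d as usize;
            }
            k += 2;
        }
    }
    max
}

fn main() {
    let d = myers_distance(&tokens("ABCABBA"), &tokens("CBABAC"));
    println!("shortest edit script: {} edits", d);
    // Myers never reads it, but the code-point bound suggested above is:
    let num_tokens = char::MAX as u32 + 1;
    println!("code-point bound: {}", num_tokens);
}
```

This prints 5 edits for the classic `ABCABBA`/`CBABAC` pair and a bound of 1114112, i.e. `char::MAX as u32 + 1`.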
If you can guarantee that your input is only ASCII (or some other reasonably small Unicode subset), then you could also use histogram and pass 128 for `num_tokens` in the ASCII case.
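The ASCII advice can be turned into a small decision helper. `Plan` and `plan_char_diff` are hypothetical names used only for this sketch; it picks histogram with `num_tokens = 128` when both inputs pass `str::is_ascii`, and otherwise falls back to Myers with the code-point bound:

```rust
/// Hypothetical plan for a char-wise diff, following the advice above.
#[derive(Debug, PartialEq, Eq)]
enum Plan {
    /// Histogram needs a small token alphabet; ASCII code points fit in 0..128.
    Histogram { num_tokens: u32 },
    /// Myers ignores num_tokens; pass one past the largest code point anyway.
    Myers { num_tokens: u32 },
}

fn plan_char_diff(before: &str, after: &str) -> Plan {
    if before.is_ascii() && after.is_ascii() {
        Plan::Histogram { num_tokens: 128 }
    } else {
        Plan::Myers { num_tokens: char::MAX as u32 + 1 }
    }
}

fn main() {
    println!("{:?}", plan_char_diff("kitten", "sitting"));
    println!("{:?}", plan_char_diff("café", "cafe"));
}
```

`str::is_ascii` is a single linear scan, so the check is cheap compared with the diff itself.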
Hey, thanks for making this crate. :) Would it be possible to call `diff` without having to intern the input first? In my use case (character-wise diffing) interning doesn't seem necessary, as `char`s should be as cheap to compare as `Token`s (which are just `u32`s under the hood), and interning has its non-trivial cost. Thanks!