Closed kandu closed 5 years ago
I'm slightly worried about memory usage when I see the new types. Zed is meant to scale well to big files, for instance that why it uses ropes. With the new representation it seems like we are going to multiply the memory used to represent a buffer in memory by a non-negligible factor. Roughly, I would say by 48 on a 64 bits machine.
I agree. If we shrink Zed_char.t, to eliminate width
and size
is fine, then it would be
Zed_char.t = { core: UChar.t; combined: UChar.t list }
, which I think is best balanced between cpu/memory/logic clean
or we go further, Zed_char.t= UChar.t list
it's less clean but still acceptable.
To support wider/combined glyph upon utf8 would be messy and slow, IMHO.
The only way to make this scale well is to make Zed_char.t
and Zed_string.t
abstract and make their representation as compact as possible. It shouldn't make things more complicated for users of these modules. They'll just need to use functions rather than access fields directly.
It is acceptable if random accesses in Zed_string.t
values are slow. Zed_rope
mitigate this problem since it puts a upper bound on the size of Zed_string.t
values it embeds.
It makes things a bit more complicated, but at the end of the day scaling is hard...
With the latest commit, It's more but not maximumly compact, as a Zed_string.t is now still a bunch of utf8 strings. Which I think is a good balance point. If it's not compact enough. I'll add some functions like extract_zChar_from_utf8, get_zChar_of_utf8 and refactor the Zed_string module as well.
Well, the most important is that Zed_string.t
is compact. Indeed, the expectation is that the user might create huge ropes, and the data are stored as Zed_string.t
values inside ropes.
For Zed_char.t
, it is important as well but not as much, given that is rare to have large structures storing a lot of Zed_char.t
values. In practice, the record representation you had with only the first two fields seems acceptable and preferable to me: it is only one more word in general compared to the UTF-8 representation and operations on it are faster.
Zed_string.t is now encoded in utf8. And all rec functions in Zed_char & Zed_string are tail-recursive to deal with big files better.
Thanks. That seems good to me. Regarding the review, I'll be honest and I probably won't have the time to review this patch before at least three weeks. If you could find someone else to review it, that would be great. I'm personally happy with the change as long as a second pair of eye has gone over the code.
If no one else is available, ping me in three weeks.
I've been thinking about this more. I have less and less time to spend on these projects, and realistically I will not find the time to do a thorough review of the patches, test them, release them and provide follow-up support for them. Still, it seems to me that the feature your are proposing is valuable for users and that you already put a non-negligible amount of effort towards it.
Given that this is a now a community maintained project, I propose to let you do all the above tasks yourself. More precisely, I propose to give your write access to zed, lambda-term and utop so that you can merge these patches, release them in opam and provide follow-up support. Just to be clear, by follow-up support, I would expect that you do the following:
I ask that you do this for a period of at least one year. After one year, it's be up to you whether you want to continue providing support or not. I will also do my best to answer questions you might have about the existing code of these projects.
Does that work for you?
Thanks for your trust.
Next, I'll stabilize the API, test and fix potential bugs, optimize the performance. I should mention that neither ocaml nor english is my native language. Sometimes I'll ask for some help on polishing code, doc, writing change logs, release notes. But I think that's all right. let's get started as you said.
Sounds good, and no problem for the language, I'm not a native english speaker either :)
@pmetzger I don't seem to have access to the settings for zed and lambda-term, could you add @kandu as a collaborator? Thanks
@diml I'd feel more comfortable adding someone after an initial successful contribution rather than before, but if you're okay adding them now, we could do that.
I hear you. In this case, the work to do was quite big and non-trivial and the person did it and promptly adressed the points I was concerned about. So I'm thinking that @kandu must be serious about contributing and that we can make an exception in this case. So I'm happy to add @kandu now.
More precisely, I propose to give your write access to zed, lambda-term and utop so that you can merge these patches, release them in opam and provide follow-up support
Okay, @diml, @kandu is now added with write access to all three projects.
Thanks
Thanks
Should we do a new release of Zed?
After the latest tweak of Zed_string.sub & Zed_string.sub_ofs, I think the API is stable enough to make a new release.
Please arrange for new releases of these tools then.
changes
type, distinguish postion-change between width-change, so does the functions in this module.