yoriyuki / Camomile

A Unicode library for OCaml

We need a roadmap to 1.0.0 #12

Closed foretspaisibles closed 6 years ago

foretspaisibles commented 9 years ago

Many of us would love to see a version 1.0.0 of Camomile! But how can we start to help if we do not have a roadmap? It would be great to prepare one!

foretspaisibles commented 9 years ago

Just a gentle ping on this! :)

yoriyuki commented 9 years ago

Cook it the way you like. For me, Camomile is dead. (Actually, OCaml is dead)

foretspaisibles commented 9 years ago

Wow, this is a statement! :)

Thank you a lot for the good work on Camomile!

Am I right in thinking the various u* (ucorelib, etc.) libraries were meant to split Camomile into smaller libraries? Just out of curiosity, where are you looking now? It looks like the OCaml community is thriving now, and definitely misses you! But anywhere you go, I wish you the best! :)

youjinbou commented 9 years ago

I don't know what makes you say that, but I feel that you're making a mistake. Are you moving to a much more potent language? (I was thinking Agda, Idris, ATS or even Coq). Anyway, I hope you'll find what you are looking for.

rgrinberg commented 8 years ago

@yoriyuki sorry to hear about you and OCaml. I wish you the best.

Would you mind letting some users help out with maintenance on your projects then? For example, adding @michipili as a maintainer to Camomile? (Sorry for the nomination!)

yoriyuki commented 8 years ago

I added @michipili and you (@rgrinberg) to collaborators :-) If you need more setup (for example, creating a group), let me know.

yoriyuki commented 8 years ago

If other guys want to be collaborators, please let me know.

foretspaisibles commented 8 years ago

Thank you for doing this, @yoriyuki

foretspaisibles commented 8 years ago

@rgrinberg @tategakibunko I think it would be great to call the community to attention about the current state of Camomile and prepare together a roadmap for v1.0.0. I will prepare a backlog milestone and a few tickets to it, so that we can start discussing ideas.

If you (anybody) have any wish, please open tickets or start the discussion here. Thank you again @yoriyuki for the beautiful library you wrote!

seliopou commented 8 years ago

I'd like to help out as well. Could you add me as a collaborator?

foretspaisibles commented 8 years ago

Here is a short proposition for the next (pre 1.0.0) milestones:

How does it sound?

yoriyuki commented 8 years ago

done. @seliopou

yoriyuki commented 8 years ago

What I had in mind for version 1.0.0 is support for a recent Unicode version. The current Camomile only supports Unicode 3.2, which is ancient. A lot has changed since then. As far as I know, to support a recent version, we need:

  1. Update the data files. I made uproplib, an interface to the data files of Unicode version 7.0 (which, in turn, is based on Daniel Bünzli's uucd), for this goal.
  2. Uppercase/lowercase mapping. They added extra rules for Ancient Greek.
  3. Locale data. Camomile uses ICU data, but the Unicode Consortium now publishes an LDML (Locale Data Markup Language) repository. I think the syntax of collation rules did not change, but they are now embedded in an XML format.
  4. Unicode Collation Algorithm. I have not checked this yet. We need to look at the change log of the technical report.
  5. Charmap files. I hope the format did not change, but we probably need to update the contents.

Just suggestions.

seliopou commented 8 years ago

@yoriyuki thanks, these are great suggestions. I was poking around the ICU and CLDR and did notice that Camomile is way behind on the Unicode standard. You'll be happy to know, however, that CLDR produced an LDML to ICU converter, and ICU continues to ship the converted ICU files. I'm not sure which features these converted files will cover, but hopefully it won't be necessary to deal with XML within Camomile. If you know otherwise, it would be great to know.

yoriyuki commented 7 years ago

Do we have any timetable for 1.0.0? I'm still interested in supporting the newest version of Unicode, but it takes time (so we may aim for 2.0.0 instead).

rgrinberg commented 7 years ago

I don't think we have any timetable - after all, this issue was made in 2015 :P

That being said, I'm also very interested in supporting the latest Unicode. We should start marking the issues that we definitely want to make it into 1.0 (build system, camlp4 removal, etc.)

yoriyuki commented 7 years ago

I posted the issues that I think need to be solved to support the latest Unicode standard:

#27 #28 #29 #30 #31

yoriyuki commented 7 years ago

Just for starting a discussion, I assigned several issues to the milestone v1.0.0

yoriyuki commented 6 years ago

I want to start working on the latest Unicode standard. But how do we proceed? I will make a new branch for this purpose. Updating to the latest standard would break the library, and even make it uncompilable. The final result would then be completely different from the current one. If you change the main branch during this process, the changes will be difficult to port. What is the best practice for this kind of case?

Another issue is: Should we create a branch for 1.0 or continue to develop on the main branch until 1.0 release?

I appreciate your input.

rgrinberg commented 6 years ago

For 1.0, I think our best course of action is the conservative one. Even though Camomile never made it to 1.0, it is de facto the most popular Unicode library in OCaml, and is used by many other libraries and applications. IMO, that's a good reason to basically freeze its interface and let current users follow semver properly. Therefore we should freeze the current API until we release 1.0. If we're introducing breaking changes, we should at the very least inform all the downstream users we are breaking before releasing 1.0.

Another issue is: Should we create a branch for 1.0 or continue to develop on the main branch until 1.0 release?

This is your call, but I'd prefer if master stayed stable and w/e API changes you had in mind were in a different branch with a PR open so that we could review them.

If you change the main branch during this process, the changes will be difficult to port. What is the best practice for this kind of case?

Yes, this is quite a tough issue. One thing to note is that once your new API is in development, we're seldom going to make big changes to master. After all, the API of 1.0 should stay frozen. We can of course help with the rebase/conflict resolution work here as well.

yoriyuki commented 6 years ago

I did three things:

  1. Made the master branch protected. To change master, you need to create a pull request and let someone review it.
  2. Created the v1.0 branch. This branch is an integration branch for version 1.0 and is also protected.
  3. Created the Unicode10.0.0 branch. I will work on support for Unicode 10.0.0 using this branch. The branch is not protected.

rgrinberg commented 6 years ago

Agreed with this approach. But I'm curious, why do we need a v1.0 integration branch for now? Seems like such a branch will be useful once we have released 1.0, and are already working on your new branch in master. In such a situation the integration branch would be useful to make emergency bug fixes to 1.0. Or do you have another workflow in mind?

yoriyuki commented 6 years ago

But I'm curious, why do we need a v1.0 integration branch for now? Seems like such a branch will be useful once we have released 1.0, and are already working on your new branch in master. In such a situation the integration branch would be useful to make emergency bug fixes to 1.0. Or do you have another workflow in mind?

Although I want API changes in 1.0 to be minimal, there will be API changes. If we develop 1.0 on master, people using master would be surprised. In particular, we do not have a backward-compatible branch for v0.x.

My plan is that once 1.0 is released, v1.0 is merged into master and development of 1.x continues on the master branch. At the same time, we make the integration branch v2.0 and start developing 2.0.

Sure, it is not the standard Git workflow, but I feel it is reasonable. What do you think?

rgrinberg commented 6 years ago

Ah ok. Makes sense to me. Ok so we'll only be making fully backwards compatible changes to master. All api breaking changes for 1.0 will be kept in a branch.

Yeah, I think that's reasonable.

yoriyuki commented 6 years ago

I made some progress on #28

yoriyuki commented 6 years ago

Another issue, which I am always worried about, is that Camomile is too large. Now that the UCD part takes almost 30 MB, we may need to do something about this. I keep thinking about splitting Camomile into different packages so that a user can choose the level of functionality.

For example,

  1. camomile-basic: basic data types like UTF-8 encoded strings, IO, some basic character encodings. Maybe monads and transducers etc. for convenience.
  2. camomile-algorithm: Unicode related algorithms like case mapping, collation, line breaking, etc.
  3. camomile-encodings: more esoteric character encodings.

Of course, there is also the question of when to do it. Version 1.0 is a good occasion, but it would delay the release of version 1.0 further.

Might be a good idea to announce the existence of v1.0 branch.

yoriyuki commented 6 years ago

Anyways I made a group https://github.com/ocaml-camomile and 4 empty repositories.

yoriyuki commented 6 years ago

Started working on camomile-basic https://github.com/ocaml-camomile/camomile-basic It already builds.

rgrinberg commented 6 years ago

I agree with the idea of making Camomile more modular, but I disagree with the idea that multiple git repositories are the way to go about this.

A few of the disadvantages that an explosion of git repositories brings to the table:

This is quite a lot of extra overhead for only 2 developers...

We can make Camomile just as modular without creating new git repos. Instead, we can split Camomile into multiple opam packages that exist in the same repository. This will give the user the same benefits you've mentioned. They'll pay for only what they use and be able to express their dependencies more accurately. I maintain many packages this way (cohttp, conduit, etc.), and this is a far better approach in my opinion. I can help with the initial organization if you're not quite sure how to do it. Though it's quite simple: mostly just 1 directory per package, with all the *.opam files kept at the root of the repo.

As for 1.0, I think that perhaps we should just release master now as 1.0 (unless you have some last second improvements) and start making the breaking changes we're planning post 1.0. As we discussed before, the current version of Camomile is already in widespread use, so it's very likely we'll need to make bug fix point releases to it. Might as well give a stable version number to an already de-facto stable API.

yoriyuki commented 6 years ago

I see. Then we use the same repo as now but split the opam package.

Since we have many tasks, we need to prioritize them.

  1. First prepare 1.0 quickly. I think we can merge your no dynamic configuration PR and update the contact address. (Many files point to the SourceForge address now.) Just one search/replace. I also want to unify Camomile's UChar and OCaml stdlib's Uchar.
  2. Next, create a v2.0 branch and split the package. Since this is a huge API change, I think it is better to put it in v2.0. Could you do an initial set-up? Then, I want to remove imperative Unicode strings (xString and ustring).
  3. Finally, support the recent Unicode standard in v3.0. As this is again a large change of behavior and APIs, it is better to make it another major update.

rgrinberg commented 6 years ago

I think we can merge your no dynamic configuration PR and update the contact address

Sure. Also, saving it for post 1.0 is an option (up to you).

I also want to unify Camomile's UChar and OCaml stdlib's Uchar.

I actually gave this a try already. This is also a breaking change unfortunately as stdlib's Uchar.t is more restrictive and doesn't allow any integer. From the docs:

(** The type for Unicode characters.

    A value of this type represents an Unicode
    {{:http://unicode.org/glossary/#unicode_scalar_value}scalar
    value} which is an integer in the ranges [0x0000]...[0xD7FF] or
    [0xE000]...[0x10FFFF]. *)

Camomile's UChar is more permissive.
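A minimal sketch of that restriction, using only the stdlib's Uchar.is_valid and Uchar.of_int (available since OCaml 4.03). The helper name to_uchar_opt is made up for illustration and is not part of Camomile or the stdlib:

```ocaml
(* Convert an arbitrary integer to a stdlib Uchar.t, if it is a
   Unicode scalar value. Uchar.of_int would raise Invalid_argument
   on the rejected values, so we check with Uchar.is_valid first. *)
let to_uchar_opt (n : int) : Uchar.t option =
  if Uchar.is_valid n then Some (Uchar.of_int n) else None

let () =
  (* Boundary values around the surrogate gap and the upper limit. *)
  List.iter
    (fun n ->
      Printf.printf "U+%04X accepted by stdlib Uchar: %b\n" n
        (Uchar.is_valid n))
    [ 0x0041; 0xD7FF; 0xD800; 0xDFFF; 0xE000; 0x10FFFF; 0x110000 ]
```

Any code that today stores values such as 0xD800 in a more permissive character type would need a fallback path like the None branch above before it could switch to the stdlib type.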

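To clarify the branching situation: master is where we do development of the upcoming version (1.0 as of now, 2.0 later, etc.) After a release, we have a version-specific branch where we backport bug fixes/improvements whenever possible.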

rgrinberg commented 6 years ago

One more change I had in mind for 2.0 is the following: use the Camomile module name for the library itself, rather than the more verbose CamomileLibrary. The Library suffix isn't really helpful.

yoriyuki commented 6 years ago

I think we can merge your no dynamic configuration PR and update the contact address

Sure. Also, saving it for post 1.0 is an option (up to you).

Done. It is inconsistent with what I said, but it doesn't matter much.

I also want to unify Camomile's UChar and OCaml stdlib's Uchar.

I actually gave this a try already. This is also a breaking change unfortunately as stdlib's Uchar.t is more restrictive and doesn't allow any integer. From the docs:

(** The type for Unicode characters.

    A value of this type represents an Unicode
    {{:http://unicode.org/glossary/#unicode_scalar_value}scalar
    value} which is an integer in the ranges [0x0000]...[0xD7FF] or
    [0xE000]...[0x10FFFF]. *)

Camomile's UChar is more permissive.

This restriction was implicit in the Unicode Standard, and now it is explicit. So it's okay to (actually we should) restrict UChar to this range. In particular, a UChar in the [0xD800] - [0xDFFF] code range breaks many algorithms.
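One way to see the problem: UTF-16 reserves 0xD800 - 0xDFFF for the code units of surrogate pairs, so an encoder cannot also treat those values as characters. A rough sketch, assuming a hypothetical helper utf16_units that is not part of Camomile or the stdlib:

```ocaml
(* Encode one code point as a list of UTF-16 code units.
   Surrogates and out-of-range values are rejected with None,
   because they cannot be represented unambiguously. *)
let utf16_units (n : int) : int list option =
  if not (Uchar.is_valid n) then None
  else if n < 0x10000 then Some [ n ]
  else
    (* Supplementary planes: split the 20-bit offset into a
       high (0xD800-based) and a low (0xDC00-based) surrogate. *)
    let v = n - 0x10000 in
    Some [ 0xD800 lor (v lsr 10); 0xDC00 lor (v land 0x3FF) ]

let () =
  (* U+1F600 encodes as the surrogate pair D83D DE00. *)
  match utf16_units 0x1F600 with
  | Some units ->
      List.iter (fun u -> Printf.printf "%04X " u) units;
      print_newline ()
  | None -> print_endline "not a scalar value"
```

If 0xD83D were also allowed as a character on its own, a decoder reading that unit could not tell whether it is a character or the first half of a pair.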

To clarify the branching situation: master is where we do development of the upcoming version (1.0 as of now, 2.0 later, etc.) After a release, we have a version-specific branch where we backport bug fixes/improvements whenever possible.

Sure. We need to do something about the existing v1.0 branch, but it is easy to deal with.

One more change I had in mind for 2.0 is the following: use the Camomile module name for the library itself, rather than the more verbose CamomileLibrary. The Library suffix isn't really helpful.

Sure. I think v2.0 is a good opportunity to rearrange the module structure.

yoriyuki commented 6 years ago

I found that migrating to the stdlib Uchar is not easy, because unidata contains non-scalar (out-of-range) code points. To fix this, we need to modify a large portion of the code. I think v3.0, when we support Unicode 10, is a better opportunity.

So, I think now we are ready to release v1.0. Do you have anything left to do?

yoriyuki commented 6 years ago

Now we have 1.0.1. Discussion of 2.0.0 continues at #70.