wduquette / molt

Embeddable TCL Interpreter for Rust applications
BSD 3-Clause "New" or "Revised" License
103 stars 12 forks source link

Add new string subcommands #102

Closed dbohdan closed 4 years ago

dbohdan commented 4 years ago

In this PR I have implemented the following string subcommands:

I have added over 60 tests for them in total. I didn't make any changes to the Molt Book. Should contributors do that? The implementation of string map seems a little ugly. I am not sure how to improve it.

Resolved issues: #75, #77, #79, #80, #84, #85, #86.

wduquette commented 4 years ago

What a pleasant surprise!

I'll look through the code first thing tomorrow morning, and let you know if I have any heartburn; otherwise I'll go ahead and merge it.

Regarding the Molt Book, you're welcome to take a whack at it; the big deal, as I'm sure you've noticed, is to:

You already know the "TCL Liens" for the new implementation, if any; if you update the Molt Book string.md page, I might change the wording a bit but it makes me less likely to miss something. Similar for the rustdocs for the public API.

The "how to" sections of the Molt Book I probably want to update myself.

Thank you!

wduquette commented 4 years ago

Thank you especially for providing string map, which I'd been dreading.

dbohdan commented 4 years ago

You're welcome! I look forward to your feedback. As for the Tcl Liens, I see two that are relevant to the subcommands I have implemented but not specific to them or string.

  1. Tcl 8 and Rust handle Unicode differently. I haven't compared them closely. Molt must at least lack the TCL_UTF_MAX limitations that standard builds of Tcl 8.6 have. The particular differences need to be investigated more substantially.
  2. string range does not understand the integer[+-]integer or the end[+-]integer syntax for indices. There is already a note about this limitation on the Molt Book page for lindex. Since the limitation applies to all range and index commands, I think this note should be refactored into a "global" one that the Liens for particular commands can link to.
wduquette commented 4 years ago

Regarding Unicode, my plan with Molt so far has been:

The tricky bit is that as a user of U.S. English and a guy who literally grew up with ASCII, I tend not to create files that contain anything other than standard ASCII, so I don't tend to run into the edge cases.

wduquette commented 4 years ago

Though I see you've added some Unicode test cases. Excellent!

wduquette commented 4 years ago

Haven't used the Github "code review" feature before; had I known it was going to display a big angry red "Changes requested" banner, I might have done it differently.

dbohdan commented 4 years ago

Haven't used the Github "code review" feature before; had I known it was going to display a big angry red "Changes requested" banner, I might have done it differently.

Oh, I didn't take it as angry. (But I guess the red banner does look a little angry?)

I have resolved the change requests. Please review.

wduquette commented 4 years ago

I'll take a look and most likely merge these later this morning.

wduquette commented 4 years ago

Re: Molt, string indices, graphemes, and grapheme. Did a little looking around; per the Rust documentation, Rust characters are Unicode scalar values, which are similar to but not identical to code points: "Unicode Scalar Value. Any Unicode code point except high-surrogate and low-surrogate code points." The discussions of how to compute Rust string length that I'd read distinguished between characters and "grapheme clusters"; I took that to mean that a character was a single grapheme rather than a cluster of them, but now I see that the writer meant that a grapheme is a cluster of code points.

So characters in Rust strings are approximately code points. I'm comfortable with that; my intent is that Molt should be an extension language for Rust, and so should adhere to Rust conventions. Working with a string should have the same results at the TCL level as it does at the Rust level. For example, Molt's notion of whitespace is the same as Rust's, and slightly different than Standard TCL's.

dbohdan commented 4 years ago

:tada: :tada: :tada:

dbohdan commented 4 years ago

my intent is that Molt should be an extension language for Rust, and so should adhere to Rust conventions. Working with a string should have the same results at the TCL level as it does at the Rust level. For example, Molt's notion of whitespace is the same as Rust's, and slightly different than Standard TCL's.

I think this is the approach that makes the most sense for an embedded scripting language (that doesn't implement a particular standard).