Closed: hostilefork closed this 4 years ago
So @rgchris, I don't know where you stand on working with R3C (it would help if I knew!). But I tried to factor this out to make it easy for you to apply, if you choose to. I think by and large things are falling into place in a way consistent with what you have historically advocated for, e.g. FILE! and ISSUE! having internal slashes. (See the test... I'd really like to see more tests written to take advantage of Redbolisms this way!)
This is all related closely to the ISSUE! and CHAR! unification into one atomic UTF-8 optimized immutable type.
I want to be really certain about the # decaying to just space part. It gives an alternative to BLANK! as space:
>> unspaced ["controversial" _ "behavior"]
controversial behavior ; !!! we've wondered if this was right
>> unspaced ["replacement" # "behavior"]
replacement behavior ; this may be a more solid answer
It doesn't vanish quite as much as BLANK! does for spacing. But there's some benefit in that, as it's kind of a blocky "negative space" space. And the fact that # would really be the space character wherever it turned up in character-related contexts would make it less of a surprise:
>> second "a b"
== #
I'd whimsically toyed before with the idea of contexts like that returning BLANK!, e.g. TO BLOCK! of TEXT!:
>> to block! "ab cd"
== [#"a" #"b" _ #"c" #"d"]
It made some ideas better, but that's no match for:
>> to block! "ab cd"
== [#a #b # #c #d]
But making all dialects equate BLANK! with how NULL is handled doesn't feel right, as I really want this to work:
>> did parse [1 "a" 2] [integer! _ integer!]
== #[true]
This makes PARSE suitable for "deconstructing" things, and underscore is the traditionally-used "I don't care" of deconstruction.
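To make the deconstruction idea concrete, here is a minimal sketch in Python rather than Ren-C, so it is only a model and not how PARSE is implemented. The names `ANY` and `matches` are made up for illustration; `ANY` plays the role of `_`, and Python types stand in for datatypes like INTEGER!.

```python
# Hypothetical model: ANY stands in for the `_` "I don't care" slot.
ANY = object()

def matches(values, rule):
    """Return True if each value satisfies the rule slot at the same position."""
    if len(values) != len(rule):
        return False
    for value, slot in zip(values, rule):
        if slot is ANY:
            continue  # wildcard: accept any single value
        if type(value) is not slot:
            return False  # typed slot: the value must be exactly that type
    return True

# Mirrors `did parse [1 "a" 2] [integer! _ integer!]`
print(matches([1, "a", 2], [int, ANY, int]))
```

The point of the sketch is that the wildcard still consumes exactly one value, which is different from a NULL-like slot that would be skipped entirely.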
Anyway, food for thought, and hopefully these changes add up to something you will approve of.
It made some ideas better, but that's no match for:
>> to block! "ab cd"
== [#a #b # #c #d]
There's definitely a lot of improvement there, cutting down on the quotes. But as I push this through a bit further, I'm finding myself less happy with the #-as-space concept. There's something off balance in the 1-charness of it.
Something I'm wondering is if caret standing alone could start one of these ISSUECHAR!s when escaping was involved:
>> to block! "ab cd"
== [#a #b ^_ #c #d]
That feels more "balanced", to have things like ^/ or ^(1C) standing on their own, lexically. And you can really see where the spaces and newlines are (which will be the most common non-printables needing escaping by far).
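A rough Python model of that rendering, just to see how it reads on a few strings. The rule set is an assumption pieced together from the examples above: space becomes ^_, newline becomes ^/, other control characters use the ^(XX) hex form, and everything else keeps the # prefix. `mold_char` and `mold_chars` are made-up names.

```python
def mold_char(ch):
    """Render one character in the hypothetical caret-or-# notation."""
    if ch == " ":
        return "^_"  # space, as in the example above
    if ch == "\n":
        return "^/"  # newline
    if ord(ch) < 0x20 or ord(ch) == 0x7F:
        return "^({:02X})".format(ord(ch))  # other control characters, in hex
    return "#" + ch  # printable characters need no escaping

def mold_chars(text):
    """Render a string the way TO BLOCK! might display it under this idea."""
    return "[" + " ".join(mold_char(ch) for ch in text) + "]"

print(mold_chars("ab cd"))
```

This leaves open what a character like `^` itself would look like, which the sketch does not try to answer.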
But I think BLANK! should keep meaning space in DELIMIT, but only at a literal level:
https://forum.rebol.info/t/treat-blank-s-from-variables-or-evaluation-like-null/1348
It also makes the choice that # means an issue containing a space, not the empty issue. This offers a convenient alternative to expressing space, that is shorter than #" " or the word space.
And #{} would be the empty issue like <{}> the empty TAG?
I'm leaning that <{}> as the empty tag is probably the way to go. With tags being mutable there has to be a way to render their non-scannable forms. We're talking about a world where <a tag > will scan legally but < a tag> will not, so if the latter is created by runtime means it needs to mold as <{ a tag}>. The former would mold as is.
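As a sketch of that molding rule (in Python, as a model only): the function name `mold_tag` is hypothetical, and the scannability test is an assumption. It treats a tag as scannable when it is non-empty, does not start with whitespace, and contains no ">" that would end it early. The real scanner rules may differ.

```python
def mold_tag(content):
    """Mold tag content, falling back to the braced form when it would not scan."""
    # Assumption: these three conditions approximate "scans legally".
    scannable = (
        bool(content)
        and not content[0].isspace()
        and ">" not in content
    )
    if scannable:
        return "<" + content + ">"
    return "<{" + content + "}>"  # braced form for runtime-created oddities

print(mold_tag("a tag "))
print(mold_tag(" a tag"))
print(mold_tag(""))
```

The sketch ignores what happens when the content itself contains a closing brace, which the braced form would also need an answer for.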
But I've mentioned that trying out # as space in practice wasn't as pleasing as I at first thought.
So # being an empty "issuechar!" does have applications, like what I'm currently calling "blackhole" usage:
https://forum.rebol.info/t/sending-values-into-space/1347
But it could also be a synonym for #"^(00)", or #^(00) or ^(00) ... e.g. an issuechar! that converts to codepoint 0. One benefit of that is that it becomes "toxic" in the process, so that you can't append it to strings. Which makes it a better outlier type for such purposes. And it can help with understanding why there's no such "issuechar!" as #"a^(00)b", because the only zero-bearing issuechar is an absolutely empty one.
Treating # as the empty issue. Perhaps it could ultimately double as the "codepoint 0" CHAR!, e.g. append &{AABB} # would produce &{AABB00} under the new rules. This would avoid creating a stringlike class that actually materially contained 0 bytes, which I'm trying to avoid.
Committed here:
https://github.com/metaeducation/ren-c/commit/4f3c86f256000cc55992d780d2dbd488962250b8
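Here is a small Python model of the "toxic" codepoint 0 idea, assuming the empty issue is what stands for codepoint 0. `EMPTY_ISSUE`, `append_text` and `append_binary` are invented names for illustration, not anything from the commit.

```python
EMPTY_ISSUE = ""  # hypothetical stand-in for `#`, the empty issue

def append_text(text, issue):
    """Appending to a string refuses the empty issue, so no string holds a 0."""
    if issue == EMPTY_ISSUE:
        raise ValueError("cannot append codepoint 0 to a string")
    return text + issue

def append_binary(data, issue):
    """Appending to a binary turns the empty issue into a single 00 byte."""
    if issue == EMPTY_ISSUE:
        return data + b"\x00"
    return data + issue.encode("utf-8")

# Mirrors `append &{AABB} #` producing &{AABB00}
print(append_binary(bytes.fromhex("AABB"), EMPTY_ISSUE).hex().upper())
```

Under this model no string-like value ever materially contains a 0 byte; only binaries can receive one.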
So to sum up the good news: more characters open up for ISSUE! (like / and .) as they are no longer wanted for putting ISSUE! in PATH! or TUPLE!. The "bad" news is that it will become an immutable class with optimized storage and no indexed position, so it will need to be converted to text for manipulations of that kind.
Not sure about # being the "space issue"... it seems anti-intuitive to me...
As I mentioned, that was abandoned. The usage of BLANK! as a space in things like DELIMIT is now standardized, with it only applying to "source level blanks":
https://forum.rebol.info/t/treat-blank-s-from-variables-or-evaluation-like-null/1348
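To illustrate the "source level" distinction, a minimal Python model follows. It assumes, going by the linked thread's title, that a blank arriving via a variable is treated like NULL, and that NULL contributes nothing to the output. `BLANK`, `Var` and `unspaced` are illustrative names only, and only variable lookup is modeled, not general evaluation.

```python
BLANK = object()  # stands in for a literal `_` written in the source block

class Var:
    """A word whose value is looked up when the block is processed."""
    def __init__(self, name):
        self.name = name

def unspaced(block, env):
    """Join items with no delimiter; only literal blanks act as spaces."""
    parts = []
    for item in block:
        if item is BLANK:
            parts.append(" ")  # source-level blank: becomes a space
        elif isinstance(item, Var):
            value = env.get(item.name)
            if value is None or value is BLANK:
                continue  # blank or null from a variable: vanishes
            parts.append(str(value))
        else:
            parts.append(str(item))
    return "".join(parts)

print(unspaced(["a", BLANK, "b"], {}))
print(unspaced(["a", Var("x"), "b"], {"x": BLANK}))
```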
Rebol2 and R3-Alpha break issues to become refinements at slash:
This changes it to follow the same load rules as FILE!, and include the slashes:
It also makes more characters in ISSUE! legal. This is designed to allow it to evolve into the "cleaner" form of CHAR!, ultimately replacing the CHAR! datatype entirely.
While it looks sensible that #; would be an ISSUE!, the ; is a comment anywhere at this time. In order to help catch cases that would forget and use #; like #: and cause hard-to-find bugs of code getting commented out, this makes that case an error.
It also makes the choice that # means an issue containing a space, not the empty issue. This offers a convenient alternative to expressing space, that is shorter than #" " or the word space.