Open disnet opened 7 years ago
Now you're speaking my language! I actually started to ask you about this on Gitter a couple of weeks ago but was too tired to get my thoughts straight.
It would simplify implementation of the default readtable in some places (though I'll have to make some changes to https://github.com/sweet-js/sweet.js/blob/master/src/reader/utils.js#L234-L241) and maybe a few others. But I'm all for it!
Thanks for the invite btw! 😄
(though I'll have to make some changes to https://github.com/sweet-js/sweet.js/blob/master/src/reader/utils.js#L234-L241)
Funny story, I've already done that 😄 Was yak shaving on it earlier this week; a little more cleaning and I'll push it up for your review.
I actually think it might make sense to extract the reader+readtable to its own repo/package since it's generally useful beyond just Sweet.
I actually think it might make sense to extract the reader+readtable to its own repo/package since it's generally useful beyond just Sweet.
I was holding off suggesting this. It's the reason why I implemented Reader separately from TokenReader in the first place. Reader, Readtable, and CharStream should all be extracted together.
Something that's still nagging me is how TokenReader::locationInfo is separate from CharStream::sourceInfo. I did this on purpose, as line and column numbers don't have any "real" meaning in a character stream, but it still feels icky.
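One way to picture the split described above (all names here are illustrative sketches, not the actual sweet.js API): the stream tracks only a raw offset, and line/column information is derived from that offset by the layer that needs it.

```javascript
// Sketch: CharStream knows only offsets (sourceInfo); line/column
// (locationInfo) is computed outside the stream, since those numbers
// have no "real" meaning to a character stream itself.
class CharStream {
  constructor(source) {
    this.source = source;
    this.position = 0; // sourceInfo: just an offset into the string
  }
  peek(n = 0) { return this.source[this.position + n]; }
  readString(n = 1) {
    const s = this.source.slice(this.position, this.position + n);
    this.position += n;
    return s;
  }
}

// locationInfo derived lazily from the offset, so the stream never
// has to understand lines or columns.
function locationInfo(stream) {
  const consumed = stream.source.slice(0, stream.position);
  const lines = consumed.split('\n');
  return { line: lines.length, column: lines[lines.length - 1].length };
}
```

The icky part is that deriving line/column this way rescans the consumed input, which is why keeping the two kinds of info separate is a judgment call rather than an obvious win.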
I would also like to answer the questions I brought up in https://github.com/sweet-js/sweet.js/pull/626#issuecomment-274308895.
Thinking about it some more, getting rid of Syntax would allow me to:

- move TokenReader::locationInfo into CharStream
- fold TokenReader::readToken and TokenReader::readUntil into read
- remove TokenReader::context (it's only being used when wrapping the tokens in Syntax)

This effectively removes the need for TokenReader entirely. All that's left are the error functions, and they can live in reader/utils.js.
Instead of making those calls to this.readToken in the actions we could just call read recursively, passing in stream (which holds all of the context we need for errors). Getting rid of the context means the signature can be read(source: string | CharStream, ...rest: Array<any>). This is the same signature as Reader::read, which implies we could get rid of that too! Then reader.js would just have function exports.
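A minimal sketch of that shape, assuming a toy CharStream (the names and the trivial token loop are illustrative, not the real reader): a plain function export whose first argument can be either a raw string or an existing stream, so recursive calls can pass the stream straight through.

```javascript
// Hypothetical sketch of read(source: string | CharStream, ...rest):
// normalizing the first argument lets the top-level call take a string
// while recursive calls reuse the same stream (and its error context).
class CharStream {
  constructor(source) { this.source = source; this.position = 0; }
  get done() { return this.position >= this.source.length; }
  next() { return this.source[this.position++]; }
}

function read(source, ...rest) {
  const stream = typeof source === 'string' ? new CharStream(source) : source;
  const tokens = [];
  // Stand-in for the real tokenization loop: just collect characters.
  while (!stream.done) tokens.push(stream.next());
  return tokens;
}
```

With this shape there is no Reader instance to construct; reader.js is just function exports, which is the point being made above.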
I don't want to make these changes now as it would hold up 3.0 release. But I think I will make a new repo for the readtable/reader/charstream stuff under my account if that's alright with you.
I forgot one thing (so far). Sweet's algorithm for regex literal detection relies on looking at the previous results. This wouldn't necessarily be the case for other projects, so each project that used the readtables would have to implement its own reader function that used Reader.
But I think I will make a new repo for the readtable/reader/charstream stuff under my account if that's alright with you.
Sounds good!
regex literal detection relies on looking at the previous results.
That's what I was yak shaving on in particular. I pushed up the changes I was working on to the refactor/readtable-no-syntax branch. It's still in a half-broken state, but I think the regex prefix stuff is working. Those changes are here.
Could we make the reader put Token | List<Token> into the prefix and actually return a SyntaxTerm or whatever type the subclass wants? That way the regex stuff wouldn't need to change.
Thinking about Term/Syntax unification more, I've become less happy about having methods like is* and from* on those objects. I think it causes more breakage for not much value. The methods were necessary previously because we didn't have a good way to get the functionality into a macro. Now that modules work better, maybe we should remove all those methods and move to just module imports.
A strawman API example that is not well thought out:
import matchSyntax from 'sweet.js/match-syntax' for syntax;
import makeSyntax from 'sweet.js/make-syntax' for syntax;
syntax m = ctx => {
  let param = ctx.next().value;
  if (matchSyntax('NumericLiteral', param)) {
    return makeSyntax('identifier', 'foo', param);
  }
  return makeSyntax('null', param);
};
m 1
I'm tempted to hold off on releasing 3.0 to include this merging plus API change; otherwise we'd just have a 4.0 right on its heels that breaks everything again.
I'm a fan. This would play really well w/ https://github.com/sweet-js/sweet.js/issues/516#issuecomment-273511542.
I'm tempted to hold off on releasing 3.0 to include this merging plus API change; otherwise we'd just have a 4.0 right on its heels that breaks everything again.
Agreed. I think we should revisit the list at https://github.com/sweet-js/sweet.js/pull/485#issue-98592882 to see what might have to break before we get there. A lot of those items could be minor version releases if we're careful.
Could we make the reader put Token | List<Token> into the prefix and actually return a SyntaxTerm or whatever type the subclass wants? That way the regex stuff wouldn't need to change.
I was thinking that readToken would function just as it currently does, except that it doesn't wrap results from Reader::read. Then we could do a later pass to wrap Token | List<Token> (wrapInTerms in macro-context).
The downside of that approach is you need the additional wrapping pass. But if there's not a clean way to separate the tokenization from the wrapping we can go with that.
I think we might be talking past each other. What regex stuff are you thinking would not need changing if Token | List<Token> were put into the prefix?
I can easily have readUntil not wrap for the prefix and then wrap the returned results. I was looking to move it to reader/utils anyway b/c it's used so infrequently. I think having it as part of the public API may have been a mistake.
What regex stuff are you thinking would not need changing if Token | List<Token> were put into the prefix?
Oh, all the regex prefix stuff would need to change. Currently the prefix is a List<Syntax>, but if we go with dropping Syntax in favor of Term then it needs to change anyway. That's what I did in the branch I pushed up the other day; it now works over the recursive type TokenTree = Token | List<TokenTree>.
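To make the recursive type concrete, here is an illustrative sketch (plain arrays stand in for Immutable List, and the token shape is made up): a TokenTree is either a token object or a nested list of TokenTrees, and prefix queries like "what was the last token read?" can walk it without ever seeing Syntax or Term wrappers.

```javascript
// Sketch over TokenTree = Token | List<TokenTree>: find the most
// recently read token, descending into nested lists from the right.
function lastToken(tree) {
  if (!Array.isArray(tree)) return tree; // a bare Token
  for (let i = tree.length - 1; i >= 0; i--) {
    const t = lastToken(tree[i]);
    if (t != null) return t;
  }
  return null; // empty list: no token here
}
```

This is the kind of question the regex-prefix logic asks, which is why it only ever needs to understand TokenTree, not whatever wrapper type read ultimately returns.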
What I was trying to say is that regardless of the type that the reader ultimately returns from read (currently Syntax, soon Term, eventually maybe something else), the internals (like the regex prefix) should always work only on a TokenTree.
A separate question is how a TokenTree is wrapped into a Term.
One approach is a readtable that holds two lists: one a list of Term that is pushed to every time a token is read (wrapping the token immediately into a term), and the other a list of TokenTree for the prefix.
The other approach is a readtable that only produces a List<TokenTree>, which is then passed to wrapInTerms after reading has completed.
The second approach is simpler but requires two passes (one to read, one to wrap).
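A minimal sketch of that second approach, with illustrative names (arrays stand in for Immutable List, and the RawSyntax/RawDelimiter shapes are simplified stand-ins for the real Term subclasses): reading produces a plain TokenTree, and a separate wrapInTerms pass turns it into a term tree afterwards.

```javascript
// Sketch of the wrapping pass: a nested list becomes a delimiter-style
// term holding wrapped children; a bare token becomes a RawSyntax-style
// term. This is the "second pass" cost being discussed.
function wrapInTerms(tree) {
  if (Array.isArray(tree)) {
    return { type: 'RawDelimiter', inner: tree.map(wrapInTerms) };
  }
  return { type: 'RawSyntax', value: tree };
}
```

The wrapping itself is trivial; the objection is purely that it walks the whole tree a second time after reading has already done so once.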
I might not be understanding something though so let me know if this makes sense.
I don't want that second pass either. I tried a couple of weeks ago to use Immutable.Seq until I realized that they don't autocache like Clojure's lazy-seq.
How about I check out the no syntax branch and try to get it working using your first option?
Sounds good!
The first option isn't going to work unless we let 'tokens.js' know about terms. The prefix is often going to have lists of terms in it (recursive descent). isParens and friends will break.
Actually, we have to do this anyway if those predicates are going to be useful outside of the reader.
That's overstating things a bit. isParens et al. just have to know that the list contains objects w/ a value property that is a Token.
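In other words, the predicate can be duck-typed. A hedged sketch (the token shape and the exact check are illustrative, not the real tokens.js implementation): isParens only inspects `entry.value`, so it works whether the wrapper is a Syntax, a Term, or anything else with that property.

```javascript
// Sketch of a duck-typed isParens: it never asks what the wrapper is,
// only that it exposes a `value` property holding a Token.
function isParens(entry) {
  const token = entry && entry.value;
  return token != null &&
         token.type === 'Punctuator' &&
         token.value === '(';
}
```

This is why the prefix holding Terms instead of Syntax objects doesn't have to break these predicates, as long as both wrappers keep the token on `value`.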
There's a PR in #634.
Sweet internals geekery.
In #602 I tried to unify our syntax nesting structures by adding RawSyntax and RawDelimiter to the Term subclasses and changing Syntax to never wrap a delimiter. This was good because we didn't really need two trees hanging around, but the current state of things is still messy and led to bugs like #625.
I think the right solution is to just completely remove Syntax. The only real job it's doing anymore is holding on to the lexical context (scopesets), which can just as easily be done on Terms. So the proposal is to:

- remove Syntax
- put context on all Term objects
- add addScope and removeScope on Terms
- implement the from* methods correctly on Terms
- change Syntax::resolve to be a standalone function that works over IdentifierExpression and BindingIdentifier terms

Seem reasonable?
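To make the last point concrete, here is a heavily simplified sketch of resolve as a standalone function (the term shapes, the `context`/`bindings` lookup, and the fallback behavior are all assumptions for illustration; the real version would consult scopesets):

```javascript
// Sketch: resolve as a free function over identifier-carrying terms,
// instead of a method on Syntax. Only the two term types named in the
// proposal are accepted.
function resolve(term, phase) {
  if (term.type !== 'IdentifierExpression' && term.type !== 'BindingIdentifier') {
    throw new Error(`cannot resolve a ${term.type}`);
  }
  // Stand-in for scopeset resolution: look the name up in the lexical
  // context stored on the term, falling back to the raw name.
  const ctx = term.context || {};
  const bindings = ctx.bindings || {};
  return bindings[term.name] || term.name;
}
```

The shape matters more than the lookup: once context lives on Terms, resolve has no reason to be a Syntax method at all.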