jeff-hykin / better-cpp-syntax

💾 The source of VS Code's C++ syntax highlighting
GNU General Public License v3.0
155 stars 30 forks source link

Behavior of reTag, map!, and __deep_clone__ #487

Open jeff-hykin opened 4 years ago

jeff-hykin commented 4 years ago

The situation:

I think this is resulting in some subtle bugs where retagging doesn't seem to retag because of the includes, and if it did retag the includes, it would be mutating the originals.

This isn't going to be fun to fix, especially considering some includes are placeholders which will basically have to be deep cloned and reTag-ed at compile time, so I'm just going to post it here for dealing with later

maybe an easier first solution is for repeated patterns like zeroOrMore to use a new key instead of includes, and that new key issues a warning if it gets a placeholder

matter123 commented 4 years ago

Maybe I'm missing something, but what breaks if __deep_clone__ correctly clones includes?

jeff-hykin commented 4 years ago

If _deep_clone__ correctly clones includes I don't think that will break anything, although there might be some performance overhead its probably negligible. I just think that'll be hard to do because cloning includes: [ :thing ] will have to wait to resolve to a placeholder. Then a deep clone of the placeholder will need to be made, it will need to "remember" actions such as reTagging, then at compile time, resolve itself, deep clone whatever it resolved to, and then apply all the pending actions such as reTagging to the deep clone of the resolve. And that will need to be done recursively since the just-resolved placeholder might also have includes, all of which also need to be retagged

matter123 commented 4 years ago

Ok, so the issue is that when using placeholders, reTag doesn't have a chance to retag the placeholders as they are not resolved?

This problem should be independent of includes: as g[:a] = Pattern.new("a").then(g[:b]) when retagged will have the same issue.

Yeah having it remember is definitely the correct result. The simplest solution would be for reTag to detect a placeholder and wrapping with a DeferredRetagPattern, that acts like a placeholder, and when asked to resolve, it first resolves its wrapped pattern, then it reTags the wrapped pattern.

More generally it might be a good idea, with the token system using a placeholder like system, to better integrate placeholders into the patterns. A pattern could provide a method to describe if it is unresolved, and provide a method to resolve itself.

Example change for reTag:

     def reTag(args)
         __deep_clone__.map! do |s|
+            if s.is_a?(PlaceholderPattern) || s.is_a?(DeferredRetagPattern)
+                next new DeferredRetagPattern(pattern: s, args: args)
+            end
             # tags are keep unless `all: false` or `keep: false`, and append is not a string
             discard_tag = (args[:all] == false || args[:keep] == false)

Edit: actually DeferredRetagPattern should probably inherit from placeholder as it effectively needs to be treated as one.

when it gets resolved it just returns resolve(@args[:pattern]).reTag(args)

jeff-hykin commented 4 years ago

With the other complications, like unit tests on placeholders and challenges I'm having with the old token system, I'm starting to think everything should be deferred. Fundamentally everything has to wait till the end to get group #'s assigned anyways, right now were basically trying to support two methodologies (dynamic and compiled) simultaneously.

A pattern could provide a method to describe if it is unresolved

I think that would be a really good idea. Basically placeholders are anything that is default-unresolved. Any pattern is also unresolved if it contains an unresolved pattern in it's pattern chain or in its includes.

Some methods like to_tag could check self.resolved? (and return an error if false)

So basically a deferred-everything system would make everything const before (and maybe even during) the resolution phase. Just save the arguments of PatternBase.new(), have calling/running unit tests be deferred, deep clones are deferred (since the includes/pattern-chain could contain placeholders), chaining uses deep clone and is therefore deferred, reTag's also use deep clone, and are then deferred. Then the resolution phase can do everything in the correct order: explore pattern chains with DFS to find pure (placeholder-less) patterns, basically recursively making perfect deep clones of every pattern, keeping track of patterns that include themselves (at any level), then checking repeats/groups/references, then performing all unit tests and linting.

I think its going to be a lot of work whatever way its done. reTagging patterns that include themselves at multiple levels will be challenging logic

matter123 commented 4 years ago

keeping track of patterns that include themselves

The only way this gets resolved is with the recursive nature that the registry allows.

If a pattern includes itself there are 2 choices:

  1. The pattern is already named in the compiling grammar. (the include just becomes #name
  2. The pattern is anonymous or part of a bigger pattern. The pattern is extracted out into a registry entry, and the include is rewritten as #extracted-#{pattern.hash}
matter123 commented 4 years ago

So to formalize the proposal.

A Pattern is Deferred if a pattern is not capable of executing #to_tag, or #to_r.

BasePattern has three methods added to it.

Any method that manipulates a pattern should, when encountering a deferred pattern, wrap the pattern in a wrapper that executes the method when resolve is called. This may be able to be a generic wrapper.

Undecided:

jeff-hykin commented 4 years ago

This is looking like a log of work, so I'm thinking for a 1.0 release lets just pull out reTag, pull out word_cannot_be_any_of:, pull out any kind of token-collection system, fix edgecase bugs, improve the error messages, and finalize the names of everything. Compared to the normal tmLanguage, this library already does so much, and its been unreleased for over a year. Getting a weak version of it into other people's hands would be an order of magnitude improvement for them.

We can make sure everything in 1.0 will be future compatible with the proposal being formed here