Removing of :root-source-info from env breaks previous merged compatibility fix

jpmonettas commented 2 weeks ago

Hi. I'm seeing FlowStorm stopped working with the latest shadow-cljs (2.28.9) and it is related to this commit(https://github.com/thheller/shadow-cljs/commit/0f3d91e3e3b76e75597c7153f94dbbaf6097cb1d) which was released in 2.28.9 breaking what was added previously by this PR.

I see your commit message contains :

change :root-source-info to be REPL only

otherwise can cause issues with caching when custom reader literals are used. CLJS itself also only seems to have it for the REPL.

so you have a reason for it.

The reason for the original PR was to allow analysis code down the road to always know what the top level source form is, which after this commit can only be done in a repl context.

Since cljs.analyzer/analyze* is recursive, do you know any other way of always accessing the root form we are analyzing? Or at least to know at cljs.analyzer/analyze* level if it is the root form what we are currently analyzing?

thheller commented 2 weeks ago

As far as I can tell for regular CLJS this is only set in a REPL context. I assumed it would be fine to copy that behavior.

I'm still uncertain what this is used for in the first place? Can you provide a bit more context what this is useful for in the first place?

I changed it because adding the root form ended up in the analyzer data and has the potential to break the cache for that namespace. In essence as reported in the #shadow-cljs clojurians slack:

(ns test)
(defn ^:export init []
  (println #inst "2022-02-02"))

leads to

Failed writing cache for test: java.lang.Exception: Not supported: class java.time.Instant

I decided to remove this data since reader literals have the potential to return pretty much everything, so it is not isolated to the #inst literal. Even if it doesn't break anything it really doesn't make sense to keep this data in the cache.

An alternate would be to just remove that data before writing the cache, but given that I have no clue how this is used I couldn't figure out if it would become a problem later when the cache is used?

jpmonettas commented 2 weeks ago

Sure. So the thing is that I work in a project called ClojureScriptStorm, which is a fork and is constantly tracking ClojureScript, that adds automatic instrumentation. Basically modifies some functions so it can be controlled to emit extra js to trace function calls, returns, expressions executions, etc.

One of the modified functions is analyze*, where I need to do a special thing only on the top level forms. So I need to differentiate between the analyze* call that is coming from shadow-cljs with a top level form, vs the recursive calls. Currently for this I'm doing :

(defn analyze* [env form name opts]
  (let [top-level-form? (= form (get-in env [:root-source-info :source-form]))
        ...]))

which now only works for forms pasted at the repl, but I need that info no matter if we are in a repl context or not.

Does it make sense?

An alternate would be to just remove that data before writing the cache

That would work for me also.

Apart from this particular situation, I think it's nice to have a mechanism in ClojureScript analysis to know what's the top level form we are currently analyzing no mater where we are in the analyze* recursion.

thheller commented 2 weeks ago

So, how does this work in regular CLJS? I only ever see a single assignment in cljs.repl, which is not used for regular compilation?

And the only goal is to be able to tell whether or not something is the top-level-form?, it is never actually used for any other purpose?

jpmonettas commented 2 weeks ago

So, how does this work in regular CLJS?

The thing with regular cljs is that when you call basic cljs main (just with -t "nodejs" some-script.cljs) :

cljs-main

it always goes through the cljs.repl/evaluate-form which adds this info :

cljs-evaluate-form

as you can see in the stack on the bottom right corner.

But anyways, I'm not worried about vanilla cljs, since most people these days I think uses shadow-cljs, and since shadow is calling analyze*, it would be great to have a way of knowing the original form we are analyzing not only when the form is pasted on the repl.

thheller commented 2 weeks ago

To be clear, me asking questions is not so much about trying to maintain parity with CLJS, but rather to understand the use case.

From my perspective a change was made, for which I didn't do that check, which then lead to another bug. Trying to avoid a repeat and/or a hacky fix. I kinda do not like setting a thing and then having to remember to unset it at a later point.

If the only use-case is determining top-level-form? I'd imagine that (zero? (:column (meta form))) to be just as reliable as a way to determine that?

jpmonettas commented 2 weeks ago

Trying to avoid a repeat and/or a hacky fix. I kinda do not like setting a thing and then having to remember to unset it at a later point.

I completely understand that. Adding and then remembering to remove later is pretty error prone.

I'd imagine that (zero? (:column (meta form))) to be just as reliable as a way to determine that?

That is interesting, I haven't thought of that. Kind of hacky since will fail on cases where the top level forms don't start on column 0 for whatever reason, but should be weird.

jpmonettas commented 2 weeks ago

Hmm the column trick doesn't work because of macroexpansion. You would get top-level-form? true on (defn foo [] ...) as well as it macroexpanded version (def foo (cljs.core/fn ([] ...))) which will have the same column but it is not a "top level form" in the sense that it is the outer form in the code.

thheller commented 1 week ago

Hmm indeed. I didn't consider macros that copy some/all metadata. That would have been my second idea, attaching some metadata to the form, but I guess thats out due to that. Let me see if I can figure out something, if not I'll just add the data back and remove it before the cache is written.

jpmonettas commented 1 week ago

I don't want you to hack anything on shadow-cljs, but I think having context in the analysis process of the current outer form being analyzed is a good thing to have. Maybe a dynamic var like you are already binding here but which contains the current form being worked on?

I know it is going to be weird, because you are going to be binding a var that is never used in shadow-cljs source, nor in the official ClojureScript compiler. But also harmless, and given that we currently can swap compilers (like the ClojureScripStorm case) it will give some context to whatever compiler is running.

thheller commented 1 week ago

:root-source-info is only problematic because it makes its way into the :cljs.analyzer/namespaces data, which I'm fairly certain is accidental in the first place, as it is only used for error messages during compilation. As far as I can tell this key is entirely useless after the form is analyzed, and no other form would ever access it.

Would it be ok to just introduce a new key into the analysis env? So, your code could just check

(= form (or (get-in env [:root-source-info :source-form])
            (:shadow.build/root-form env))

Any new key is fine really, since they won't be stored in the analyzer data/cache. That would fix the "add and then remove later" thing I'd like to avoid.

jpmonettas commented 1 week ago

Yeah that would work for sure, and I think it is better than the dyn var, since semantically I think that is what env is for, which is conveying context for analysis down the road.

thheller commented 1 week ago

:shadow.build/root-form should be available as of 2.28.10.

Yeah, *analyze-top* shouldn't have been a dynamic var either. It is rather messy to detect and use properly. Just a key in env would have been better.

jpmonettas commented 1 week ago

Thanks a lot @thheller!

thheller / shadow-cljs

Removing of :root-source-info from env breaks previous merged compatibility fix #1188