The same could be true of writing a Script that doesn’t rely on sloppy mode, and adding code that relies on it - the API of the script assumes a parsing goal, in a context (browser, Node, etc.), and any tool that transforms code is taking on the implicit risks that subverting the author’s intent brings. Web compatibility is only about deployed code on the web - not at all about the tools that generate it.
The same would be true of TLA, which will only work in a Module - bundlers simply have to know how to handle it - and the same is true of hashbang syntax - and nothing in fact changes there; some Node scripts can already have a hashbang.
> The same could be true of writing a Script that doesn’t rely on sloppy mode, and adding code that relies on it... and any tool that transforms code is taking on the implicit risks that subverting the author’s intent brings
@ljharb I don't think adding code is comparable to adding comments or whitespace. Authors of transformers certainly should understand the potential semantic effects of adding code (for example, a CJS-to-AMD transformer), but it is unreasonable to force everyone to understand that adding comments or whitespace could make the whole script fail. We already have tons of components and services that rely on the unstated assumption that "comments/whitespace never change semantics", let alone cause a fatal parsing failure.
> Web compatibility is only about deployed code on the web - not at all about the tools that generate it.
OK, we could use a more general phrase, like "don't break user expectations for many deployed services". No one would expect a simple CDN component that prepends a copyright comment to every served script to cause a parsing failure for a script that is totally valid per the spec.
> The same would be true of TLA, which will only work in a Module
I don't understand what top-level await (TLA) has to do with this. As far as I know, comments and whitespace never affect whether source text is a Module or a Script.
> bundlers simply have to know how to handle it
We also have concerns about TLA, but not because of the burden on bundlers, and this is not the right place to discuss TLA. We will post our feedback in the TLA repo if future WebFrontEnd TC meetings reach conclusions about it.
> nothing in fact changes there; some Node scripts can already have a hashbang
I already explained the difference between CLI-only support and putting it in the spec in the first paragraph of "Possible solutions", so I won't repeat it here.
Comments and whitespace are part of code, and all tools working with code need to be fully aware of their rules.
If prepending a valid copyright comment on a script causes a parsing failure, then the parser is clearly broken - the solution is to find a better tool, not hamstring the language.
> If prepending a valid copyright comment on a script causes a parsing failure, then the parser is clearly broken
@ljharb To some degree, I agree with you. But the point is that most of the tools and services I described use naive string prepending, appending, wrapping, or concatenation, and have worked fine for the last 20 years, from the ES3 era to ES2019, until this proposal. Even if supporting hashbang is technically trivial, finding and upgrading all such tools in a big company affects many teams and carries a big process cost, not to mention the cost across the whole ecosystem.
I have already made a similar point in other issues: we had better not turn a cost borne by the 100 people of the committee into a cost borne by millions of people in the community.
> the solution is to find a better tool, not hamstring the language.
I don't think treating #! as some sort of comment is "hamstringing the language": HTML comments in JS are a precedent, and a hashbang is just a comment in other languages run from a shell. Anyway, I would like to know whether there is any concrete issue with treating #! as a comment.
I doubt the committee as a whole has a taste for adding another obscure commenting form; we already wish HTML comments didn't exist :-)
@ljharb Well, I feel that sometimes we have to swallow such things, and the good news is that I see no one using <!-- for comments in the real world, and I believe no one will use #! for normal comments either.
Other similar precedents: we allow the BOM (which only makes sense at the start of a file) as whitespace, allow PS/LS (U+2029/U+2028) as line terminators, and allow U+3000 and all other Unicode whitespace. We don't expect anyone to use these oddities in their code, but we still allow them and everything seems fine.
@bmeck This is due to the difference between CLI-only behavior and specced (all-environment) behavior, which I described in the first paragraph of "Possible solutions".
An extreme example of the latter is:
#!/usr/bin/env node
if (typeof window === 'object') {
// normal code for browsers
} else {
console.log('This script can only be run in the browser')
console.log('press enter to open browser, other key to exit')
// some magic code which detects the keyboard and opens the default browser 😂
}
Normal examples:
It seems like all of these risks already exist in various forms, and tools already need to account for much of this, since code in public repositories does use #! even prior to this proposal.
This needs to account for merging copyrights and resolving conflicts if the file already includes this information. If that is already happening, I'm unclear how checking for #! is significant.
I'm unclear how this is affected by #! preceding them; it seems that debugging comments would still need a processor to understand them, and such a processor could support any new syntax that could precede such comments.
CJS in Node and other environments, as well as other tools in the ecosystem, already supported #! prior to this proposal as a preprocessing step. Additionally, #! is not proposed to be valid inside functions, so CJS would still need preprocessing before handing the source off to JS spec parsing.
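A minimal sketch of that preprocessing step, assuming a CommonJS-style loader that wraps module source in a function before compiling it, mirroring the parameter list of Node's documented module wrapper. `neutralizeHashbang` and `wrapCommonJS` are hypothetical names, not Node's internal API. The hashbang is turned into a // comment rather than deleted, so line numbers in stack traces stay the same.

```javascript
// Hypothetical helper: turn a leading "#!" line into a "//" comment.
// Only offset 0 is checked, since that is the only place a hashbang is meaningful.
function neutralizeHashbang(source) {
  return source.startsWith("#!") ? "//" + source.slice(2) : source;
}

// Hypothetical CommonJS-style wrapper. A hashbang is not valid inside a
// function body, so it must be neutralized before the source is wrapped.
function wrapCommonJS(source) {
  return (
    "(function (exports, require, module, __filename, __dirname) {" +
    neutralizeHashbang(source) +
    "\n})"
  );
}

// Usage: compile the wrapped source (indirect eval returns the function
// expression's value) and run it with a minimal module object.
const cliSource = "#!/usr/bin/env node\nmodule.exports = 40 + 2;";
const moduleFn = (0, eval)(wrapCommonJS(cliSource));
const mod = { exports: {} };
moduleFn(mod.exports, null, mod, "cli.js", ".");
console.log(mod.exports); // 42
```

Keeping the source's first line on the wrapper's first line is the design choice that preserves line numbers. This is also the same idea as converting a leading #! to //.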
A variety of preprocessing already has to happen to ensure that things like mixing strict/sloppy code, mixing Module/Script, conflicting exports, etc. do not cause semantic problems or errors. Adding a #! in the middle of a source text body defeats the purpose of a #! and is likewise an error. Tools will need to handle these merge conflicts as appropriate, and that is out of scope for this proposal.
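As a sketch, a concatenating tool might handle the hashbang part of these conflicts like this: keep the first file's hashbang at offset 0 of the bundle and neutralize every other one, so that no #! lands in the middle of the output. `concatScripts` is a hypothetical helper. It deliberately ignores the strict/sloppy, Module/Script, and export conflicts mentioned above, which real bundlers also have to handle.

```javascript
// Hypothetical helper: concatenate script sources so that at most one
// hashbang survives, and only at offset 0 of the result.
function concatScripts(sources) {
  return sources
    .map((source, i) =>
      // The first file's hashbang stays at offset 0 of the bundle; any other
      // hashbang would end up mid-file, so it becomes a // comment.
      i > 0 && source.startsWith("#!") ? "//" + source.slice(2) : source
    )
    // ";\n" guards against ASI hazards between files.
    .join(";\n");
}

const bundle = concatScripts([
  "#!/usr/bin/env node\nconsole.log('entry');",
  "#!/usr/bin/env node\nconsole.log('helper');",
]);
console.log(bundle.indexOf("#!", 1)); // -1: no hashbang after offset 0
```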
Whether intentional or accidental, adding code that invalidates the use case of #! makes the file unusable for the purposes of an interpreter directive. This is an error and prevents #! from being used properly. As such, it shouldn't be considered safe to add text before #!, as it changes how a file might be evaluated, similar to adding code before a directive prologue (e.g. "use strict").
> From a historical perspective, prepending or appending comments, line breaks, or whitespace to HTML/CSS/JS has never caused a fatal parsing failure. Furthermore, prepending comments, line breaks, or whitespace never changes the parse result or semantics of a JS script. Many of the conversions mentioned above potentially rely on this. So we believe that, to some degree, this should also be considered a requirement for web compatibility.
I suspect this can be re-addressed later if it proves to be a problem for web compatibility, but I am skeptical of it being a compatibility concern, as a variety of workarounds exist and I doubt many files will be written both for command-line execution and for embedding on the web.
That said, a follow-on proposal to support #! in any position is possible if someone wished to champion it; I do not see a need to include that at this time, however.
Overall, I'm unclear on the actual advantages of supporting the syntax everywhere. The cases listed already require preprocessing to handle various problems, and allowing #! in arbitrary positions defeats the intended purpose of supporting #!. Additionally, allowing #! anywhere other than the beginning of source text could cause grammar problems in the future: it would limit any future grammar production that needs #! to appear in the input stream (e.g. x.#!y).
For tools that manipulate source text and wish to handle syntax forms that can cause conflicts, I recommend validating their inputs and outputs.
We have somewhat of a precedent for this with UTF BOMs. From §11.1:
> U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. <ZWNBSP> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text <ZWNBSP> code points are treated as white space characters (see 11.2).
There, even though pushing the BOM out of the beginning of the file invalidates it for its intended use, we try to accommodate it. I'm not saying this was the right decision, just that it is how things are now.
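The asymmetry discussed here can be checked directly. This is a sketch that assumes an engine implementing this proposal, such as a recent Node.js or browser. A stray U+FEFF in the middle of source text is just whitespace, but a #! anywhere other than offset 0 is a SyntaxError. `parsesAsScript` is a hypothetical helper. It uses indirect eval, which parses its argument with the Script goal, so it also executes the snippet and should only be given trivial, side-effect-free code.

```javascript
// Hypothetical helper: true if indirect eval (Script goal) accepts `source`,
// false if it throws a SyntaxError. The source is also executed, so only
// pass harmless snippets.
function parsesAsScript(source) {
  try {
    (0, eval)(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const BOM = "\uFEFF";
console.log(parsesAsScript(BOM + "1 + 2"));                     // true: BOM at the start
console.log(parsesAsScript("1 +" + BOM + "2"));                 // true: BOM mid-text is whitespace
console.log(parsesAsScript("#!/usr/bin/env node\n1"));          // true: hashbang at offset 0
console.log(parsesAsScript("/* c */\n#!/usr/bin/env node\n1")); // false: hashbang pushed down
```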
@michaelficarra A BOM does affect the charset used, but I am far more concerned about affecting the command used to run a file than about the charset.
> It seems like all of these risks already exist in various forms, and tools already need to account for much of this, since code in public repositories does use #! even prior to this proposal.
@bmeck See my previous comment: https://github.com/tc39/proposal-hashbang/issues/18#issuecomment-523193152
Prior to this proposal, hashbangs only existed in some CLI-only scripts (normally under the bin or cli directory of a package, so it's easy to exclude them).
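As a rough sketch of what "easy to exclude" might look like in a transformer, here is a hypothetical `shouldSkipTransform` check. It skips files inside a package's bin or cli directory, and also any file whose content starts with #!. The directory names are only the convention described above, not a guarantee. The content check is an added assumption that catches hashbang files stored elsewhere.

```javascript
// Hypothetical heuristic: skip likely CLI entry scripts when transforming.
function shouldSkipTransform(filePath, source) {
  // Directory segments only; the last segment is the file name itself.
  const dirs = filePath.split(/[\\/]/).slice(0, -1);
  const inCliDir = dirs.some((dir) => dir === "bin" || dir === "cli");
  return inCliDir || source.startsWith("#!");
}

console.log(shouldSkipTransform("node_modules/package/bin/run.js", ""));                    // true
console.log(shouldSkipTransform("node_modules/package/lib/index.js", "module.exports = 1;")); // false
console.log(shouldSkipTransform("src/tool.js", "#!/usr/bin/env node\n"));                  // true
```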
With this proposal, a hashbang becomes valid syntax in all environments, which will change how programmers behave.
> This needs to account for merging copyrights and resolving conflicts if the file already includes this information.
@bmeck You are theoretically correct about the copyright-adding feature, but that's not how things evolve in the real world. I discussed this with the person (let's call him X) who brought this case to our meeting; the real progression was:
I hope this story helps us understand the reality: in most cases, software is developed to be "good enough" as a balance of requirements and resources. Though we can always argue that these use cases are not "good enough", note that this proposal makes them not only "not good enough" (can't merge copyrights or resolve conflicts) but "bad enough" (parse failure).
> I'm unclear how this is affected by #! preceding them; it seems that debugging comments would still need a processor to understand them
@bmeck It's not about #! preceding debugging info, but about debugging info preceding a potential #! (which causes a parsing failure).
And the debugging info (according to the person who submitted the original case) is meant for humans to read, like time cost, server IP/hostname, cache hits, etc. Of course it could also be consumed by tools.
> Prior to this proposal, hashbangs only existed in some CLI-only scripts (normally under the bin or cli directory of a package, so it's easy to exclude them).
#! is valid in all files, not just CLI scripts, for many tools; this seems to be evidence that it is already a non-issue in the ecosystem, since even though it is valid in other files, people are not placing #! there.
> With this proposal, a hashbang becomes valid syntax in all environments, which will change how programmers behave.
I am unclear on this. If it causes a parse error, they could avoid using it or fix the errors around it; both seem doable. In addition, knowing the position lets tools convert a #! at the start of the file to // (or whatever solution they want) if they cannot handle #! being used by programs.
Regarding the CDN story: while I do understand the time constraints on programmers, I do not feel they justify compromising on the problems mentioned in this and other threads. I am still skeptical of this, because in the parts of the ecosystem that already allow #!, this generally doesn't seem to be a problem, and existing CDNs for JS in multiple environments handle it fine.
My position remains that if allowing #! in arbitrary positions is desired, it could/should be a separate proposal that addresses the concerns raised in these issues around things like grammar, intention of code, and security. I am not yet convinced this is an issue, as #! has already rolled out in existing CJS-supporting tooling (for the web) and in browsers without this being a problem.
@bmeck
> I am not yet convinced this is an issue, as #! has already rolled out in existing CJS-supporting tooling (for the web) and in browsers without this being a problem.
As the original issue has explained, "in the short term, this risk is not significant and we may not get reports because... but in the long run... considering the scale of the entire ecosystem and industry, the cost of this risk in the future may be greater than we think."
> My position remains that if allowing #! in arbitrary positions is desired, it could/should be a separate proposal that addresses the concerns raised in these issues around things like grammar, intention of code, and security.
Personally, I think this issue should be solved in this proposal. Based on the comments and discussion in this issue, I don't see any disadvantage to making hashbang HTML-comment-like. On the contrary, it would just match the behavior of other languages run from a shell (which allow a hashbang anywhere, because a hashbang is just a valid comment), and I believe it could make the Node.js CommonJS implementation simpler (no special code needed for it).
But I have already put this on the agenda of our next WebFrontEnd TC meeting. If our TC accepts that it could be a separate follow-on proposal, I will create it and close this issue.
Thank you!
Is this meeting a within-360 thing?
@ljharb Yes, I mean that we (the WebFrontEnd Branch of the 360 Tech Committee) plan to discuss it in our internal meeting to decide our position on proposal-hashbang-superset (which follows proposal-json-superset? 😂) and on submitting it to future TC39 meetings.
@hax anything to report yet?
I noticed option-2 was on the 2020-03-31 tc39 agenda but didn't get presented. Any status?
cc @kyomic; does this issue still represent 360’s position?
> I noticed option-2 was on the 2020-03-31 tc39 agenda but didn't get presented. Any status?
It was presented, but for some reason it wasn't marked as such in the agenda (notes: https://github.com/tc39/notes/blob/794b0346646fd795a6454c543cc7f6d56ea0f5d4/meetings/2020-03/april-1.md#relax-hashbang-syntax-for-stage-1).
cc @zeldajay @yuanliang does this issue still represent 360’s position?
Is there anything else holding up progression to stage 4 besides this issue?
All major browsers and server-side engines have implemented this proposal, and as I see it, it's purely an "enhancement" type of change to the language, i.e. it unbreaks syntax that previously broke parsers.
The argument regarding concatenation of scripts seems weak to me because scripts containing hashbangs are already in widespread use today, so this supposed "breakage" is already happening, as parsers cannot deal with hashbang lines in the middle of scripts, nor are they supposed to.
I assume this is no longer the position of 360.
I've added this to the agenda for this month's meeting, and filed https://github.com/tc39/ecma262/pull/2816 on the spec; additional feedback can be given here or in plenary.
We discussed the hashbang proposal in our last WebFrontEnd TC meeting, and we have a concern about the current behavior of only allowing a hashbang at the start of a script/module.
(For those who can read Chinese, the original document is here: https://github.com/75team/tc39/blob/master/issues/201907-hashbang-stage3.md )
Risk
There are many preprocessors, server-side includes, code transformers, third-party services, etc. in the real world that prepend or append something to JavaScript source code, including but not limited to:
After the conversion, #! is no longer at the start and causes a SyntaxError.
The risk here is that the original authors of a script, or the developers using it, may not foresee that the script will be converted. Even if they are aware of this possibility in theory, they often forget it in practice and cannot pay attention to it in time. And if someone adds a hashbang to a script, it is highly unlikely that the change will be considered breaking, or that a major version will be released solely for a hashbang addition. In many cases, the local test environment does not include the conversion steps, so it's hard for users to notice the change and its consequences. In the worst case, the SyntaxError caused by the conversion will surface as a failure in production.
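To make the failure mode concrete, here is a sketch of the kind of naive banner insertion described above, next to a hashbang-aware variant. Both functions and the banner text are hypothetical illustrations, not code from any particular CDN or build tool. The aware version keeps the #! line first, inserts the banner right after it, and skips files that already contain the banner (a crude form of copyright merging).

```javascript
// Naive approach: plain string prepending, which was safe before hashbangs.
// If `source` starts with "#!", the hashbang is pushed off offset 0 and the
// result no longer parses as a Script.
function prependBannerNaive(source, banner) {
  return banner + "\n" + source;
}

// Hashbang-aware approach: keep any "#!" line first and put the banner after it.
function prependBanner(source, banner) {
  if (source.includes(banner)) return source; // banner already present
  if (!source.startsWith("#!")) return banner + "\n" + source;
  const newline = source.indexOf("\n");
  if (newline === -1) return source + "\n" + banner; // hashbang-only file
  return source.slice(0, newline + 1) + banner + "\n" + source.slice(newline + 1);
}

const script = "#!/usr/bin/env node\nconsole.log('hi');";
const banner = "/*! (c) Example Corp */";
console.log(prependBannerNaive(script, banner).startsWith("#!")); // false
console.log(prependBanner(script, banner).startsWith("#!"));      // true
```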
On the other hand, conversion facilities and components are likely to be deployed and maintained by independent infrastructure or operations teams, and it is almost impossible to expect all of these teams to be aware of the risks posed by hashbang. Even if they are aware of this risk, they may not be able to respond in time for various reasons, or they may simply dismiss it as an edge case or consider it the developers' responsibility. Even in the best case, if the infrastructure or operations teams want to actively avoid this problem, every conversion facility and component involved must handle hashbangs specially, which may be unrealistic or cost more to implement than expected.
In the short term, this risk is not significant and we may not get reports because conversions are typically applied to scripts in web pages, and all current usage of #! is limited to CLI entry scripts. But in the long run, there may be scripts that target both the CLI and the browser environment, which may suffer from the introduction of hashbang. On the other hand, since the actual consequences of this risk may occur rarely (i.e. be considered an edge case), the problem may be ignored, intentionally or unintentionally. Considering the scale of the entire ecosystem and industry, the cost of this risk in the future may be greater than we think. (There is also a developer-experience concern similar to https://github.com/tc39/proposal-hashbang/issues/12#issuecomment-522995193 )
Similar precedents
@charset "xxx" in CSS must be placed at the start of a stylesheet, but this case differs greatly from hashbang. Normally, an @charset "xxx" that is not at the start only invalidates the charset declaration; it does not cause the entire CSS parse to fail. Because UTF-16 encodings can't rely on @charset, only ASCII-compatible encodings (UTF-8, ISO-8859-x, Shift-JIS, GBK, etc.) can be validly declared with @charset, and incorrect decoding of an ASCII-compatible encoding only results in local mistakes (normally just bad property values; it's very rare for selectors to contain non-ASCII characters).
Another similar case is the DOCTYPE declaration. In old IE browsers, if anything preceded the DOCTYPE, including comments and XML declarations, quirks mode was triggered (i.e. the DOCTYPE declaration was treated as invalid). However, this also did not cause a parsing failure, and newer browsers do not have this problem at all.
Rationale
From a historical perspective, prepending or appending comments, line breaks, or whitespace to HTML/CSS/JS has never caused a fatal parsing failure. Furthermore, prepending comments, line breaks, or whitespace never changes the parse result or semantics of a JS script. Many of the conversions mentioned above potentially rely on this. So we believe that, to some degree, this should also be considered a requirement for web compatibility.
Possible solutions
In our meetings and further investigations in our community, a strong opinion was that hashbang should not be put into the ECMAScript spec and that this proposal should be withdrawn. Many think hashbang should be dealt with by the CLI (i.e. keep the status quo), so that programmers learn that it's only a CLI feature and that hashbang should only be used in CLI entry files (normally only files under the node_modules/package/bin or node_modules/package/cli directory of a package), and no one will expect normal JS files to have a hashbang.
If this proposal were still at stage 2, we might advocate withdrawing it. But we, the TC39 delegates of 360, understand that this proposal has already reached stage 3, browsers have already shipped it, and Node.js has already removed its old implementation and moved to V8's language-level implementation, so we will not advocate withdrawing it at this point.
After some discussion, we suggest treating #! as a comment. More specifically, we think it could use a grammar similar to that of --> (SingleLineHTMLCloseComment). HTMLComment was introduced to JS for compatibility with the very old pattern of wrapping inline script contents in <!-- and --> so that browsers without <script> support would not display them as page text.
The interesting fact is that we not only allow <!-- at the start of scripts (like the current #!), but treat <!-- like // anywhere in a script, and also treat --> like // when --> is at the start of a line (ignoring all leading comments/whitespace). We feel the situation of #! is very much like that of -->.
We hope the champions of this proposal will consider our feedback carefully and bring it to future meetings if necessary. Thank you!
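To make the difference between the current rule and the suggested -->-like rule concrete, here is a deliberately simplified line classifier. It is a hypothetical sketch, not spec text. `hashbangCommentLines` only looks at whitespace before #!, whereas the real SingleLineHTMLCloseComment rule also permits leading /* */ comments. A real implementation would also have to live inside the tokenizer, so that a #! inside a string, template literal, or block comment is not misclassified.

```javascript
// Returns the 0-based line numbers whose "#!" would be treated as a comment.
// "current": only a hashbang at offset 0 of the source text.
// "relaxed": hypothetical -->-like rule, where "#!" is the first
//            non-whitespace text on any line. Simplified: ignores leading
//            comments and knows nothing about strings or templates.
function hashbangCommentLines(source, rule) {
  if (rule === "current") return source.startsWith("#!") ? [0] : [];
  const result = [];
  // Split on ECMAScript line terminators (CRLF counted once).
  source.split(/\r\n|[\n\r\u2028\u2029]/).forEach((line, i) => {
    if (/^\s*#!/.test(line)) result.push(i);
  });
  return result;
}

const prepended = "/*! (c) Example Corp */\n#!/usr/bin/env node\nconsole.log('hi');";
console.log(hashbangCommentLines(prepended, "current")); // []
console.log(hashbangCommentLines(prepended, "relaxed")); // [1]
```

Under the current rule, the prepended banner leaves the #! line unrecognized, which is the SyntaxError case described in "Risk". Under the relaxed rule, line 1 would simply be skipped as a comment.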