Formal feedback from 360

hax commented 5 years ago

We discussed the hashbang proposal in our last WebFrontEnd TC meeting, and we have a concern about the current "only allow hashbang at the start of a script/module" behavior.

(For those who can read Chinese, the original document is here: https://github.com/75team/tc39/blob/master/issues/201907-hashbang-stage3.md )

Risk

There are many preprocessors, server-side includes, code transformers, 3rd party services, etc. in the real world that prepend or append something to JavaScript source code, including but not limited to:

Adding copyrights and license at start
Providing server information for debugging by prepending comments
Function wrapping (eg. convert CommonJS to AMD/UMD)
Simple concat of source files
Not intentionally, but for a variety of reasons adding the newline character accidentally

After such a conversion, #! is no longer at the start of the source, which causes a SyntaxError.
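The failure mode described above can be reproduced in a few lines. This is only a minimal sketch, assuming an engine that already implements the hashbang proposal (for example a recent Node.js). `parsesAsScript` and the sample sources are hypothetical names made up for illustration, and indirect eval is used only because it parses its argument as a Script.

```javascript
// Hypothetical helper: reports whether `source` parses as a Script.
// Indirect eval parses (and runs) its argument as a global Script,
// so only harmless sample sources should be passed to it.
function parsesAsScript(source) {
  try {
    (0, eval)(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // any other error is unrelated to parsing
  }
}

// A CLI-style script with a hashbang on its first line.
const original = "#!/usr/bin/env node\n1 + 1;";

// Naive string concatenation, as a banner-prepending service might do it.
const withBanner = "/* (c) Example Corp */\n" + original;
const withNewline = "\n" + original;

console.log(parsesAsScript(original));    // hashbang at offset 0
console.log(parsesAsScript(withBanner));  // hashbang pushed to line 2
console.log(parsesAsScript(withNewline)); // hashbang pushed to line 2
```

On an engine that implements the proposal as specified, only the first source is expected to parse; the two converted variants should be rejected with a SyntaxError.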

The risk here is that the original authors of a script, or the developers using it, may not foresee that the script will be converted. Even if they are aware of this possibility in theory, they often forget it in practice and cannot react in time. And if someone adds a hashbang to a script, it is highly unlikely that the change will be considered breaking, or that a major version will be released solely for a hashbang addition. In many cases, the local test env will not include the conversion steps, so it's hard for users to notice the change and its consequences. In the worst case, the SyntaxError caused by the conversion will cause a failure in production.

On the other hand, conversion facilities and components are likely to be deployed and maintained by independent infrastructure or operations teams, and it is almost impossible to expect all of these teams to be aware of the risks posed by hashbang. Even if they are aware of this risk, they may not be able to respond in time for various reasons, or they may simply dismiss it as an edge case or consider it the responsibility of the developers. In the best case, if the infrastructure or operations team wants to actively avoid this problem, all the conversion facilities and components involved must treat hashbang specially, which may be unrealistic, or the implementation cost may exceed expectations.

In the short term, this risk is not significant and we may not get reports because conversions of scripts are typically used for scripts in web pages, and all current usage of #! is limited to CLI entry scripts. But in the long run, there may be some scripts which apply to both the CLI and the browser environment, which may suffer from the introduction of hashbang. On the other hand, since the frequency of actual consequences due to this risk may be low (i.e., considered an edge case), the problem may be ignored, intentionally or unintentionally. Considering the scale of the entire ecosystem and industry, the cost of this risk in the future may be greater than we think.

(There is also a concern of developer experience similar to https://github.com/tc39/proposal-hashbang/issues/12#issuecomment-522995193 )

Similar precedents

@charset "xxx" in CSS should be put at the start of the stylesheet. But this case differs significantly from hashbang.

Normally, a @charset "xxx" that is not at the start only invalidates the charset declaration; it does not necessarily cause the entire CSS parse to fail. Because UTF-16 encodings can't rely on @charset, only ASCII-compatible encodings (UTF-8, ISO-8859-x, Shift-JIS, GBK, etc.) can be validly declared with @charset, and incorrect decoding of an ASCII-compatible encoding only results in local mistakes (normally just bad property values; it's very rare for selectors to contain non-ASCII chars).

Another similar case is the DOCTYPE declaration. In old IE browsers, if there is anything before the DOCTYPE, including comments and XML declarations, quirks mode will be triggered (i.e. the DOCTYPE declaration is ignored). However, this also does not cause a parsing failure, and newer browsers do not have this problem at all.

Rationale

From a historical perspective, prepending/appending comments, line breaks, or whitespace to HTML/CSS/JS has never caused a fatal parsing failure. Furthermore, prepending comments, line breaks, or whitespace has never changed the parsing results or semantics of a JS script. Many of the previously mentioned examples of conversions potentially rely on this. So we believe that, to some degree, this should be considered a requirement for web compatibility too.

Possible solutions

In our meetings and in further investigation within our community, a strong voice was that hashbang should not be put into the ECMAScript spec and that this proposal should be withdrawn. Many think hashbang should be dealt with by the CLI (i.e. keep the current status), so that programmers are taught that it is only a CLI feature, that hashbang should only be used in CLI entry files (normally only files under the node_modules/package/bin or node_modules/package/cli directory of a package), and that no one should expect normal JS files to have a hashbang.

If this proposal were still at stage 2, we might advocate withdrawing it. But we, the TC39 delegates of 360, understand that this proposal has already reached stage 3, that browsers have already landed it, and that Node.js has already removed its old implementation and moved to the V8 implementation at the language layer, so we will not advocate withdrawing it in its current status.

After some discussion, we suggest treating #! as a comment. More specifically, we think it could use a grammar similar to that of --> (SingleLineHTMLCloseComment).

HTML-like comments were introduced to JS for compatibility with the very old pattern of

<script><!--
alert('hello world!')
//--></script>

The interesting fact is that we not only allow <!-- to behave like //, but also allow --> to behave like // if --> is at the start of a line (ignoring all leading comments/whitespace). We feel the situation of #! is very much like -->.
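For readers unfamiliar with the HTML-like comment forms, here is a small sketch of how they behave today. It assumes an engine that implements the Annex B HTML-like comments (web browsers and Node.js do); `evalScript` is a hypothetical helper, and indirect eval is used because these forms are only available when parsing a Script, not a Module.

```javascript
// Indirect eval parses its argument as a Script, where the Annex B
// HTML-like comment forms are available in engines that implement them.
const evalScript = (source) => (0, eval)(source);

// `<!--` starts a comment that runs to the end of the line.
const afterOpen = evalScript("1; <!-- ignored text\n2;");

// `-->` at the start of a line also comments out the rest of that line...
const afterClose = evalScript("1;\n--> ignored text\n2;");

// ...even when only whitespace and comments precede it on that line.
const afterCloseWithComment = evalScript("1;\n  /* note */ --> ignored text\n2;");

console.log(afterOpen, afterClose, afterCloseWithComment);
```

In each case the commented text is skipped and the script's completion value comes from the final `2;` statement. The suggestion in this comment is that #! could be given the same "comment if only whitespace/comments precede it" treatment.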

We hope the champions of this proposal could consider our feedback carefully and bring it to further meetings if necessary, thank you!

ljharb commented 5 years ago

The same could be true of writing a Script that doesn’t rely on sloppy mode, and adding code that relies on it - the api of the script assumes a parsing goal, in a context (browser, node, etc), and any tool that transforms code is taking on the implicit risks that subverting the author’s intent brings. Web compatibility is only about deployed code on the web - not at all about the tools that generate it.

The same would be true of TLA, which will only work in a Module - bundlers simply have to know how to handle it - and the same is true of hashbang syntax - and nothing in fact changes there, some node scripts all can have a hashbang already.

hax commented 5 years ago

The same could be true of writing a Script that doesn’t rely on sloppy mode, and adding code that relies on it... and any tool that transforms code is taking on the implicit risks that subverting the author’s intent brings

@ljharb I don't think adding code is comparable to adding comments/whitespace. The authors of transformers definitely should understand the potential semantic effects caused by code addition (for example, a CJS to AMD transformer), but it's too much to force everyone to understand that adding comments/whitespace could cause a total failure. And we already have tons of components/services that rely on "comments/whitespace should never change semantics", which goes without saying. Not to mention causing a fatal parsing failure.

Web compatibility is only about deployed code on the web - not at all about the tools that generate it.

Ok, we could use a more general phrase, like "don't break user expectations for many deployed services" --- no one would expect that a simple CDN component which prepends a copyright comment to all served scripts would cause a parsing failure for a "totally valid script per the spec".

The same would be true of TLA, which will only work in a Module

I don't understand the relationship with TLA. As far as I know, comments/whitespace never affect whether it's a module or a script.

bundlers simply have to know how to handle it

We also have concerns about TLA, but not because of the burden on bundlers. And this is not the right place to discuss TLA. We will post our feedback on TLA in the TLA repo if our future WebFrontEnd TC meetings reach conclusions about it.

nothing in fact changes there, some node scripts all can have a hashbang already

I already explained the difference between cli-only and putting it in the spec in the first paragraph of "Possible solutions". I don't want to repeat it.

ljharb commented 5 years ago

Comments and whitespace are a part of code, and all tools working with code need to be fully aware of their rules.

If prepending a valid copyright comment on a script causes a parsing failure, then the parser is clearly broken - the solution is to find a better tool, not hamstring the language.

hax commented 5 years ago

If prepending a valid copyright comment on a script causes a parsing failure, then the parser is clearly broken

@ljharb To some degree, I agree with you. But the point is, most tools/services I described before just use naive string prepending/appending/wrapping/concatenation and have worked fine for the last 20 years, from the ES3 era to ES2019, until this proposal. Even if supporting hashbang is trivial from a technical perspective, finding and upgrading all such tools in a big company will affect many teams and have a big process cost. Not to mention the cost at the scale of the whole ecosystem.

I already pointed out a similar thought in other issues: we'd better not turn a cost for the 100 people on the committee into a cost for the millions of people in the community.

the solution is to find a better tool, not hamstring the language.

I don't think treating #! as some sort of comment is "hamstringing the language"; HTML comments in JS are the precedent, and hashbang is just a comment in all other languages. Anyway, I would like to know if there is any concrete issue with treating #! as a comment.

ljharb commented 5 years ago

I doubt the committee as a whole has a taste for adding another obscure commenting form; we already wish HTML comments didn't exist :-)

hax commented 5 years ago

@ljharb

Well, I feel sometimes we have to swallow such things, and the good news is that I see no one using <!-- for comments in the real world, and I believe no one will use #! for normal comments either.

Other similar precedents: we allow the BOM (which only makes sense at the start of a file) as whitespace, allow PS/LS as newlines, allow U+3000 and all other Unicode whitespace... We don't expect anyone to use these bizarreries in their code, but we still allow them and everything seems fine.

hax commented 5 years ago

@bmeck

This is due to the difference between cli-only behavior and specced (all env) behavior, which I described in the first paragraph of "Possible solutions".

If it's cli-only behavior, programmers only add hashbang for cli-only scripts
If it's all env behavior, programmers tend to add hashbang whenever the script can apply to cli env.

An extreme example of the latter is:

#!/usr/bin/env node
if (typeof window === 'object') {
  // normal code for browsers
} else {
  console.log('This script can only be run in the browser')
  console.log('press enter to open browser, other key to exit')
  // some magic code which detect the keyboard and open default browser 😂
}

Normal examples:

Demo/test code which could be executed in both CLIs and browsers.
Single entry file or single bundle of a multiple env app. (Currently we have to create two entry files or two bundles.)

bmeck commented 5 years ago

It seems like all the risks already exist in various forms and tools already need to account for much of this as code on public repositories do use #! even prior to this proposal.

Adding copyrights and license at start

This needs to account for merging of copyrights / conflicts if the file already includes this information. I'm unclear if this is occurring how checking for #! is significant.

Providing server information for debugging by prepending comments

I'm unclear how this is affected by #! preceding them, it seems that debugging comments would still need a processor to understand them and such a processor could support any new syntax that could precede such comments.

Function wrapping (eg. convert CommonJS to AMD/UMD)

CJS in Node and other environments already support #! prior to this proposal as a preprocessing step as well as other tools in the ecosystem. Additionally #! is not proposed to be valid inside of functions so CJS would still need to use preprocessing before handing off to JS spec parsing.
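A rough sketch of the preprocessing step mentioned here, under stated assumptions: `stripHashbang` and `loadAsCommonJS` are hypothetical names, and the wrapper is a simplified stand-in built on the Function constructor, not Node's actual module loader.

```javascript
// Hypothetical helper: blank out a leading hashbang line before wrapping.
// `.` does not match line terminators, so the line break is kept and
// line numbers in the wrapped code stay aligned with the original file.
function stripHashbang(source) {
  return source.replace(/^#!.*/, "");
}

// Hypothetical CommonJS-style wrapper built on the Function constructor.
function loadAsCommonJS(source) {
  const mod = { exports: {} };
  const wrapped = new Function("module", "exports", stripHashbang(source));
  wrapped(mod, mod.exports);
  return mod.exports;
}

const cliScript = "#!/usr/bin/env node\nmodule.exports = { answer: 42 };";

// Without stripping, the hashbang ends up inside a function body,
// where the proposal does not allow it.
let rejected = false;
try {
  new Function("module", "exports", cliScript);
} catch (err) {
  rejected = err instanceof SyntaxError;
}

console.log(rejected, loadAsCommonJS(cliScript).answer);
```

Replacing only the hashbang text and leaving its line terminator in place is a deliberate choice in this sketch, so that error positions reported for the wrapped code still match the original file.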

Simple concat of source files

A variety of preprocessing needs to occur to ensure things like mixing strict/sloppy, Module/Script, conflicting exports, etc. do not occur to cause semantic problems/errors. Adding a #! in the middle of a source text body invalidates the expected utility of a #! and likewise is an error. Tools will need to handle these merge conflicts as appropriate and is out of scope of this proposal.

Not intentionally, but for a variety of reasons adding the newline character accidentally

Whether intentional or accidental, adding code that invalidates the use case of #! makes the code itself invalid to be used for the purposes of an interpreter directive. This is an error and prevents #! from being used properly. As such, it shouldn't be considered safe to add text prior to #! as it changes how a file might be evaluated similar to adding code prior to directive prologues.

From a historical perspective, prepending/appending comments, line breaks, or whitespace to HTML/CSS/JS has never caused a fatal parsing failure. Furthermore, prepending comments, line breaks, or whitespace has never changed the parsing results or semantics of a JS script. Many of the previously mentioned examples of conversions potentially rely on this. So we believe that, to some degree, this should be considered a requirement for web compatibility too.

I suspect this can be re-addressed later if this proves a problem for web compatibility, but am skeptical of this being a compatibility concern as a variety of workarounds exist and I doubt many files will be written both for command line execution and embedding on the web.

That said, a follow on proposal to support #! in any position is possible if someone wished to champion it; I do not see a need to include that at this time however.

Overall, I'm unclear on actual advantages to supporting the syntax everywhere. The reasons listed require preprocessing already to support various problems already and allowing #! in arbitrary positions invalidates the intended purpose of #! being supported. Additionally #! not being at the beginning of source text can cause other problems potentially in the future due to grammar issues. It would limit any form of grammar where we have a valid grammar production in the future with a left hand side of the input stream #! (e.g. x.#!y).

I recommend validating outputs/inputs for the tools that are seeking to manipulate source text if tools are wishing to handle syntax forms that can cause conflicts.

michaelficarra commented 5 years ago

We have somewhat of a precedent for this with UTF BOMs. From §11.1:

U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. <ZWNBSP> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text <ZWNBSP> code points are treated as white space characters (see 11.2).

There, even though pushing the BOM out of the beginning of the file invalidates it for its intended use, we try to accommodate it. I'm not saying this was the right decision, just that it is how things are now.
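The BOM accommodation quoted above can be illustrated with a naive file concatenation. This is a minimal sketch; `fileA`, `fileB`, and `bundle` are hypothetical stand-ins for files that each start with a BOM.

```javascript
const BOM = "\uFEFF";

// Two hypothetical source files, each saved with a leading BOM.
const fileA = BOM + "var a = 1;";
const fileB = BOM + "var b = 2;";

// Naive concatenation leaves the second BOM in the middle of the text.
const bundle = [fileA, fileB].join("\n");

// The bundle still parses: U+FEFF is treated as white space wherever
// it appears in ECMAScript source text.
const sum = new Function(bundle + "\nreturn a + b;")();

console.log(bundle.indexOf(BOM, 1) > 0, sum);
```

The second BOM no longer serves its intended purpose as an encoding marker, yet the concatenated text remains valid, which is the precedent being pointed to.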

bmeck commented 5 years ago

@michaelficarra a BOM does affect the charset used, but I have concerns about affecting the command used to run a file far more than the charset.

hax commented 5 years ago

It seems like all the risks already exist in various forms and tools already need to account for much of this as code on public repositories do use #! even prior to this proposal.

@bmeck See my previous comment: https://github.com/tc39/proposal-hashbang/issues/18#issuecomment-523193152

Prior to this proposal, the hashbang only exist in some cli-only scripts (normally under bin or cli directory of a package so it's easy to exclude them).

With this proposal, hashbang will become a valid structure in all env which will change the behavior of the programmers.

hax commented 5 years ago

This needs to account for merging of copyrights / conflicts if the file already includes this information.

@bmeck

You are theoretically correct about the copyright-adding feature, but that's not how things evolve in the real world. I had a conversation about this with the guy (let's call him X) who brought up this case in our meeting. What really happened was:

X was assigned to maintain a CDN for his dept, but he had only very limited time for it.
Some users from other depts started using the service, and one day asked for a feature to prepend copyright notices.
X spent 1 hour implementing a feature which allowed users to specify a glob pattern in the config panel and prepend/append text to the matched files. Note that it was a simple string concat without any checks; it was the users' duty to ensure the file was still valid after prepending/appending.
The users were happy with that feature and had no further requirements such as merging of conflicts.
After several months, the CDN was adopted by more departments and users, and X was given more time for it; they even had a process for deciding on features/bugs. To improve security and avoid misuse, they decided that prepended/appended text should be wrapped as comments. X wrote a one-time script to convert the existing text (comments) in the database to plain text. In this step he found that one dept was prepending/appending code instead of comments, utilizing the original feature to create a poor man's CommonJS to AMD wrapper. He informed that dept of the planned change, and that dept eventually sent a developer to work with X and implement a CommonJS to AMD/UMD wrapper as an official feature.
After several years, X's dept was dissolved and none of its original projects' code remained on the CDN, but many other depts relied on it, so the service was transferred to another dept. X no longer maintained it, and there were actually no developer resources assigned to it anymore.

I hope this story helps us understand the truth: in most cases, software is developed to be "good enough" as a balance of requirements and resources. Though we can always argue that these use cases are not "good enough", note that this proposal will make them not only "not good enough" (can't merge copyrights/conflicts) but "bad enough" (parse failure).

hax commented 5 years ago

I'm unclear how this is affected by #! preceding them, it seems that debugging comments would still need a processor to understand them

@bmeck It's not about #! preceding debugging info, but about debugging info preceding a potential #! (which causes a parsing failure).

And the debugging info (according to the original case submitter) is for people to read: time cost, server ip/host name, cache hit, etc. Of course it could also be consumed by tools.

bmeck commented 5 years ago

Prior to this proposal, the hashbang only exist in some cli-only scripts (normally under bin or cli directory of a package so it's easy to exclude them).

#! is valid in all locations not just CLI scripts for many tools, this seems evidence of this already being a non-issue in the ecosystem since while valid in other files people are not placing #! there.

With this proposal, hashbang will become a valid structure in all env which will change the behavior of the programmers.

I am unclear on this. If it causes a parse error they could avoid using it or fix errors around it. Both seem doable. In addition, knowing the position lets tools convert #! at the start of the file to // or whatever solution they want if they cannot handle #! being used by programs.
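The conversion suggested here could look roughly like the following. This is only a sketch of one possible approach: `neutralizeHashbang` and `prependBanner` are hypothetical names, and the banner text is assumed not to contain `*/`.

```javascript
// Hypothetical helper: turn a leading `#!` into `//`, so the first line
// becomes an ordinary single-line comment of the same length.
function neutralizeHashbang(source) {
  return source.startsWith("#!") ? "//" + source.slice(2) : source;
}

// Hypothetical banner step for a tool that prepends text to served scripts.
// Assumes `banner` does not contain `*/`, which would end the comment early.
function prependBanner(source, banner) {
  return "/* " + banner + " */\n" + neutralizeHashbang(source);
}

const cliScript = "#!/usr/bin/env node\n40 + 2;";
const served = prependBanner(cliScript, "(c) Example Corp");

console.log(served.split("\n")[1]); // the former hashbang line
console.log((0, eval)(served));     // the bannered script still parses and runs
```

Because `//` has the same length as `#!`, the rewrite does not shift any source positions within the original file. The trade-off is that the result can no longer serve as an interpreter directive, which is acceptable only for copies that are served to browsers rather than executed from a shell.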

Per the CDN story, while I do understand time constraints on programmers I do not feel it is worth the problems mentioned in this and other threads being compromised on. I am skeptical still of this as in the ecosystem which does allow #! generally this doesn't seem a problem and existing CDNs for JS in multiple environments handle this fine.

I remain in a position that if allowing #! in arbitrary positions is desired it could/should be a separate proposal that addresses the concerns brought up in these issues around things like grammar, intention of code, and security. I am not convinced yet of this being an issue as we already have rollout of #! from existing CJS supporting tooling (for web) and browser rollout without this being an issue.

hax commented 5 years ago

@bmeck

I am not convinced yet of this being an issue as we already have rollout of #! from existing CJS supporting tooling (for web) and browser rollout without this being an issue.

As the original issue has explained, "in the short term, this risk is not significant and we may not get reports because... but in the long run... considering the scale of the entire ecosystem and industry, the cost of this risk in the future may be greater than we think."

I remain in a position that if allowing #! in arbitrary positions is desired it could/should be a separate proposal that addresses the concerns brought up in these issues around things like grammar, intention of code, and security.

Personally I think this issue should be solved in this proposal. Based on the comments and discussion in this issue, I don't see any disadvantage in making hashbang html-comment-like; on the contrary, it would just match the behavior of all other languages used under a shell (which allow hashbang anywhere --- because hashbang is just a valid comment), and I believe it could make the node.js commonjs implementation simpler (no special code needed for it).

But I have already put this on the agenda of our next WebFrontEnd TC meeting for discussion. If our TC accepts that it could be a separate follow-on proposal, I will create it and close this issue.

Thank you!

ljharb commented 5 years ago

Is this meeting a within-360 thing?

hax commented 5 years ago

@ljharb Yes, I mean we (WebFrontEnd Branch of 360 Tech Committee) plan to discuss it in our internal meeting to decide our position:

Option 1: we want this issue to be solved in this proposal (which means we will block this proposal from advancing to the next stage if this issue is not solved)
Option 2: we will create a separate proposal to relax hashbang to be html-comment-like (maybe proposal-hashbang-superset, following proposal-json-superset? 😂) and submit it to a future TC39 meeting
Option 3: let it go (theoretically possible, but very unlikely to happen)

bmeck commented 4 years ago

@hax anything to report yet?

kaizhu256 commented 3 years ago

i noticed option-2 was on 2020-03-31 tc39-agenda but didn't get presented. any status?

ljharb commented 3 years ago

cc @kyomic; does this issue still represent 360’s position?

Slayer95 commented 2 years ago

i noticed option-2 was on 2020-03-31 tc39-agenda but didn't get presented. any status?

It was presented, but fsr it wasn't marked as such in the agenda (notes: https://github.com/tc39/notes/blob/794b0346646fd795a6454c543cc7f6d56ea0f5d4/meetings/2020-03/april-1.md#relax-hashbang-syntax-for-stage-1).

ljharb commented 2 years ago

cc @zeldajay @yuanliang does this issue still represent 360’s position?

silverwind commented 2 years ago

Is there anything else holding up progression to stage 4 besides this issue?

All major browsers and server-side engines have implemented this proposal and as I see it it's a purely "enhancement" type of change to the language, e.g. unbreak syntax that was previously breaking parsers.

The argument regarding concatenation of scripts seems weak to me because scripts containing hashbangs are already in widespread use today, so this supposed "breakage" is already happening today as parsers can not deal with lines containing hashbang in the middle of scripts, and neither are they supposed to.

ljharb commented 2 years ago

I assume this is no longer the position of 360.

I've added this to the agenda for this month's meeting, and filed https://github.com/tc39/ecma262/pull/2816 on the spec; additional feedback can be given here or in plenary.
