hhvm / hhast

Mutable AST library for Hack with linting and code migrations
MIT License
69 stars 42 forks source link

[ RFC ] What changes would you like to make to the hhast-lint.json format? #287

Closed lexidor closed 2 years ago

lexidor commented 4 years ago

I'd like to rethink the hhast-lint.json config settings, since there are some short comings. Here is just a short list of properties that I'd like to have, in no particular order. This is not an issue that is meant to be implemented; it is meant to start a discussion.

Updating HHAST should not cause more linters to be activated (causing failing travis builds).

When upgrading hhast versions, linters that have been written for that version are automatically turned on. This makes upgrading hhast (which you are sometimes required to do, because of hhvm AST changes) a more painful process than it should be.

There should be more granularity in the built-in linters than 'none' | 'default' | 'all'

Currently all, doesn't mean all linters. The only way to know that you are missing these linters is by digging into hhast and finding little gems like NoEmptyStatementsLinter or FinalOrAbstractClassLinter. They might have been left out of the all option (NON_DEFAULT_LINTERS in the source) by mistake.

The default option currently excludes 8 linters, but it is not clear what makes a linter non-default.

The none option is made for people who require a stable output from their linters and who do not want and upgrade to introduce more linters that their code will fail under. These people will list out every single linter they care about, one by one, and they will not be made aware of new and exiting linters.

A hhast-lint.json file that is valid in version B should be valid for version A, even if linters that are declared in the hhast-lint.json file are not present.

Currently, when you name a linter that does not exist, you get this error: Expected type 'Facebook\HHAST\BaseLinter', got type 'string' This stops HHAST dead in its tracks, since it is an uncaught exception. I'd like for HHAST to report that linter X could not be found and continue linting with the rest of the linters.

I would be nice to be able to refer to groups of linters, instead of individual linters.

I'd like to be able to define Lexidor's pedantic linter pack: 1.4, Fully autofixable linters: *, Code style linters: 0.8 and have that be recognized by hhast.

I would like to introduce a new hhast-lint.lock file. This would allow us to:

jjergus commented 4 years ago

Updating HHAST should not cause more linters to be activated (causing failing travis builds).

I like this suggestion. Perhaps we could tag each new linter with the HHAST version in which it was added (or really any incremental number, since the HHAST version might not be known at the time a linter is added), then people could specify max linter version in their config. That way people could update HHAST itself without causing new lint errors, then later bump the linter version number if/when they're ready.

That said, it's probably not too hard to update HHAST + immediately add any new linters to disabledLinters until they're ready to deal with the errors.

There should be more granularity in the built-in linters than 'none' | 'default' | 'all' I would be nice to be able to refer to groups of linters, instead of individual linters.

I'm not opposed to this but I can't imagine how we'd decide which linter goes in which group (like you said, right now it's unclear what should be default, if we have more groups it will become even more confusing).

I wonder if we should just have "all". I expect most people would have 1--2 linters they don't like but they are unlikely to be the same ones for everyone, so everyone'd just use disableLinters? (+ maybe the version number)

fredemmott commented 4 years ago

This discussion should be tied to (not merged with) #279 :)

fredemmott commented 4 years ago

Updating HHAST should not cause more linters to be activated (causing failing travis builds).

I strongly disagree with this for all; default, I need to think about some more - my initial stance was that we match HHVM versions, and this would hold us to a higher standard than HHVM and Hack, which doesn't feel appropriate.

One of the proposed alternatives to 'all' and 'default' could be better

it is not clear what makes a linter non-default.

Definitely; initially, it was "is something like this needed for FB projects", but it's drifted and pretty arbitrary now.

Currently all, doesn't mean all linters.

I believe this is currently always a bug; the exception I expect at some point is completely contradictory linters - e.g. "always use \ , even in XHP" and "Use : for XHP wherever possible"

There should be more granularity in the built-in linters than 'none' | 'default' | 'all'

Perhaps get rid of default, add all-x.yy ?

The none option is made for people who require a stable output from their linters

This is a valid use case for it, but FWIW, it's made for people who want to gradually start linting their codebase, moving to all or default over time.

I would be nice to be able to refer to groups of linters, instead of individual linters.

13 :)

That said, now the AST's moving again, I'm not sure how viable third-party repositories of linters are, and this impacts the 'default' vs 'all' discussion:

if we strongly disagree with the intent of a linter, but know some people really want it, it's well-written, should we accept or reject the pull request? Should it be included in whatever our "recommended" set is, whether that's "default" or "all-x.yy" ? Practicality of third-party repositories does change this, but the need for new codegen can be a pain.

I'm going to add another one of these:

Should HHAST be split up in to hhast-lib,hhast-tools, and hhast-linters projects?

There's a risk of that being substantially more maintenance work, but I think that's my gut reaction - thinking through the last few months, I don't think that would actually be true in practice - @jjergus ?

We could continue publishing an 'hhast' package, that would just depend on all 3, but contain no actual code

fredemmott commented 4 years ago

More specifically:

fredemmott commented 4 years ago

... and none of the version numbers - even majors - need to match.

fredemmott commented 4 years ago

In that world:

fredemmott commented 4 years ago

@xixixao also likely has some input here, and #279 more generally (and has several other issues that are related to #279)

fredemmott commented 4 years ago

So, regardless, we're going to end up with some concept of "we recommend this set/configuration for new projects if you don't want to whitelist specific ones"; it's basically the same problem whether that means:

I don't think there's a good answer here: linters are not for soundness, they are for enforcing stylistic choices, which are fundamentally subjective. Even the ones that exist pretty much to highlight common mistakes have a counterpoint of "(I know what I'm doing|we idiomatically use them in a safe way), and we like writing code that way".

The FB internal approach is "anyone can turn it on, and if there's too much push-back it gets disabled/limited", which won't work here.

If we split out the linters to a separate package under facebook/ , perhaps "this matches FB coding standards and would be an appropriate linter at FB" is the correct separator; problems are:

would be an appropriate linter at FB

This has non-style factors, such as how much 'good' code it complains about, and whether it's autofixable

jjergus commented 4 years ago

There's a risk of that being substantially more maintenance work, but I think that's my gut reaction - thinking through the last few months, I don't think that would actually be true in practice - @jjergus ?

There's definitely some extra maintenance work for each project right now (e.g. an AST change in HHVM requires new HHAST release + potentially a new release of anything else that depends on HHAST like definition-finder, which would now include hhast-tools and hhast-linter).

If we proceed with the plan to merge some projects into a "monorepo" that we've been discussing, that would make it easier (assuming we have good automation to publish a release of each project from the monorepo).

fredemmott commented 4 years ago

Fair - I"m hoping that "just update hhast-lib without enabling more linters" in downstream projects would offset that

lexidor commented 4 years ago

Splitting hhast up into linters, tooling, and core would be the right thing to do for projects that want to depend on hhast. Most of those libraries that rely on hhast want access to the AST, but not to the linters.

I am torn on the maintainability question. If we decide in favor of this, it becomes painful to add methods onto Node classes that you want to use in a new linter. ExampleQualifiedName::isFullyQualified() would require a release of hhast-lib, and hhast-linters would require that fresh hhast-lib version.

This discourages adding things like this into hhast-lib and may lead to people placing functions like this as free standing functions into hhast-linters instead to reduce dependency hell.

azjezz commented 4 years ago

One thing I would like to note, since HHAST liners are a "tool" ( not a library to use or build upon ), its okay to have internal BC-breaks, such as changing class names or architecture structure. however, this shouldn't affect projects that disable linters.

therefore, I suggest that each linter now implements a new method getName() and hhast recommend using that in the configuration files when referring to a linter instead of FQN.

fredemmott commented 4 years ago

Some form of name spacing is desired so that third party linters can be created independently of each other without risk of conflict with other linters, bundles or not.

Having this be distinct to the FQN feels like solving the same problem twice

For conciseness, it does support a configuration equivalent to “use namespace”

azjezz commented 4 years ago

Some form of namespacing is desired so that third party linters can be created independently of each other without the risk of conflict with other linters, bundles or not.

the linter name itself can be prefixed, e.g: facebook_async_gen_prefix, nuxed_camel_case_enum_members.

with official linters ( e.i ones shipped with hhast ), not using prefix, e.g: await_in_a_loop

azjezz commented 4 years ago

Also, it would be nice to have to ability to disable a linter within a linter.

e.g: CamelCaseEnumMemberssLinter::getConflicts returns a vector of linters classes that should be disabled ( e.g shout case enum members linter )

fredemmott commented 4 years ago

with official linters ( e.i ones shipped with hhast ), not using prefix, e.g: await_in_a_loop

The intent is for bundled linters to be less special than they currently are; unprefixed is done for suppression as the odds of collision on the same line/node are somewhat low

fredemmott commented 4 years ago

Also, it would be nice to have to ability to disable a linter within a linter.

Pretty strongly against this: I do not think users would expect installing a library to disable linters they currently have enabled.

azjezz commented 4 years ago

it is possible for hhast to inform the users that 2 linters conflicts with each other and the user can decide on which one to use ( hhast can update the configuration file itself, and not run until the conflict is resolved ).

making things harder for end-users is not a good thing.

and having third party libraries reimplement their own runner to provide a better DX, is also not the best idea.

fredemmott commented 4 years ago

Inversely, adding linters can not be fully automatic, and will need an explicit opt-in for security: installing a library should not lead to code execution without any extra steps.

This also applies to HHAST itself, which is why the vscode (and previously atom) plugins prompt the user before executing it.

azjezz commented 4 years ago

I do not think users would expect installing a library to disable linters they currently have enabled.

I don't think so.

when installing a set of linters that ensures enum members use camel case over shout case, I would expect it to disable the shout case linter.

fredemmott commented 4 years ago

Conflicts and a resolution flow would be OK, as long as it’s pure data (e.g in vendor/foo/bar/hhast-lint.json), no code execution or configuration changes until user says yes

fredemmott commented 4 years ago

I do not think users would expect installing a library to disable linters they currently have enabled.

I don't think so.

when installing a set of linters that ensures enum members use camel case over shout case, I would expect it to disable the shout case linter.

Say a library I make gets taken over by someone malicious, and they add a fake linter that’s actually a Trojan but says it’s a naming convention change linter. Code can not automatically tell the difference between this and a real one.

I would agree for user expectations when explicitly installing a conflicting linter, but that can’t be distinguished from malicious cases - or changing the project naming linter because of a non-linting libraries preference.

fredemmott commented 4 years ago

(While yes, updating a library does change the code you’re executing, opening an editor to review those changes does not automatically execute them - when HHAST’s involved, it does)

azjezz commented 4 years ago

Say a library I make gets taken over by someone malicious

in this case, being able to disable or enable linters should be the latest concern, libraries can define scripts to run once they are installed in composer.json, or can simply add code to a partial hack file:

<?hh // partial

namespace Foo;

class SomeLibrary {

}

function hack_it(): void { }

hack_it();

or even better, can replace your library's binary ( vendor/hhvm/hhast/bin/hhast ) with their own.


IMO, the best way to go about this is as follow:

$ composer update

// updates

[HHAST] The following linters conflict with each other:

- [1] `FooBarLinter` 
- [2] `BazQuxLinter`

Please choose which one to use: |

After the user has chosen, hhast should update the configuration file to fix the conflict once and for all.

fredemmott commented 4 years ago

in this case, being able to disable or enable linters should be the latest concern, libraries can define scripts to run once they are installed in composer.json

This feature will almost certainly not be supported in whatever we eventually replace composer with. A potential security issue elsewhere does not make knowingly introducing another one OK.

pseudomains

Only an issue of code from that file is executed. Pseudomains are also going away

conflicts

Again fine with this existing as long as it is purely data-based, with all code for resolving them being in HHAST itself - or at least HHAST prompting user before any third-party code is executed

fredemmott commented 4 years ago

Not tested right now, but at least as of 2018 composer only runs post install scripts for top level projects, and didn’t for dependencies for similar security concerns - https://github.com/composer/composer/issues/1193

azjezz commented 4 years ago

there's always a workaround.

it's possible to turn your stolen library into a composer plugin and hijack composer downloader ( https://github.com/symfony/flex/ ), or hijack composer autoloader ( https://github.com/ircmaxell/PhpGenerics ).

jjergus commented 4 years ago

Back to the original question, I'd add:

fredemmott commented 2 years ago

There should be more granularity in the built-in linters than 'none' | 'default' | 'all'

Coming back to here because of hh_client --lint (cc @atry)

'default' is currently pretty close to error + codesmell + some fb-style; I'm not sure if that's a good default, but if so, it should be explicit. 'error + codesmell' would be a more objective default, but less useful. I'm not sure what's best.

There should be additional style linters that are not part of fb-style, but it shouldn't be possible to enable 'all': style linters can reasonably have exactly opposite viewpoints/recommendations.

'all' should be removed: it exists because of the incorrect assumption that projects would actually want to enable all linters - or, more specifically, that FB projects would. This implies that HHAST shouldn't merge linters that FB projects don't want. This is my mistake in the original design, we should be more open.

hh_client --lint makes this a bit more pressing, as it reports stuff in all of these categories.

I also philosophically don't like the idea of a 'fb-style' built-in, but:

fredemmott commented 2 years ago

default etc: those could also be versioned (as discussed previously), optionally, and it would be nice to be able to specify different constraints. For example, I might want:

Atry commented 2 years ago

Why do we configure linters in a JSON file? Shall we allow the users to create their own functions to select lint rules, for example, a Hack file as the configuration, similar to PAC?

fredemmott commented 2 years ago

Sure, though that's a bit tangental; regardless of how they write the configuration, we should provide better groupings and versioning data.

Historically:

Specifically JSON over other formats like JSONC, YAML, TOML: HHVM has very few built-in data formats. There's serialize(), json, and INI files (if you want exactly PHP's semantics for INI files). JSON seems like the clear winner there; there also weren't really other viable options in the ecosystem.

azjezz commented 2 years ago

it's a bit awkward to design: for example, we probably don't want to ask users to extend a config class - as then they'll need to exclude their hhast config from production builds or make hhast a real dependency rather than a dev dependency

no need to, a neat solution would be is to introduce a new binary, call it "hhast-installer", executing this binary will create a new binary in your desired directory ( default to tools/hhast, or bin/hhast ) with the following content:

#!/usr/bin/hhvm
namespace Tools;

use namespace HHAST;

<<__EntryPoint>>
async function main(): Awaitable<noreturn> {
  $config = HHAST\Config::create()
    ->withFoo()
    ->withBar('bar')
  ;

  await HHAST\Application::run($config);

  exit(0);
}

users can then edit that newly created binary to customize HHAST, and execute it instead of vendor/bin/hhast.


Libraries would need to add this generated file to their .gitattribute as /tools/hhast export-ignore ( the installer can do this automatically if the current directory is a git repository, after asking the user permission ).

fredemmott commented 2 years ago

libraries would need to add this generated file to their .gitattribute as /tools/hhast export-ignore

This doesn't solve the problem for top-level projects (e.g. executables or websites); it would also need ignoring in .hhconfig, removing a lot of the safety

fredemmott commented 2 years ago

This doesn't solve the problem for top-level projects

On the other hand, the same's true for tests/ dev-depends on hacktest and fbexpect, and that's somethign we all live with.

azjezz commented 2 years ago

the same's true for tests/ dev-depends on hacktest

was gonna say that, it should be dealt with the same way people deal with tests/ and other files, strip them from production build.

Atry commented 2 years ago
  • it would either require unsafe dynamic calls to potentially undefined functions that we intend to make impossible in HHVM, or installing the library would have to introduce a type error due to a missing definition

I don't understand this. Linting is simply another type of tests that can be executed by the IDE continuously. If vendor/bin/hacktest is able to execute tests written in Hack, why cannot vendor/bin/hhast_lint execute lint configurations written in Hack?

azjezz commented 2 years ago

also, it would be nice to make input and output handle arguments for run functions, something like:

$input = IO\input_handle();
$output = IO\output_handle();

$application = new HHAST\Application($config);

await $application->run($input, $output);

This was it would be possible to test the output of hhast in hhast itself, using MemoryHandle as input/output.

fredemmott commented 2 years ago

I don't understand this. Linting is simply another type of tests that can be executed by the IDE continuously. If vendor/bin/hacktest is able to execute tests written in Hack, why cannot vendor/bin/hhast_lint execute lint configurations written in Hack?

if it's a generated entrypoint like @azjezz suggests, it's not an issue

Otherwise the difference would be whether it's "execute all lint configs you find" vs "there must be exactly one"

fredemmott commented 2 years ago

also, it would be nice to make input and output handle arguments for run functions

This is already supported if the LinterCLI is directly constructed instead of using the runAsync() helper

Atry commented 2 years ago

I don't understand this. Linting is simply another type of tests that can be executed by the IDE continuously. If vendor/bin/hacktest is able to execute tests written in Hack, why cannot vendor/bin/hhast_lint execute lint configurations written in Hack?

if it's a generated entrypoint like @azjezz suggests, it's not an issue

Otherwise the difference would be whether it's "execute all lint configs you find" vs "there must be exactly one"

For manual execution of vendor/bin/hhast_lint, I would propose to let it always require the user to specify the linter configuration, like what hacktest does. For IDE executed linters, it could prompt something like: "No linter configuration is found at tests/linters/LintConfig.hack. Do you want to create it? Yes/No"

azjezz commented 2 years ago

I would propose to let it always require the user to specify the linter configuration, like what hacktest does.

That IMHO, is annoying, i would rather run ./tools/hhast than ./vendor/bin/hhast -c config/hhast.hack or similar, if hhast would to go with the "hhast-installer" method, i think it would be a good idea to do the same for other tools such as hacktest ( hacktest-installer ), so it can be something like a standard.

fredemmott commented 2 years ago

Linting is simply another type of tests that can be executed by the IDE continuousl

I don't think this is the right model; I don't think you're concretely suggesting that lint configs extend HackTest, but IMO if they go in tests/ and we think of them like unit tests, they should.

However, they are not automatically-ran unit tests in IDE mode: HHAST is a full language server, a long-running process with request/response communication with the IDE. It's closer to hh_client

I also consider the need to pass a path to hacktest to be a TODO item that I've not got around to yet: it shouldn't be necessary, and there should be a project configuration with a single canonical definition of what does it mean for the project to have a clean test run - it shouldn't depend on each caller (human or script) passing the right arguments.

Similarly, I don't want different behavior for IDE and CLI users; these should generally behave identically, and also the same as CI (which is largely CLI, but with the arguments being fixed).

--

Also, I'm going to ask for github discussions to be enabled in this repository, then we need to split this up a bit :) I think there's three main questions, and we're trying to discuss them all in one thread:

lexidor commented 2 years ago

I'll withhold my input until discussions are enabled. This whole thing went by pretty fast and I can't quite follow along.

fredemmott commented 2 years ago
lexidor commented 2 years ago

Closing in favor of https://github.com/hhvm/hhast/discussions