lint rule API v2 - Githubissues

samuelstroschein commented 5 days ago

Context

@martin.lysk1 and @loris.sigrist have been working on a new API for lint rules that is a breaking change and will foreseeably get a breaking change again. Plus, the API will be slow (maybe fast enough for MESDK-137 though!)

Why does the current and future lint API lead to recurring breaking changes?

The APIs are designed around a data structure. Any change in the data structure leads to a breaking change in the lint API.

For example, the current API lints on the messages primitive. The new proposal lints on message bundles. The result is that the entire linting system breaks (and is thrown away…). The data structure leaks into everything a lint rule does. From the id, to execution functions, to scheduling.

// Pseudocode that illustrates that the data strucuture is hardcoded
// into lint rules and breaks now by switching to a different primitive
type LintRule = {
- type: "messageLintRule"
+ type: "messageBundleLintRule",
   run: (args: {
- message: Message
+ node: MessageBundle
  settings: ProjectSettings & ExternalSettings
  report: (args: {
- messageId: Message["id"]
+ bundleId: MessageBundle["id"]
   languageTag: LanguageTag
   body: MessageLintReport["body"]
  }) => void
 }) => MaybePromise
}

Why are the APIs slow?

Disclaimer: It might be fast enough to fulfill MESDK-137.

Passing the top level node (MessageBundles) to a lint rule entails that a lint rule:

Has to re-execute if anything in a message bundle changed. Even if the lint rule only concerns itself with a variant. Ideally, a lint rule only re-executes if the variant of interest changes.
The inlang SDK needs to execute all lint rules and use post-processing filtering (similar to ESLint) instead of pre-process filtering.

if (messageBundleChanged){
  // lint rule always re-executes if a bundle changed, 
  // regardless of the lint rule only concerning itself with 
  // variants (as an example)
  const reports = await lintRule.run(bundle)
  // because lints happen on the root node, the linter
  // must execute the entire tree and filter afterwards
  // (that is expensive!)
  const filtered = reports.filter(postlintFiltering)
}

Potential solution

Have one lint rule API with different functions for different data primitives.

Aka rename run to lintBundle and remove the hardcoded data primitive type from the id bundleLintRule.{namespace}.{key} -> lintRule.{namespace}.{key}

(+) incrementally extendable for changes in and/or other data structures
(+) easier to communicate for users and lint rule authors
- "this is the lint rule API" not "this is the message bundle lint rule API, there is also the variant lint rule API, and oh the code lint rule API also exists"
(+) we can start with bundle lint rules, and then get more fine grained (performance) as needed (by adding functions for variants and messages)
(-) ?

What we need today:

type LintRule = {
  id: string
} & BundleLintRule

type BundleLintRule = {
  lintBundle: ({ bundle, settings, report }) => any
}

const mylintRule: LintRule = {
  id: "lintRule.missingTranslation",
  lintBundle: ({ bundle, settings }) => {
    report()
  }
}

We can extend the lint rule function in the future (if the need arises!)

type LintRule = {
  id: string
-} & BundleLintRule 
+} & BundleLintRule | MessageLintRule | VariantLintRule | CodeLintRule | DesignLintRule

+ type MessageLintRule = {
  lintMessage: ({ message, settings, report }) => any
}

+ type VariantLintRule = {
  lintVariant: ({ variant, settings, report }) => any
}

+ type CodeLintRule = {
  lintCode: ({ sourceFile, settings, report }) => any
}

// a variant lint rule only get's re-executed if a variant changes
const mylintRule2: LintRule = {
  id: "lintRule.inlang.ensureLowerCapsBrandName",
  // 💡 only triggers on update of a variant, not of a bundle or message
  lintVariant: ({ variant, settings }) => {
    if (variant)
    report()
  }
}

// a code lint rule 
const mylintRule2: LintRule = {
  id: "lintRule.inlang.ensureLowerCapsBrandName",
  // 💡 we can add functions as we need
  lintCode: ({ sourceFile, settings }) => {
    if (hasHardcodedMessage(sourceFile)){
      report()
    }
  }
}

Further notes:

I would keep one lint rule, one function to avoid situations like "i want to disable the code lint of lint rule X but keep the bundle lint"
the report function might or might not differ between the different lint functions. TBD

Considered alternatives

Expose the query (get) APIs.

(+) full flexibility
(-) performance footgun of slow queries by lint rule authors
(-) complicated re-triggering of lint rules via interception of queries

const exampleLintRule = {
  id: "blabla", 
  description: "Missing translations"
  run: ({ query, settings } => {
    // query, mix, and match whatever is needed to perform lints
    const messages = await query.message.get("*")
    // 💥 performance footgun if lint rules query all messages
    for (const message of messages){
      // 
      report(message.bundleId, "missing translation")
    }
    // ...
  }) 
}

samuelstroschein commented 5 days ago

@loris.sigrist @martin.lysk1

i don't want to slow down loadProject2. pushback if you think it's a distraction now.
it's basically the current API remove the data primitive from the lint rule id and name the run function according to the data primitive e.g. lintBundle, lintMessage, lintVariant, lintCode, lintDesign, etc.

martin-lysk commented 5 days ago

Great writeup - We will ~~adopt~~ consider this structure.
This describes what we briefly discussed with - we need lints for different levels (project, bundles, messages, variants) or even whole different entities (figma nodes)

One note:

I would keep one lint rule, one function to avoid situations like "i want to disable the code lint of lint rule X but keep the bundle lint"

If a message should always contain one run method (and i agree 100%) we could keep the structure and give each lint a type enum that discloses the expected parameter of the the function:


enum LintRuleType { 
  Bundle = "Bundle", 
  Message = "Message", 
  Variant = "Variant", 
  Code = "Code", 
  Design = "Design" 
}

// Base LintRule type with discriminated union for each rule type
type LintRule =
  | { id: string, type: LintRuleType.Bundle, run: (bundle: any, settings: any, report: any) => any }
  | { id: string, type: LintRuleType.Message, run: (message: any, settings: any, report: any) => any }
  | { id: string, type: LintRuleType.Variant, run: (variant: any, settings: any, report: any) => any }
  | { id: string, type: LintRuleType.Code, run: (sourceFile: any, settings: any, report: any) => any }
  | { id: string, type: LintRuleType.Design, run: (design: any, settings: any, report: any) => any }

// Example implementation of a lint rule
const exampleLintRule: LintRule = {
  id: "example-rule",
  type: LintRuleType.Code,
  run: (sourceFile, settings, report) => {
    // Implement the linting logic for code
    console.log("Linting code:", sourceFile);
  }
}

Another thought here - we could even use the id pattern to discriminate the lint type - but my type gymnastics end here @loris.sigrist
Any downsides i miss?

samuelstroschein commented 5 days ago

@martin.lysk1 your proposal is identical to mine. we both use union types except that you discriminate via type and overload via run.

thoughts:

i think discriminating via type is more natural than codeLint etc. 👍
you don't need an enum. your java background is showing :D enums in typescript are considered bad/dangerous. 👎

propsal:

get rid of the enum :D please confirm if you agree

type LintRule =
  | { id: string, type: "bundle", run: (bundle: any, settings: any, report: any) => any }
  | { id: string, type: "message", run: (message: any, settings: any, report: any) => any }
  | { id: string, type: "variant", run: (variant: any, settings: any, report: any) => any }
  | { id: string, type: "code", run: (sourceFile: any, settings: any, report: any) => any }
  | { id: string, type: "design", run: (design: any, settings: any, report: any) => any }

samuelstroschein commented 2 days ago

Call with @martin.lysk1 and @loris.sigrist agreements

1. which node is a lint report attached to?
-> lint reports only provide bundleId|messageId|variantId

sdk can populate missing ids for more efficient query
the ambiguity problem of "what is this lint report for?" is at least not a concern for lint authors anymore

locale as lint report property?
-> NO, use key

key generalizes across all lint rules. allowing arbitrary properties limits the possibilities for us to add properties as we need.

how to create, update, delete
-> Updating bundles yes. Create and delete in v2.
machine translate
-> just use inlang/rpc directly in the missing translation lint rule until machine translate plugin api exists

@martin.lysk1 @loris.sigrist please use 👍 to acknowledge that you read this and agree

opral / inlang-message-sdk

lint rule API v2 #104

Context

Why does the current and future lint API lead to recurring breaking changes?

Why are the APIs slow?

Potential solution

Considered alternatives