tweag / nickel


List equality in contracts can cause some big slowdowns when applied several times #1930

Open thufschmitt opened 1 month ago

thufschmitt commented 1 month ago

Describe the bug

When a contract sets a record field to a list, applying that contract several times can have a huge performance impact.

To Reproduce

“Minimal” reproducer:

let complex_record =
  let inner =
    std.array.generate (fun x => { field = "field_%{std.to_string x}", value = { foo.bar = 1 } }) 10
    |> std.record.from_array
  in
  std.array.generate (fun x => { field = "field_%{std.to_string x}", value = inner }) 100
  |> std.record.from_array
in
let MySchema = {
  foo = [complex_record]
}
in

{
  one_contract =
    {}
      | MySchema,
  two_contracts =
    {}
      | MySchema
      | MySchema,
}

Evaluating two_contracts takes vastly longer than evaluating one_contract, although the only difference between them is that the MySchema contract is applied twice instead of once:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| nickel export project.ncl --field one_contract | 211.2 ± 1.8 | 208.7 | 215.5 | 1.00 |
| nickel export project.ncl --field two_contracts | 1587.1 | 1587.1 | 1587.1 | 7.52 |

The time ratio can be made arbitrarily big by making the complex_record more complex.

Expected behavior

Applying the contract twice should ideally have a negligible impact, and at the very worst make the overall program run twice as slow.

Additional context

I have no idea whether lists are really the culprit here, but I couldn't reproduce the issue without involving one.

yannham commented 1 month ago

Thanks for the report. Besides understanding why the behavior isn't linear, this example should also be a poster child for the contract deduplication optimization, so there's also a problem with the optimization not firing.

yannham commented 3 weeks ago

So, trying more than two contracts, the time seems to grow linearly from there. Unfortunately I think this is expected: with just one contract and a default value, there's nothing to do beyond forcing the term and exporting it. With repeated applications, however, to ensure the idempotency of merging, the interpreter needs to insert a std.contract.Equal foo foo check to make sure that the two arrays are the same. That check is a contract defined in the stdlib for better error locality, so it's not free (as opposed to some optimized primop).
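
Morally, the inserted check amounts to something like the following (a minimal sketch; xs is just an illustrative name, not the code the interpreter actually generates):

let xs = [1, 2, 3] in
# re-checking a value against itself: it always succeeds, but walks the whole array
xs | std.contract.Equal xs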

It should still be linear in the size of foo, but it seems the constants are high... Indeed, applying the contract three or four times grows linearly again, with the same fixed cost of around +210ms per contract application.

Whether we can optimize this is one question. One possibility is to use physical equality when comparing elements, but this is unfortunately not sound: structural equality fails on e.g. functions, so physical equality could succeed where actual full-fledged equality would fail with an error. Maybe it's still a reasonable trade-off, because physical equality would still always return a "reasonable" answer (a form of strict reflexivity of the equality relation).

On the other hand, it doesn't explain why contract deduplication doesn't fire, which is precisely useful to avoid these kinds of stupid self-equality tests. Let me investigate that.

yannham commented 3 weeks ago

A bit more investigation. It turns out that contract deduplication doesn't fire, but looking at it again, that's actually to be expected. The idea behind contract dedup is to avoid having several merges combine the same contract many times. However, users aren't expected to write foo | MyContract | MyContract in such an obvious way, so when converting annotations to pending contracts, we don't check for duplication. Deduplication is performed only upon merging and array concatenation, where we combine lazy contracts from multiple sources.

Now, if we split the multiple contract applications across records, they are deduplicated as expected. More on that later - I provide a full example with timings below.

std.contract.Equal is also indeed the culprit: without any merging involved, MyContract | std.contract.Equal MyContract alone accounts for the additional ~0.2s. Looking at its implementation again, it does a lot of things in pure Nickel code - in fact it re-implements equality recursively, but also computes the diffs of record fields and so on - just to produce very nice error messages. That obviously has a high cost. Here is my complete repro example for testing:

let complex_record =
  let inner =
    std.array.generate (fun x => { field = "field_%{std.to_string x}", value = { foo.bar = 1 } }) 10
    |> std.record.from_array
  in
  std.array.generate (fun x => { field = "field_%{std.to_string x}", value = inner }) 100
  |> std.record.from_array
in
let MySchema = {
  foo = [complex_record]
}
in

{
  one_contract =
    {}
      | MySchema,

  equality_only = ({} | MySchema) == ({} | MySchema),

  equal_contract = (MySchema | std.contract.Equal MySchema),

  two_contracts =
    {}
      | MySchema
      | MySchema,

  two_contracts_dedup | MySchema = {},

  three_contracts =
    {}
      | MySchema
      | MySchema
      | MySchema,

  three_contracts_dedup | MySchema = {},

  four_contracts =
    {}
      | MySchema
      | MySchema
      | MySchema
      | MySchema,
} & {
  two_contracts_dedup | MySchema = {},
  three_contracts_dedup | MySchema = {},
} & {
  three_contracts_dedup | MySchema = {},
}

Here are my timings for each output:

one_contract: 0.05s
two_contracts: 0.30s
three_contracts: 0.55s
four_contracts: 0.76s

two_contracts_dedup: 0.01s
three_contracts_dedup: 0.05s

equality_only: 0.05s
equal_contract: 0.30s

Thinking of it, if pure equality is so fast, I don't see any drawback to having std.contract.Equal first check equality using == as a fast path, and only if that fails, do all the mumbo jumbo to get nice error messages (if the program fails anyway, it's less of an issue to take additional time to report a nice error).
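
As a rough illustration of that fast path, here is a minimal sketch written as a user-level custom contract (FastEqual is a made-up name; the actual change would live inside std.contract.Equal itself):

let FastEqual = fun expected label value =>
  # fast path: plain structural equality. Note that this is eager and can
  # itself error out on values such as functions, which is the trade-off
  # discussed below.
  if value == expected then
    value
  else
    # slow path: fall back to the stdlib contract for its nice error reporting
    std.contract.apply (std.contract.Equal expected) label value
in
{
  ok | FastEqual [1, 2, 3] = [1, 2, 3],
}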

yannham commented 3 weeks ago

(EDIT: I've added the missing timings in the table)

yannham commented 3 weeks ago

I tried with the fast contract equality, and it indeed makes this example much faster (0.09s for four_contracts vs 0.76s). However, it comes at a cost as well: it makes the equality test eager. Here, because we force everything anyway, it's a win - however, in cases where you might want to extract only part of the configuration, the trade-off is different.

In particular, I tried it on the largest codebase that I have and it didn't make any difference (it was actually a bit slower, but the numbers vary quite a bit, so it's hard to tell whether that's an artifact of something unrelated to the change - in any case, it didn't get much of a speed-up). I guess deduplication does its work correctly there, because array contracts used to be an issue on this codebase and deduplication brought a big improvement.

I'm not sure what to do here. I don't want to optimize for this example in particular. Maybe std.contract.Equal could still be improved, by trading a bit of good error reporting for speed. Or we can try to make deduplication smarter to avoid having to redo this work again and again.

thufschmitt commented 3 weeks ago

Thanks for the investigation @yannham !

For completeness (and in my defense regarding "it's not expected that users are going to write foo | MyContract | MyContract in an obvious way" 😄), the original code was closer to { foo | MyContract, bar | MyContract = foo }. But I understand that getting contract elision on this pattern is also very hard.
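
For reference, here is a minimal self-contained version of that pattern, reusing MySchema from the repro above (the field names are illustrative, and foo is given a definition so that the snippet evaluates on its own):

{
  foo | MySchema = {},
  # bar aliases foo, so the same underlying array goes through MySchema twice
  bar | MySchema = foo,
}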

it comes at a cost as well: it makes the equality test eager. Here, because we force everything anyway, it's a win - however, in cases where you might want to extract only part of the configuration, the trade-off is different.

That's indeed an issue.

Maybe custom merge functions could solve this in the long run? Since I know here that I want these lists to be compared eagerly, I could write a merge function using the "fast equality", and leave the lazy defaults otherwise.

yannham commented 3 weeks ago

Maybe custom merge functions could solve this in the long run? Since I know here that I want these lists to be compared eagerly, I could write a merge function using the "fast equality", and leave the lazy defaults otherwise.

Oh, I hadn't thought about custom merge functions for such a use case, but indeed, you could define an alternate one for arrays that would just use == instead.
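
Something along these lines, perhaps - a hedged sketch written as a plain function, since custom merge functions don't exist yet and how such a function would be hooked into the merge operation is precisely the open design question:

let merge_arrays_eagerly = fun left right =>
  # eager, ==-based "merge": succeeds iff both arrays are structurally equal
  if left == right then
    left
  else
    std.fail_with "cannot merge: arrays differ"
in
merge_arrays_eagerly [1, 2, 3] [1, 2, 3]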