jack-pappas / ExtCore

An extended core library for F#.
Apache License 2.0
178 stars 32 forks source link

Move tagged collections/modules to separate assembly #43

Open jack-pappas opened 5 years ago

jack-pappas commented 5 years ago

ExtCore has some features requiring the library to be built using the F# proto-compiler (fsc-proto), most notably the 'tagged' collections types (such as TagMap) and their supporting modules. I've found these collections to be extremely useful in practice, so I don't want to get rid of them entirely; the requirement of using fsc-proto to build ExtCore is painful though, and makes maintenance much more difficult.

I was thinking of ways to more-easily use fsc-proto in CI builds (e.g. with a docker image), but upon reflection I think the most straightforward way forward will be to move the tagged collections into a separate project/assembly (e.g. ExtCore.TaggedCollections) which references ExtCore proper. That won't eliminate the fsc-proto requirement, but it does allow ExtCore to be updated separately -- we can work out a better solution (maybe) for building the ExtCore.TaggedCollections project later.

vasily-kirichenko commented 5 years ago

@jack-pappas Could you please explain why, say, TagMap is needed? The following code compiles OK and seems to do the same job as TagMap:

[<Measure>] type foo

let m: Map<int<foo>, int> = Map.empty

let m' = m |> Map.add 1<foo> 22
jack-pappas commented 5 years ago

Good question. The answer is "performance".

The ExtCore IntMap<'TValue> type is significantly faster than the standard F# Map<TKey, TValue> type, but it requires the key be an int; this is because IntMap<'TValue> is implemented as a Radix tree aka Patricia trie. Unfortunately, that requirement also prevents you from using F# unit-of-measure types to tag the keys if you'd like to (for ensuring the correctness of your code). At least, it prevents you from doing that without having to add int and Int32WithMeasure<foo> all throughout your code.

ExtCore's TagMap<'TTag, 'TValue> fixes this problem by adding a unit-of-measure type (the "tag") to the data structure itself. That's just an F# type alias for IntMap<'TValue> though, and like the unit-of-measure annotations it's a strictly-compile-time correctness check. This requires fsc-proto though because any ExtCore functions for operating on TagMap (or related types like TagBimap) are just calling through to the equivalent IntMap function; there's no equivalent of the int and Int32WithMeasure pair for non-primitive data structures though, so ExtCore uses the "inline IL" capability of fsc-proto to do "unsafe" coercions between IntMap<'TValue> and TagMap<'TTag, 'TValue>.

7sharp9 commented 5 years ago

Incidentally what is the performance difference?

jack-pappas commented 5 years ago

@7sharp9 Depends what you mean by "performance difference"; specifically, the difference matters on what operations you're using on the maps. I threw together a couple of quick benchmarks comparing FSharpMap, IntMap and Dictionary and committed them if you want to take a look. I ran the benchmarks on my machine, and the results look like this:

For the MapFindIntegerKey benchmark, which builds a map then measures how long it takes to lookup each of the keys (to measure lookup performance):

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.16299.492 (1709/FallCreatorsUpdate/Redstone3)
Intel Core i7-4790 CPU 3.60GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
Frequency=3515621 Hz, Resolution=284.4448 ns, Timer=TSC
  [Host]     : .NET Framework 4.6.2 (CLR 4.0.30319.42000), 64bit LegacyJIT-v4.7.2671.0 DEBUG
  DefaultJob : .NET Framework 4.6.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2671.0

     Method | Randomized | DictSize |       Mean |     Error |    StdDev | Scaled |
----------- |----------- |--------- |-----------:|----------:|----------:|-------:|
  FSharpMap |      False |      100 |   5.827 us | 0.0493 us | 0.0461 us |   1.00 |
     IntMap |      False |      100 |   7.329 us | 0.0296 us | 0.0277 us |   1.26 |
 Dictionary |      False |      100 |   1.551 us | 0.0155 us | 0.0145 us |   0.27 |
            |            |          |            |           |           |        |
  FSharpMap |      False |      500 |  41.400 us | 0.2624 us | 0.2454 us |   1.00 |
     IntMap |      False |      500 |  43.754 us | 0.2384 us | 0.2113 us |   1.06 |
 Dictionary |      False |      500 |   7.708 us | 0.0576 us | 0.0510 us |   0.19 |
            |            |          |            |           |           |        |
  FSharpMap |      False |     1000 |  87.256 us | 0.3612 us | 0.3202 us |   1.00 |
     IntMap |      False |     1000 |  90.572 us | 0.5071 us | 0.4743 us |   1.04 |
 Dictionary |      False |     1000 |  15.357 us | 0.1221 us | 0.1142 us |   0.18 |
            |            |          |            |           |           |        |
  FSharpMap |      False |     2000 | 184.345 us | 0.7170 us | 0.5988 us |   1.00 |
     IntMap |      False |     2000 | 191.050 us | 0.8077 us | 0.7160 us |   1.04 |
 Dictionary |      False |     2000 |  30.946 us | 0.2830 us | 0.2647 us |   0.17 |
            |            |          |            |           |           |        |
  FSharpMap |       True |      100 |   5.701 us | 0.0550 us | 0.0429 us |   1.00 |
     IntMap |       True |      100 |   7.443 us | 0.0534 us | 0.0499 us |   1.31 |
 Dictionary |       True |      100 |   1.873 us | 0.0125 us | 0.0117 us |   0.33 |
            |            |          |            |           |           |        |
  FSharpMap |       True |      500 |  43.590 us | 0.2918 us | 0.2587 us |   1.00 |
     IntMap |       True |      500 |  49.861 us | 0.2457 us | 0.2298 us |   1.14 |
 Dictionary |       True |      500 |   9.862 us | 0.0973 us | 0.0910 us |   0.23 |
            |            |          |            |           |           |        |
  FSharpMap |       True |     1000 |  98.570 us | 0.5709 us | 0.4768 us |   1.00 |
     IntMap |       True |     1000 | 109.344 us | 0.7270 us | 0.6800 us |   1.11 |
 Dictionary |       True |     1000 |  20.625 us | 0.1170 us | 0.1037 us |   0.21 |
            |            |          |            |           |           |        |
  FSharpMap |       True |     2000 | 218.100 us | 1.1195 us | 1.0472 us |   1.00 |
     IntMap |       True |     2000 | 237.469 us | 0.7695 us | 0.6821 us |   1.09 |
 Dictionary |       True |     2000 |  41.024 us | 0.2985 us | 0.2792 us |   0.19 |

That benchmark shows the lookup performance of IntMap is a little worse (5-15%) compared to FSharpMap.

The other benchmark, MapCreateIntegerKey, measures how long it takes to create an instance of each map/dictionary type by adding one key/value pair at a time:

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.16299.492 (1709/FallCreatorsUpdate/Redstone3)
Intel Core i7-4790 CPU 3.60GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
Frequency=3515621 Hz, Resolution=284.4448 ns, Timer=TSC
  [Host]     : .NET Framework 4.6.2 (CLR 4.0.30319.42000), 64bit LegacyJIT-v4.7.2671.0 DEBUG
  DefaultJob : .NET Framework 4.6.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2671.0

     Method | Randomized | DictSize |         Mean |     Error |    StdDev | Scaled |
----------- |----------- |--------- |-------------:|----------:|----------:|-------:|
  FSharpMap |      False |      100 |    33.032 us | 0.3453 us | 0.3230 us |   1.00 |
     IntMap |      False |      100 |     6.769 us | 0.0210 us | 0.0175 us |   0.20 |
 Dictionary |      False |      100 |     2.687 us | 0.0156 us | 0.0138 us |   0.08 |
            |            |          |              |           |           |        |
  FSharpMap |      False |      500 |   227.166 us | 1.0358 us | 0.9182 us |   1.00 |
     IntMap |      False |      500 |    47.085 us | 0.2806 us | 0.2488 us |   0.21 |
 Dictionary |      False |      500 |    12.614 us | 0.0920 us | 0.0860 us |   0.06 |
            |            |          |              |           |           |        |
  FSharpMap |      False |     1000 |   509.901 us | 3.0342 us | 2.6897 us |   1.00 |
     IntMap |      False |     1000 |   106.247 us | 0.8367 us | 0.7827 us |   0.21 |
 Dictionary |      False |     1000 |    26.040 us | 0.1559 us | 0.1382 us |   0.05 |
            |            |          |              |           |           |        |
  FSharpMap |      False |     2000 | 1,138.429 us | 9.3989 us | 7.3380 us |   1.00 |
     IntMap |      False |     2000 |   241.246 us | 1.3231 us | 1.2377 us |   0.21 |
 Dictionary |      False |     2000 |    75.185 us | 0.9646 us | 0.9023 us |   0.07 |
            |            |          |              |           |           |        |
  FSharpMap |       True |      100 |    30.315 us | 0.1962 us | 0.1835 us |   1.00 |
     IntMap |       True |      100 |    11.125 us | 0.1030 us | 0.0964 us |   0.37 |
 Dictionary |       True |      100 |     2.837 us | 0.0350 us | 0.0327 us |   0.09 |
            |            |          |              |           |           |        |
  FSharpMap |       True |      500 |   237.120 us | 1.0421 us | 0.9748 us |   1.00 |
     IntMap |       True |      500 |    87.351 us | 0.4072 us | 0.3809 us |   0.37 |
 Dictionary |       True |      500 |    15.487 us | 0.1738 us | 0.1625 us |   0.07 |
            |            |          |              |           |           |        |
  FSharpMap |       True |     1000 |   535.693 us | 5.2662 us | 4.6683 us |   1.00 |
     IntMap |       True |     1000 |   204.941 us | 1.1319 us | 1.0588 us |   0.38 |
 Dictionary |       True |     1000 |    32.807 us | 0.2061 us | 0.1827 us |   0.06 |
            |            |          |              |           |           |        |
  FSharpMap |       True |     2000 | 1,208.365 us | 9.8850 us | 9.2464 us |   1.00 |
     IntMap |       True |     2000 |   495.908 us | 7.3050 us | 6.8331 us |   0.41 |
 Dictionary |       True |     2000 |    93.447 us | 1.2080 us | 1.1300 us |   0.08 |

The results of this benchmark show the insert/add performance of IntMap is significantly better than FSharpMap, by a factor of 2-5x.

7sharp9 commented 5 years ago

Thanks @jack-pappas So if you profiled a piece of code and you found there were lots of element creation it would be worthwhile trying out an IntMap, and if there were a lot of lookup, switch to Dictionary. I could see the IntMap being useful in the type inference code I was doing as that had the construction of lots of small environments using Map.

jack-pappas commented 5 years ago

More or less. There's also the very useful HashMap, which is based on IntMap so I didn't include it in the results above. Unlike IntMap though, HashMap can use the same key types as FSharpMap so is a drop-in replacement except for the fact it doesn't maintain an ordering on the keys like FSharpMap does.