Open danbroooks opened 5 years ago
One thing that I am noticing is that the notation:
king King Maybe
is not adding a relation, but an embedded record. You'd use KingId
to specify a foreign key relation.
Also, those fields are all Maybe
. Are we sure that it's the embedded entity, or the Maybe
causing the slowdown?
Good spot!
model-relations-5
with the types switched to KingId
etc:
Benchmark persistent-th-bench: RUNNING...
benchmarking mkPersist/From File/model-relations-5
time 15.23 ms (15.06 ms .. 15.39 ms)
0.999 R² (0.999 R² .. 1.000 R²)
mean 15.50 ms (15.35 ms .. 15.76 ms)
std dev 492.6 μs (227.4 μs .. 764.3 μs)
variance introduced by outliers: 11% (moderately inflated)
then just KingId
as opposed to KingId Maybe
:
Benchmark persistent-th-bench: RUNNING...
benchmarking mkPersist/From File/model-relations-5
time 14.78 ms (14.58 ms .. 14.90 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 14.87 ms (14.75 ms .. 15.05 ms)
std dev 365.3 μs (202.3 μs .. 525.0 μs)
Then just King
, with no Maybe
:
Benchmark persistent-th-bench: RUNNING...
benchmarking mkPersist/From File/model-relations-5
time 137.4 ms (135.9 ms .. 139.5 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 139.5 ms (138.6 ms .. 140.3 ms)
std dev 1.304 ms (982.1 μs .. 1.851 ms)
variance introduced by outliers: 12% (moderately inflated)
And back to King Maybe
:
Benchmark persistent-th-bench: RUNNING...
benchmarking mkPersist/From File/model-relations-5
time 148.0 ms (144.6 ms .. 149.9 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 151.9 ms (150.2 ms .. 153.9 ms)
std dev 2.716 ms (2.095 ms .. 3.346 ms)
variance introduced by outliers: 12% (moderately inflated)
I am guessing this may be a bit of a facepalm, seems like things are much faster when pointing at the Id
rather than the type itself... which would make sense as when referring to the type itself, it is getting bigger, and it is also pointing to the other types, which themselves are increasing in size as well ?
When running model-relations-9
with this change, things are much quicker than what I posted initially :man_facepalming:
Benchmark persistent-th-bench: RUNNING...
benchmarking mkPersist/From File/model-relations-9
time 28.52 ms (27.18 ms .. 30.14 ms)
0.991 R² (0.984 R² .. 0.998 R²)
mean 28.66 ms (28.02 ms .. 29.26 ms)
std dev 1.375 ms (1.118 ms .. 1.728 ms)
variance introduced by outliers: 16% (moderately inflated)
(the time for this one was 359.2 ms
in my above post)
I'm going to edit the title to say ".. the more embedded entities are added." Thanks for re-running benchmarks :)
Is this still an issue do you think? I think the exponential effect might be due to how to the number of fields grows between each benchmark. For example in model-relations-1
each entity has a single embedded entity as a field (which in turn has 1 embedded entity), compare this to model-relations-9
where each entity has 9 embedded entities (which themselves have 9 embedded entities, or atleast 8 more fields), which is the cause of the exponential times, hence the facepalm. Is that making sense?
I guess we could check that by having
EmbedMe
name Text
EmbedCarrier
embed1 EmbedMe
embed2 EmbedMe
embedN EmbedMe
Yeah, given the benchmarks where model-relations-1
is:
King
field1 Text Maybe
field2 Text Maybe
field3 Text Maybe
field4 Text Maybe
field5 Text Maybe
queen Text Maybe
Queen
field1 Text Maybe
field2 Text Maybe
field3 Text Maybe
field4 Text Maybe
field5 Text Maybe
Jack
field1 Text Maybe
field2 Text Maybe
field3 Text Maybe
field4 Text Maybe
field5 Text Maybe
and model-relations-9
is:
King
field1 Text Maybe
field2 Text Maybe
field3 Text Maybe
field4 Text Maybe
field5 Text Maybe
queen Queen Maybe
queen2 Queen Maybe
queen3 Queen Maybe
queen4 Queen Maybe
queen5 Queen Maybe
queen6 Queen Maybe
queen7 Queen Maybe
queen8 Queen Maybe
queen9 Queen Maybe
Queen
field1 Text Maybe
field2 Text Maybe
field3 Text Maybe
field4 Text Maybe
field5 Text Maybe
Jack
field1 Text Maybe
field2 Text Maybe
field3 Text Maybe
field4 Text Maybe
field5 Text Maybe
and everything in between, here is the output:
Benchmark persistent-th-bench: RUNNING...
benchmarking mkPersist/From File/model-relations-1
time 3.537 ms (3.490 ms .. 3.583 ms)
0.998 R² (0.997 R² .. 0.999 R²)
mean 3.502 ms (3.468 ms .. 3.569 ms)
std dev 148.1 μs (84.53 μs .. 275.8 μs)
variance introduced by outliers: 24% (moderately inflated)
benchmarking mkPersist/From File/model-relations-2
time 5.008 ms (4.941 ms .. 5.070 ms)
0.999 R² (0.998 R² .. 0.999 R²)
mean 5.025 ms (4.992 ms .. 5.075 ms)
std dev 132.7 μs (94.50 μs .. 214.0 μs)
variance introduced by outliers: 11% (moderately inflated)
benchmarking mkPersist/From File/model-relations-3
time 5.883 ms (5.770 ms .. 6.019 ms)
0.998 R² (0.997 R² .. 0.999 R²)
mean 5.847 ms (5.809 ms .. 5.890 ms)
std dev 123.7 μs (97.92 μs .. 168.2 μs)
benchmarking mkPersist/From File/model-relations-4
time 6.883 ms (6.826 ms .. 6.954 ms)
0.999 R² (0.998 R² .. 0.999 R²)
mean 6.894 ms (6.842 ms .. 6.953 ms)
std dev 167.4 μs (134.9 μs .. 211.0 μs)
benchmarking mkPersist/From File/model-relations-5
time 7.821 ms (7.741 ms .. 7.890 ms)
0.999 R² (0.998 R² .. 0.999 R²)
mean 7.875 ms (7.822 ms .. 7.946 ms)
std dev 184.2 μs (147.0 μs .. 229.3 μs)
benchmarking mkPersist/From File/model-relations-6
time 9.019 ms (8.897 ms .. 9.134 ms)
0.998 R² (0.996 R² .. 0.999 R²)
mean 9.018 ms (8.936 ms .. 9.163 ms)
std dev 283.3 μs (184.4 μs .. 438.3 μs)
variance introduced by outliers: 12% (moderately inflated)
benchmarking mkPersist/From File/model-relations-7
time 10.41 ms (10.29 ms .. 10.52 ms)
0.999 R² (0.997 R² .. 0.999 R²)
mean 10.43 ms (10.32 ms .. 10.55 ms)
std dev 297.1 μs (194.6 μs .. 422.0 μs)
benchmarking mkPersist/From File/model-relations-8
time 11.64 ms (11.55 ms .. 11.74 ms)
0.999 R² (0.999 R² .. 1.000 R²)
mean 11.75 ms (11.68 ms .. 11.94 ms)
std dev 288.7 μs (159.7 μs .. 503.8 μs)
benchmarking mkPersist/From File/model-relations-9
time 13.87 ms (13.55 ms .. 14.14 ms)
0.998 R² (0.997 R² .. 0.999 R²)
mean 13.57 ms (13.45 ms .. 13.70 ms)
std dev 318.8 μs (250.4 μs .. 388.0 μs)
Benchmark persistent-th-bench: FINISH
Interesting - so it is the nesting behavior, then.
@parsonsmatt I think what is observed here is because when embedded entities are used, the fields for each entity are...er... embedded :smile: in each definition, and because each entity in these examples are growing by 1 each time, the number of total embedded fields is n*n
so i guess the TH will be growing in size at a similar rate... hence the slowdown.
Also I think the TH timings I have shared here are largely immaterial, I think it is just timing the TH generation, and not the actual compilation time (hence them actually being fairly low numbers? I've seen you say before that it is not the TH generation itself that is slow in persistent, but how long GHC takes to process all the code that the TH generates).
I'm not sure how long things were taking to compile when I was looking at this (and did not actually track any time
d builds, unfortunately :cry: ) but I do remember at the time thinking the numbers here did not line up with the actual time it was taking to compile this code (certainly not anything in ms
:joy: ), and that the slowness did seem to get rapidly worse once I was into the higher numbers of fields.
We can keep this issue open, but it may not be an issue in itself. There is nothing in particular here we can specifically address - if you add embedded entities, you're going to be generating lots of code, and this is going to be handing over more stuff for the compiler to deal with. Which is going to take longer.
Having said that, there may be some optimisation that can be done with embedded entities perhaps, where the generated code is not generating more stuff for GHC to deal with, but instead deals with a reference to the same value, similar to https://github.com/yesodweb/persistent/pull/1202
Right. I think the real fix is to dis-allow embedding entities. It's a bad pattern.
I have been doing some of my own benchmarking around the performance of
persistent-template
and I've noticed that the perf for the TH gets worse the more relations you add.You can see what I've been doing here https://github.com/yesodweb/persistent/compare/master...danbroooks:bench?expand=1#diff-4ce9d2d6083cd55da50aa29369fbc263
But specifically, I've added some model declarations that add more and more relations like so:
And for each one, the relations at the bottom get upped by one, up until the entities having 9 relations each.
So here is the output when it is run through the benchmarks:
As you can see the difference between each average time seems to increase in a non-linear fashion. This might be unavoidable though, and perhaps just the nature of how TH is working under the hood.
My first thought was there might be something in
persistent-template
that is implemented in an inefficient way, but I've not been able to find it... but it must be down to additional work it is doing because these fields are relations... if I switch the fields to allText
the time it takes to run is far far lower (though I think this is because it is generating less haskell, I think?), and doesnt increase in the same fashion. Here ismodel-relations-1
compared tomodel-relations-9
when all fields are set toText
:I wanted to share what I had found here, in case there was some optimisation that could be made here, or if anybody who is closer to the project would be able to shed some light on why what I have found is the way it is.