Just raw ideas I had concerning the benchmark:
Make the output pretty. For now, running cabal bench produces a raw stream of numbers; it would be great to automate generating the rendering we can see in doc/benchmarks.
Define test cases. I don't think all users need the same "alga". Many of them will, for example, only care about small graphs, and if they want to benchmark something, maybe providing a "small" option is a good idea (they don't want to benchmark something with 10^6 vertices). Since "small" generalises to "medium" and "big", and some users may only work with trees, forests and the like, I think providing test cases is a good idea, allowing users to benchmark only what they need.
Make the code pretty :) Right now, adding a new benchmark is a mess (copy-pasting things and replacing only a few characters...); with some well-placed maps I have already reduced the number of lines by a third.
The question is: what is the priority (if it is even in this list)...
@nobrakal I think the current benchmarks are such a mess that it might be better to start from scratch :)
I would therefore prioritise the second item in your list: let's figure out what we need to benchmark first.
What about starting from something classic, like topological sort and depth-first search? Presumably, all graph libraries support these algorithms.
To have a common starting point, we can assume that graphs are given simply by edge lists [(Int, Int)] with vertices lying in the interval 1..10^k for k in the range 1..6. Does this sound sensible?
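A minimal sketch of what such edge-list inputs could look like (hypothetical helper names; vertices are plain Ints in the interval 1..n, as proposed):

type Edges = [(Int, Int)]

-- 1 -> 2 -> ... -> n
pathEdges :: Int -> Edges
pathEdges n = zip [1 .. n - 1] [2 .. n]

-- A path plus one extra edge closing the cycle
circuitEdges :: Int -> Edges
circuitEdges n = (n, 1) : pathEdges n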
Ok, I understand!
I think staying classic for the beginning is a good idea. I will try to produce at least a list of common things to benchmark.
Otherwise, representing a graph by a list of edges sounds very sensible ^^.
For inspiration, I looked for other benchmark suites, and only hash-graph provides one: https://github.com/patrickdoc/hash-graph/blob/master/benchmarks/benchmarks.md
I don't know whether this is easily possible, but since different libraries represent graphs in different ways, maybe some of them are more space-efficient than others. If so, it would be great to see how much space a single path, or a clique of 10^6 vertices, takes in memory.
@nobrakal That's a very good start.
I suggest not including anything Alga-specific in the first version, so let's drop size, for example.
Also, it may be best to avoid doing things randomly, at least initially, as it could make the results difficult to interpret. Instead, I suggest being very specific about each benchmark, for example:
Benchmark mesh-n-by-n-diagonal:
- edgeList $ fmap index $ mesh [1..n] [1..n], where index (i, j) = (i - 1) * n + j.
- [index (i, i) | i <- [1..n]].
- [0, n * n + 1].
- [(index (i - 1, i), index (i, i)) | i <- [2..n]].
- [(index (i - 1, i - 1), index (i, i)) | i <- [2..n]].
The above patterns look sufficient for all benchmarks in your list, but I might have missed something.
P.S.: Benchmarking memory would be great!
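For illustration, a minimal sketch of generating the proposed input as a plain list of Int pairs (assuming mesh and edgeList from alga's Algebra.Graph):

import Algebra.Graph (edgeList, mesh)

-- The n-by-n mesh with (i, j) vertices flattened to Ints via the index function above
meshNbyNEdges :: Int -> [(Int, Int)]
meshNbyNEdges n = edgeList (fmap index (mesh [1 .. n] [1 .. n]))
  where
    index (i, j) = (i - 1) * n + j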
Hi,
You are right on both points, let's benchmark things we can compare with other libraries!
Your example seems pretty clear to me, but I have a question: the implementation will not use the edgeList from your example, but only the idea behind it, accessing things directly on the mesh structure, right? I don't see how we could use this edgeList otherwise, so maybe this is not a very wise question... but I prefer to ask!
Regarding memory usage, there are https://hackage.haskell.org/package/ghc-datasize-0.2.0 (a little old) and http://hackage.haskell.org/package/hp2any-core-0.11.2, and, more prosaically, we can use GHC's heap profiling directly: https://downloads.haskell.org/~ghc/7.2.1/docs/html/users_guide/prof-heap.html
The implementation will not use the edgeList from your example, but only the idea behind it, accessing things directly on the mesh structure, right?
@nobrakal I would actually prefer that we use exactly this definition to generate input for this particular benchmark, because it's much more transparent and terse than manipulating lists of pairs of integers.
Can we force GHC to evaluate it fully into a list of pairs of integers before passing it to benchmarks?
Ok, I see: it is used to generate the input. We will then have to convert it back to a list of pairs of integers, to benchmark with something like:
indexInverse u n = ((u - 1) `div` n + 1, (u - 1) `mod` n + 1)
In that case, we definitely need to force the evaluation before passing it to the benchmark. Some well-positioned maps can do the trick (mapping over list comprehensions will, I think, force the evaluation of the arguments...); otherwise, $! or seq can always be used to force evaluation.
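A minimal sketch of one way to guarantee this with criterion itself: its env combinator forces the environment to normal form before any measurement, so the generated edge list is not built inside the timed section (assumes criterion's Criterion.Main API):

import Criterion.Main (Benchmark, bench, env, nf)

-- The edge list is fully evaluated by env before the benchmark runs,
-- so list construction does not pollute the measured time.
benchOnEdges :: String -> ([(Int, Int)] -> Int) -> [(Int, Int)] -> Benchmark
benchOnEdges name f es = env (return es) (\input -> bench name (nf f input))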
We will then have to convert it back to a list of pairs of integers, to benchmark with something like
No, we don't! The expression above already includes edgeList $ fmap index in it, so it will produce exactly the list of pairs of integers in the range 1..n*n, easy to consume by any graph library. All access patterns are also already translated to vertices in 1..n*n.
@snowleopard Are we discussing benchmarking alga, or graph libraries in general? I thought the goal here was only to benchmark alga. If so, the benchmarking tool will ultimately need an alga structure to benchmark, or am I misunderstanding something? Converting an alga structure to a generic edge list, and then converting it back to an alga structure to run benchmarks on it (accessing, extracting...), seems pretty convoluted if we are just benchmarking alga operations on alga graphs.
@nobrakal I see your point, you are right. I was jumping ahead a bit :-)
So, perhaps this is what we could do in the case of the mesh-n-by-n-diagonal benchmark:
- mesh [1..n] [1..n].
- [(i, i) | i <- [1..n]].
- [(0, 0), (n + 1, n + 1)].
- [((i - 1, i), (i, i)) | i <- [2..n]].
- [((i - 1, i - 1), (i, i)) | i <- [2..n]].
Then in future, when we start working on adding benchmarks for other graph libraries, we will only need to provide the index function, which in this case is index (i, j) = (i - 1) * n + j.
The following standard conversion will be applied for all benchmarks (not just mesh-n-by-n-diagonal):
- edgeList . fmap index.
- map index.
- map (\(x, y) -> (index x, index y)).
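A minimal sketch of these three conversions written out, with n supplied by the caller (edgeList and Graph assumed from alga's Algebra.Graph):

import Algebra.Graph (Graph, edgeList)

index :: Int -> (Int, Int) -> Int
index n (i, j) = (i - 1) * n + j

-- edgeList . fmap index, applied to the generated graph
toIntEdgeList :: Int -> Graph (Int, Int) -> [(Int, Int)]
toIntEdgeList n = edgeList . fmap (index n)

-- map index, applied to vertex lists
toIntVertices :: Int -> [(Int, Int)] -> [Int]
toIntVertices n = map (index n)

-- map over both ends, applied to edge lists
toIntEdges :: Int -> [((Int, Int), (Int, Int))] -> [(Int, Int)]
toIntEdges n = map (\(x, y) -> (index n x, index n y))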
Ok, I understand now what you meant! Your example is now completely clear. I will try to put together some drafts on my fork during the week.
About memory size: a simpler and cleaner package for measuring size in memory is weigh.
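A minimal sketch of what a weigh-based measurement could look like (assuming weigh's mainWith and func and alga's clique and vertexCount; the result is forced through vertexCount because a Graph NFData instance is only discussed later in this thread):

import Weigh (func, mainWith)
import Algebra.Graph (clique, vertexCount)

main :: IO ()
main = mainWith $
  -- Reports the allocations of building a clique and counting its vertices
  func "clique 10^3" (vertexCount . clique) [1 .. 1000 :: Int]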
@nobrakal Thanks, looks promising indeed!
So I started to think about the benchmarks.
The main idea is to write a function like benchGraphStruct :: GraphStruct -> Benchmark
So one can write a main like
main = defaultMain $ map benchGraphStruct [structMesh, structClique...]
import qualified Algebra.Graph.Class as C
data GraphStruct = forall a b. C.Graph a => GraphStruct String (b -> a) (b -> String) [b] [b -> FuncToBench a] [FuncToBench a]
So a GraphStruct contains a name, something that produces a graph from an argument, something that can display this argument (to make the results understandable), a list of actual arguments, and some functions to benchmark on this structure. The last two lists are eventually appended together when mapping over the argument list (because some of the functions to be tested need graph arguments themselves).
data FuncToBench a = forall b. GraphConsummer String (a -> b) | forall b c. FuncToBenchWithArgs String (b -> a -> c) (b -> String) [b]
FuncToBench covers two cases: functions that consume a graph and produce something (typically vertexCount), and functions that need extra arguments in addition to a graph to produce something (typically hasVertex).
Note that existential quantification is the only way I found to represent functions whose results we don't care about. It also allows me to "hide" graph types, and thus construct a consistent list like the one provided in the overview.
You can find an example of use here: https://github.com/nobrakal/alga/blob/ddf59570e9529e34fcc812db8da9100c112f04b0/bench/Bench.hs
It is basically an enumeration of bgroup and map calls using these definitions.
The main problem is the inclusion of the list of functions to test in the GraphStruct. Existential quantification causes problems here, and I was forced to include the list of functions to be benchmarked inside the GraphStruct itself. I don't know whether more polymorphism could reduce the amount of code.
One issue: if I try to benchmark structure creation (something like clique [1..n]), it always reports the same time for every n. I think whnf doesn't evaluate it fully... or alga is really impressive! A solution would be to make Graph an instance of NFData, so we can use nf. Other functions are also sometimes benchmarked at constant time (like hasVertex); I think the problem is related.
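For reference, a minimal sketch of such an instance, assuming Algebra.Graph exports its four constructors and that no NFData instance exists yet (as was the case at the time):

import Control.DeepSeq (NFData (..))
import Algebra.Graph (Graph (..))

-- Forces the whole algebraic expression tree, so criterion's nf
-- measures the real cost of constructing the graph.
instance NFData a => NFData (Graph a) where
  rnf Empty         = ()
  rnf (Vertex x)    = rnf x
  rnf (Overlay x y) = rnf x `seq` rnf y
  rnf (Connect x y) = rnf x `seq` rnf y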
@nobrakal Looks rather impressive! But might be a bit overengineered :)
I didn't really have much time to think this through to the last detail (I'm chasing the ICFP deadline this week), but my feeling is that your abstractions can be simplified.
For GraphStruct: maybe we shouldn't bother with graph/vertex polymorphism and just stick to Graph Int as the initial graph description, which lets us remove the type parameter a. If we also allow only one size parameter N per benchmark, then we also don't need the type parameter b.
For FuncBench: maybe it's better to just create a fixed number of constructors for the cases with no arguments, 1 argument, 2 arguments, etc.
Remember, your benchmarks need to be simple if you want the community to understand and believe them, even if this simplicity comes at the cost of code duplication.
Can you implement just the mesh-n-by-n-diagonal benchmark as a starting point, without any generalisation?
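A minimal sketch of that suggestion (hypothetical constructor names; Graph Int fixed as the only graph type, result types picked arbitrarily for illustration):

import Algebra.Graph (Graph)

-- One constructor per arity instead of existential quantification
data FuncBench
  = Arity0 String (Graph Int -> Int)                 -- e.g. vertexCount
  | Arity1 String (Int -> Graph Int -> Bool)         -- e.g. hasVertex
  | Arity2 String (Int -> Int -> Graph Int -> Bool)  -- e.g. hasEdge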
Hi,
Yeah, it might be a little bit complicated ^^.
You are right, the type parameter b is not really necessary in GraphStruct. I think we can assume that we will only build a graph from a single Int argument.
Concerning a (the type of the vertices of a Graph to benchmark): it is, I think, mandatory. A mesh will produce a Graph (a,a), a clique a Graph a, and deBruijn a Graph [a]. We can of course convert to a list of edges (like your index function) with only Int vertices, but this will complicate many things: we would need both an Int -> Graph a to generate just the graph to benchmark, and a Graph a -> Graph Int to convert it to a "standard" Graph Int, thus preserving the generic type a.
Here, type polymorphism can handle this problem for us, help to really reduce the code, and, I think, make it clearer (I agree with you that good documentation will be mandatory). During the benchmarks I don't need to force the type of a Graph, I only need it to be coherent with the functions I want to benchmark on it.
I think the whole difficulty is in the generalisation; mesh-n-by-n alone is not very difficult, as you pointed out. Anyway, I will write it as soon as I have some time so that, as you say, we have a starting point (maybe jumping directly to a generalised thing wasn't very wise).
I'll just leave this here: https://github.com/haskell-perf and https://github.com/haskell-perf/checklist
@snowleopard I had exactly the same thought over the weekend. I modified the dictionaries repository to test some simple graph properties. So far my findings are as follows:
Good about this approach:
Bad about this approach:
haskell-perf looks great! (Its checklist is very interesting ^^.)
@anfelor The next idea is to make an independent tool that also benchmarks other graph libraries, as described in this Haskell Summer of Code idea. So maybe importing it into alga is not necessary...
@anfelor Awesome, thanks for giving haskell-perf a try!
I agree with @nobrakal: we probably don't need to make graph benchmarking code a part of Alga -- it may be more discoverable/useful if it is a part of a general Haskell performance benchmarking suite.
I agree we don't need to import it into the main alga repository, but I believe that we (or at least I :)) need a (possibly separate) repository just for alga benchmarks. I have wondered repeatedly over the last couple of days how one implementation would compare against another, especially in connection with functional linear maps.
I would propose the following: extend the graphs-benchmarks repository to include the other competing libraries, to present an unbiased overview of performance, and clone it to create more specialised benchmarks testing more alga-specific functions. For example, I believe that tests concerning clique shouldn't be in the former but in the latter.
Let's say, however, that this effort progresses relatively far before the official start of GSoC and the first task becomes trivial. What would happen to the proposal? I would hate to see my application rejected on the grounds that the problem has essentially been solved. Have you also applied, @nobrakal? I don't want to gain an unfair advantage over you by preempting all design decisions. What do you think, @snowleopard?
@anfelor I think we could have a general graph benchmark suite in haskell-perf and an Alga-specific benchmark suite in this repository, which would be tightly coupled with the development of Alga. This sounds like a good balance to me.
I think there are three proposals for this project: from @nobrakal, you and somebody else. From a brief look all three are very strong, so it would be a difficult choice :(
@snowleopard But shouldn't the alga-specific benchmarks be based on the haskell-perf ones? I would couple the alga-specific suite to the general one (but not the other way around) to allow quick testing of standard use cases, and therefore keep the alga-specific suite in its own repository (at least until the license question is resolved; I have opened an issue for this: https://github.com/haskell-perf/dictionaries/issues/10).
but not the other way around
Why not? It seems quite logical to take an Alga-specific benchmark suite, kept in this repository, and then extend it by adding more benchmarks of other graph libraries.
I think having three repositories (Alga repo, Alga-specific benchmarks repo, and general benchmarks in haskell-perf) would be just too confusing. Let's stick to two: one here, for tightly coupling it with Alga development, and one in haskell-perf so that it can be easily discovered by the community.
@anfelor Your idea is great!
@snowleopard Maybe a git submodule would fit here? Then your tests could easily extend the ones in the haskell-perf repository.
Concerning GSoC, I don't know what to do! As @anfelor says, if we complete one of the principal objectives before the program starts, I don't think the Haskell committee will approve the project...
It seems quite logical to take an Alga-specific benchmark suite, kept in this repository, and then extend it by adding more benchmarks of other graph libraries.
Because that could bias the benchmarks towards graphs with a lot of structure, like cliques or trees (i.e. tasks for which alga is well suited). I would propose making the benchmarks that compare alga with other libraries completely independent of alga's main development. Some code sharing should be okay as long as it doesn't influence the range of benchmarks too much, though.
I would try to build it this way:
I think having three repositories (Alga repo, Alga-specific benchmarks repo, and general benchmarks in haskell-perf) would be just too confusing.
I agree, it could become that way. But I don't think one can embed a clone of one repository inside another? Or only as a git submodule, as @nobrakal proposed?
Or maybe alga-perf could be independent of haskell-perf/graphs and just use the same infrastructure?
By the way, Chris Done just added an MIT license to haskell-perf/dictionaries, so we could merge anfelor/graph-benchmarks with this repository.
Or maybe alga-perf could be independent of haskell-perf/graphs and just use the same infrastructure?
@anfelor In my opinion, if something is built in haskell-perf/graphs, allowing it to be easily extended by any of the target libraries is a good idea. One could then simply clone/submodule the repo and compare the traditional features with other libraries, as well as add "personal-feature" benchmarks. This makes it easier to backport enhancements and bug fixes from the parent haskell-perf/graphs, and it also fits any graph library, not only alga.
@anfelor @nobrakal You convinced me that haskell-perf/graphs should be the main benchmark repo :)
What will the Alga-specific benchmark suite look like then? Will it have haskell-perf/graphs as a submodule? Or maybe haskell-perf/graphs could somehow be made into a Hackage library, on which we could simply depend?
I hadn't thought about a library, but I think it is a good idea. The general graph benchmarking suite can expose a standard API for defining tests (on generic lists of edges, as described above), together with the standard tests themselves. It could also provide an executable that runs these standard tests on the target libraries through the exposed library.
This would remove a lot of code repetition and allow great modularity!
Also, having separate code for the "engine" behind the benchmarking suite, the interface between the engine and the libraries, and the benchmark code itself is, I think, a very good choice.
@nobrakal Sounds like a good plan! Let's see how it works out.
Great!
I want to get started on some things, but as @anfelor says, I don't want to do too much before GSoC: it would mean gaining an unfair advantage by starting to work on the project, and if too much is done, there is also little chance of the proposal being accepted...
@nobrakal Is that roughly what you had in mind?
class GraphImplementation g where
  someDenseGraph :: g -- according to some specification
  someSparseGraph :: g
  testInsertEdge :: (Int -> g -> g) -> SingleTest
  testRemoveEdge :: (Int -> g -> g) -> SingleTest

createReport :: [SingleTest] -> IO ()
I am not yet convinced that this is the right approach, but you should definitely build a prototype!
I don't think that prototypes will really be an issue, as there is still a lot to be done afterwards (communication, reading up on other graph libraries, visualisation, etc.), but maybe, @snowleopard, you could say more about your experience with this? Did you have a similar situation in past GSoC projects?
The main reason why I didn't want to contribute too much to this issue, @nobrakal, is that I had already thought a bit about edge-labelled graphs and felt this was your area. So if you want to work on this, please do; it's only unfair to the third applicant, but since he (or she?) has apparently not worked on the project at all so far, I don't feel too guilty.
@nobrakal @anfelor I suggest you don't think about GSoC. Just work/collaborate as you would normally do on any open-source project. I don't think there is anything unfair about just being yourself :)
The discussions on GSoC have already started. Regardless of the outcome, I'd like to say that I am very grateful to you both for the contributions you have already made. I will get in touch with every candidate personally after the GSoC decision is made.
@snowleopard Ok, then let's do some work :smile: About GSoC, it is rather our role to be grateful for your time!
@anfelor Here is what I thought (it is the opposite of your example: tests are not provided, but graphs are):
You can view the "very prototypal" version here (in a fork of alga, but it works alone).
You can read the idea from the types:
-- Generic graph
type Edges = [(Int,Int)]
-- We want to pass the generic graph to create the corresponding function to test
type ToFuncToBench a = Edges -> FuncToBench a
-- Type used to group both kinds of functions
data FuncToBench a = forall b. Consummer String (a -> b) | forall b c. FuncToBenchWithArgs String (a -> b -> c) (b -> String) [b]
-- An interface between our generic graphs and concrete graph types
class GraphImpl g where
mkGraph :: Edges -> g
-- Turns a ToFuncToBench into FuncToBench values and passes them to benchFunc'
benchFunc :: GraphImpl g => ToFuncToBench g -> [Edges] -> [Benchmark]
-- Main benchmarking function
benchFunc' :: GraphImpl g => FuncToBench g -> Edges -> Benchmark
So the user creates a set of ToFuncToBench values and can run them over standard graphs (of type Edges).
For example, a path will look like
mkPath :: Int -> Edges
mkPath n = take n $ iterate ((\(x,y) -> (x+1,y+1)) :: (Int,Int) -> (Int,Int)) (0,1)
and a simple Consummer looks like
isEmpty' :: ToFuncToBench (Graph Int)
isEmpty' = const $ Consummer "IsEmpty" isEmpty
At the end, you only map over a set of ToFuncToBench.
It looks like my previous attempt, but without the graph-creation part, and I think it is clearer.
Any thoughts? Maybe the type synonyms aren't necessary, and the code could be cleaner, but the idea is there.
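A minimal sketch of that last step, reusing the hypothetical benchFunc, isEmpty' and mkPath from the snippets above and assuming a GraphImpl (Graph Int) instance:

import Criterion.Main (defaultMain)

-- Runs every ToFuncToBench in the list over paths of increasing size
main :: IO ()
main = defaultMain $
  concatMap (\f -> benchFunc f (map mkPath [10, 100, 1000])) [isEmpty']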
Maybe I wrote something too generic; this does not seem to work for functions like hasEdge, which needs two arguments.
So please read:
-- Type used to group different types of functions
data FuncToBench a = forall b. Consummer String (a -> b)
| forall b c. FuncWithOneArg String (b -> a -> c) (b -> String) [b]
| forall b c d. FuncWithTwoArgs String (b -> c -> a -> d) (b -> String) (c -> String) [(b,c)]
This allows something like:
edgesNotInPath :: Edges -> Edges
edgesNotInPath = map (\(x,y)-> (x-1,y+1))
pathHasEdge :: ToFuncToBench (Graph Int)
pathHasEdge = FuncWithTwoArgs "hasEdge" hasEdge show show . edgesNotInPath
Note that we can easily compose FuncWithTwoArgs "hasEdge" hasEdge show show with anything else.
Came across this PR to Data.Graph: https://github.com/haskell/containers/pull/549. A good example of real-life graph benchmarking. Also see a discussion in the Libraries mailing list: https://mail.haskell.org/pipermail/libraries/2018-March/028689.html
Great to see a real-life benchmark!
In the PR, the benchmark is about graph creation, which is something very specific: the problem there is much more about the NFData instance than about criterion, or about doing many almost-identical things without copy-pasting hundreds of lines of code :smile:. Also, I just tested: alga has no problem fully evaluating a graph built with circuit, whereas Data.Graph seems to have a problem with its NFData instance...
The use of uncurry is perfect for functions that need extra arguments (so you can ignore my previous comment). I made a better example here: https://github.com/nobrakal/bench-graph
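A minimal sketch of the uncurry trick for a two-argument function such as hasEdge (assuming alga's Algebra.Graph and criterion):

import Criterion.Main (Benchmark, bench, whnf)
import Algebra.Graph (Graph, hasEdge)

-- The two vertices travel as a single tuple, so the usual
-- one-argument benchmarking combinators apply unchanged.
benchHasEdge :: Graph Int -> (Int, Int) -> Benchmark
benchHasEdge g e = bench (show e) (whnf (uncurry query) e)
  where
    query x y = hasEdge x y g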
The use of uncurry is perfect for functions that need extra arguments
@nobrakal Indeed, this seems like a much simpler solution :)
whereas Data.Graph seems to have a problem with its NFData instance...
By the way, I don't understand how/why Data.Graph has a broken rnf. It does not use anything like a "tying the knot" construction: it is simply an adjacency list represented by an array. Could anyone explain this to me?
Hello, note that in haskell/containers#549, the benchmark only works because the graph is acyclic (I first wrote a cyclic graph, but there was an infinite loop, maybe linked to what @snowleopard says about rnf being broken in Data.Graph?).
Hey @OlivierSohn! Thanks, that's good to know. Could you test Data.Graph on cyclic graphs by writing a custom function for fully evaluating it instead of relying on rnf?
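A minimal sketch of such a custom forcing function: Data.Graph represents a graph as an Array Vertex [Vertex], so forcing every adjacency list forces the whole structure without going through rnf on the Graph itself:

import qualified Data.Graph as G
import Data.Array (elems)
import Control.DeepSeq (rnf)

-- Fully evaluates the graph, bypassing the NFData instance under suspicion
forceGraph :: G.Graph -> ()
forceGraph g = rnf (elems g)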
I won't have the time to do that but I'll add the suggestion in https://github.com/haskell/containers/issues/550. Thanks!
@nobrakal @anfelor By the way, today, while waiting for what felt like an hour to build Alga because of Criterion's huge list of dependencies, I realised that it's not such a bad idea to have alga-benchmarks as a separate repository ;-) I might split the current benchmark suite off into a separate repo this weekend.
Haha, yes, criterion does not come alone :) (On my Arch Linux, every single update of a haskell-* package comes with a rebuild of criterion ^^.)
We could also move the discussion from this issue to the new repo, because it is much more benchmark-related than alga-related!
I continued down this path here: build the library, write generic graph examples, and try it on alga. I think it produces understandable results with an (almost) clean syntax.
I am facing a little problem, though: criterion benchmarks a function by applying an argument to it and fully evaluating the result. Here are the results for hasEdge:
benchmarking hasEdge/cycle1/(-1,2)
time 1.104 μs (1.100 μs .. 1.108 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.105 μs (1.102 μs .. 1.107 μs)
std dev 8.781 ns (7.687 ns .. 10.50 ns)
benchmarking hasEdge/cycle10/(-1,2)
time 5.855 μs (5.831 μs .. 5.879 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 5.869 μs (5.842 μs .. 5.914 μs)
std dev 116.3 ns (76.04 ns .. 206.5 ns)
variance introduced by outliers: 20% (moderately inflated)
benchmarking hasEdge/cycle100/(-1,2)
time 44.34 μs (44.28 μs .. 44.41 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 44.30 μs (44.16 μs .. 44.40 μs)
std dev 400.7 ns (311.8 ns .. 552.4 ns)
benchmarking hasEdge/cycle1000/(-1,2)
time 469.9 μs (468.5 μs .. 471.0 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 469.5 μs (468.4 μs .. 470.6 μs)
std dev 3.682 μs (3.115 μs .. 4.507 μs)
benchmarking hasEdge/cycle10000/(-1,2)
time 6.625 ms (6.579 ms .. 6.667 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 6.694 ms (6.656 ms .. 6.761 ms)
std dev 141.9 μs (76.95 μs .. 208.4 μs)
I don't know whether this is very representative of hasEdge. It certainly has to be compared with the time to fully evaluate the graph in question on its own.
Maybe whnf or whnfIO can solve the problem? https://hackage.haskell.org/package/criterion-1.4.0.0/docs/Criterion-Main.html#v:whnf
I am quoting http://www.serpentine.com/criterion/tutorial.html
Guidelines for thinking about when to use nf or whnf:
- If a result is a lazy structure (or a mix of strict and lazy, such as a balanced tree with lazy leaves), how much of it would a real-world caller use? You should be trying to evaluate as much of the result as a realistic consumer would. Blindly using nf could cause way too much unnecessary computation.
- If a result is something simple like an Int, you’re probably safe using whnf—but then again, there should be no additional cost to using nf in these cases.
I agree that choosing between nf and whnf is a decision to be made, but I don't think it changes the problem (in fact, I have just tested, and the output is almost the same).
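For illustration, a minimal sketch contrasting the two on graph construction (assuming alga's clique and an NFData instance for Graph, as discussed earlier):

import Criterion.Main (bench, bgroup, defaultMain, nf, whnf)
import Algebra.Graph (clique)

main :: IO ()
main = defaultMain
  [ bgroup "clique"
      [ bench "whnf 10^3" (whnf clique [1 .. 1000 :: Int]) -- only the outermost constructor
      , bench "nf 10^3"   (nf   clique [1 .. 1000 :: Int]) -- the whole graph (needs NFData)
      ]
  ]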
UPDATE: We moved benchmarks to a separate repository to keep this package lightweight and fast to build, see some comments below.
Current benchmarks are a mess: https://github.com/snowleopard/alga/blob/master/doc/benchmarks.md