snowleopard / alga

Algebraic graphs
MIT License

Move benchmarks to haskell-perf/graphs #14

Closed snowleopard closed 6 years ago

snowleopard commented 7 years ago

UPDATE: We moved benchmarks to a separate repository to keep this package lightweight and fast to build, see some comments below.


Current benchmarks are a mess: https://github.com/snowleopard/alga/blob/master/doc/benchmarks.md

nobrakal commented 6 years ago

Just raw ideas I had concerning the benchmark:

The question is: where is the priority (if it is even listed here)...

snowleopard commented 6 years ago

@nobrakal I think the current benchmarks are such a mess that it might be better to start from scratch :)

I would therefore prioritise the second item in your list: let's figure out what we need to benchmark first.

What about starting with something classic, like topological sort and depth-first search? Presumably, all graph libraries support these algorithms.

To have a common starting point, we can assume that graphs are given simply by edge lists [(Int, Int)], with vertices lying in the interval 1..10^k for k in the range 1..6. Does this sound sensible?
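For concreteness, a minimal sketch of one such input generator (the name is illustrative):

-- A cycle on vertices 1..10^k, given as a plain edge list that any
-- graph library can consume.
inputCycle :: Int -> [(Int, Int)]
inputCycle k = [(v, v + 1) | v <- [1 .. n - 1]] ++ [(n, 1)]
  where n = 10 ^ k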

nobrakal commented 6 years ago

Ok, I understand!

I think staying classic for the beginning is a good idea. I will try to produce at least a list of common things to benchmark.

Otherwise, representing a graph by a list of edges sounds very sensible ^^.

nobrakal commented 6 years ago

Other benchmarks

For inspiration, I searched for other benchmarks, and only hash-graph provides one: https://github.com/patrickdoc/hash-graph/blob/master/benchmarks/benchmarks.md

Benchmark ideas

Benchmark size in memory?

I don't know if this is even possible, but I was thinking that since different libraries represent graphs in different ways, maybe some of them are more space-efficient than others. If so, it would be great to see how much space a single path, or a clique of 10^6 elements, takes in memory.

snowleopard commented 6 years ago

@nobrakal That's a very good start.

I suggest not including anything Alga-specific in the first version, so let's drop size, for example.

Also, it may be best to avoid doing things randomly, at least initially, as randomness could make the results difficult to interpret. Instead, I suggest being very specific about each benchmark, for example:

Benchmark mesh-n-by-n-diagonal:
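A hedged sketch of what such a definition could look like, assuming alga's mesh combinator and the edgeList $ fmap index conversion discussed below:

import Algebra.Graph (edgeList, edges, mesh, overlay)

-- An n-by-n mesh with diagonal edges, flattened to Int vertices
-- via an index function (a sketch, not a definitive definition).
meshNByNDiagonal :: Int -> [(Int, Int)]
meshNByNDiagonal n = edgeList $ fmap index $ overlay
    (mesh [1 .. n] [1 .. n])                  -- grid edges
    (edges [ ((i, j), (i + 1, j + 1))         -- diagonal edges
           | i <- [1 .. n - 1], j <- [1 .. n - 1] ])
  where
    index (i, j) = (i - 1) * n + j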

The above patterns look sufficient for all benchmarks in your list, but I might have missed something.

P.S.: Benchmarking memory would be great!

nobrakal commented 6 years ago

Hi,

You are right on both counts, let's benchmark things we can compare with others!

Your example seems pretty clear to me, but I have a question: the implementation will not use the edgeList from your example, but only the idea behind it, and will try to access things directly on the mesh structure, right? I don't understand how we could use this edgeList otherwise, so maybe this is not a very wise question... But I prefer to ask!

About memory usage, there is https://hackage.haskell.org/package/ghc-datasize-0.2.0 (a little bit old) and http://hackage.haskell.org/package/hp2any-core-0.11.2; more prosaically, we can also use GHC directly through https://downloads.haskell.org/~ghc/7.2.1/docs/html/users_guide/prof-heap.html

snowleopard commented 6 years ago

The implementation will not use the edgeList from your example, but only the idea behind it, and will try to access things directly on the mesh structure, right?

@nobrakal I would actually prefer that we use exactly this definition to generate input for this particular benchmark, because it's much more transparent and terse than manipulating lists of pairs of integers.

Can we force GHC to evaluate it fully into a list of pairs of integers before passing it to benchmarks?

nobrakal commented 6 years ago

Ok, I see: it is to generate input. We will have to convert it back to a list of pairs of integers to benchmark, with something like:

indexInverse u n = (u `div` n + 1, u `mod` n)

In that case, we do indeed need to force the evaluation before passing it to the benchmark. Some well-positioned maps can do the trick (mapping over list comprehensions will, I think, force the evaluation of the arguments...). Otherwise, $! or seq can always be used to force evaluation.
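A minimal sketch of such forcing, using deepseq (lists of Int pairs already have an NFData instance):

import Control.DeepSeq (force)
import Control.Exception (evaluate)

-- Fully evaluate the input edge list outside the timed region.
forceEdges :: [(Int, Int)] -> IO [(Int, Int)]
forceEdges = evaluate . force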

snowleopard commented 6 years ago

We will have to convert it back to a list of pairs of integer to benchmark with something like

No, we don't! The expression above already includes edgeList $ fmap index, so it will produce exactly the list of pairs of integers in the range [1..n], which is easy to consume by any graph library. All access patterns are also already translated to vertices [1..n].

nobrakal commented 6 years ago

@snowleopard Are we discussing benchmarking alga, or graph libraries in general? I thought the goal here was only to benchmark alga. If I am right, the benchmarking tool will ultimately need an alga structure to benchmark, or am I misunderstanding something? The process of "converting" an alga structure to a generic edge list, and then re-converting it to an alga structure to run benchmarks on it (accessing, extracting...), seems pretty complicated to me for just benchmarking alga operations on alga graphs.

snowleopard commented 6 years ago

@nobrakal I see your point, you are right. I was jumping ahead a bit :-)

So, perhaps this is what we could do in the case of the mesh-n-by-n-diagonal benchmark:

Then, in the future, when we start working on adding benchmarks for other graph libraries, we will only need to provide the index function, which in this case is index (i, j) = (i - 1) * n + j.

The following standard conversion will be applied for all benchmarks (not just mesh-n-by-n-diagonal):
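Presumably something along these lines (a hedged sketch):

import Algebra.Graph (Graph, edgeList)

-- Flatten a structured alga graph to Int vertices using the benchmark's
-- index function, then extract a plain edge list any library can consume.
standardConversion :: ((Int, Int) -> Int) -> Graph (Int, Int) -> [(Int, Int)]
standardConversion index = edgeList . fmap index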

nobrakal commented 6 years ago

Ok, I now understand what you meant! Your example is totally clear. I will try to make some drafts on my fork during the week.

nobrakal commented 6 years ago

About size: a simpler and cleaner package for benchmarking memory size is weigh.
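A minimal usage sketch (this assumes an NFData instance for Graph, which alga does not ship yet; see below):

import Weigh (func, mainWith)
import Algebra.Graph (Graph, clique, path)

main :: IO ()
main = mainWith $ do
  func "clique 10^3" (clique :: [Int] -> Graph Int) [1 .. 1000]
  func "path 10^3"   (path   :: [Int] -> Graph Int) [1 .. 1000]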

snowleopard commented 6 years ago

@nobrakal Thanks, looks promising indeed!

nobrakal commented 6 years ago

So I started to think about the benchmarks.

Overview

The main idea is to write a function like benchGraphStruct :: GraphStruct -> Benchmark, so one can write a main like:

main = defaultMain $ map benchGraphStruct [structMesh, structClique...]

Data Structures

So a GraphStruct contains a name, something that produces a graph from an argument, something capable of displaying this argument (for understandable results), a list of actual arguments, and some functions to benchmark on this structure. The last two lists are combined at the end, when mapping over the argument list (because some of the functions to be tested on the graphs need graph-dependent arguments).

FuncToBench is divided into two cases: something that represents a function that consumes a graph and produces something (typically vertexCount), and functions that need other arguments besides a graph to produce something (typically hasVertex).

Note that existential quantification is the only way I found to represent functions whose results are not needed. It also allows me to "hide" graph types, and thus to construct a consistent list like the one provided in the overview.
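Roughly, in record form (field names here are illustrative, simplified from the actual code):

data GraphStruct g b = GraphStruct
  { gsName    :: String           -- benchmark group name
  , gsCreate  :: b -> g           -- builds a graph from an argument
  , gsShowArg :: b -> String      -- displays the argument (for readable results)
  , gsArgs    :: [b]              -- the actual arguments to run
  , gsFuncs   :: [FuncToBench g]  -- the functions to benchmark on this structure
  }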

Action

You can find an example of use here: https://github.com/nobrakal/alga/blob/ddf59570e9529e34fcc812db8da9100c112f04b0/bench/Bench.hs

It is basically an enumeration of bgroup and map calls using these definitions.

Problems

Data structures

The main problem is the inclusion of the list of functions to test in the GraphStruct. With existential quantification, problems arise, and I was forced to include the list of functions to be benchmarked in the GraphStruct itself. I don't know if I can use more polymorphism to reduce code length.

Criterion

Here, if I try to benchmark structure creation (something like clique [1..n]), it always produces the same time for every n. I think whnf doesn't fully evaluate the graph... Or alga is really impressive! A solution would be to make Graph an instance of NFData, so we can use nf. Other functions are sometimes benchmarked at constant time as well (like hasVertex); I think the problems are linked.
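A sketch of such an instance over alga's four constructors (an orphan instance, which is fine inside a benchmark suite):

import Control.DeepSeq (NFData (..))
import Algebra.Graph (Graph (..))

instance NFData a => NFData (Graph a) where
  rnf Empty         = ()
  rnf (Vertex x)    = rnf x
  rnf (Overlay x y) = rnf x `seq` rnf y
  rnf (Connect x y) = rnf x `seq` rnf y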

snowleopard commented 6 years ago

@nobrakal Looks rather impressive! But might be a bit overengineered :)

I didn't really have much time to think this through in every detail (I'm chasing the ICFP deadline this week), but my feeling is that your abstractions can be simplified.

For GraphStruct: maybe we shouldn't bother with graph/vertex polymorphism and should just stick to Graph Int as the initial graph description, allowing us to remove the type parameter a. If we also allow only one size parameter N per benchmark, then we don't need the type parameter b either.

For FuncToBench: maybe it's better to just create a fixed number of constructors for the cases with no arguments, 1 argument, 2 arguments, etc.

Remember, your benchmarks need to be simple if you want the community to understand and believe them, even if this simplicity comes at the cost of code duplication.

Can you implement just mesh-n-by-n-diagonal benchmark as a starting point, without any generalisation?

nobrakal commented 6 years ago

Hi,

Yeah, it might be a little bit complicated ^^.

You are right, the type parameter b is not really necessary in GraphStruct. I think we can assume that we will only build a graph from a single Int argument.

Concerning a (the type of the vertices of a Graph to bench): it is, I think, mandatory. A mesh will produce a Graph (a,a), a clique a Graph a, and deBruijn a Graph [a]. We can of course convert to a list of edges (like your index function) with only Int vertices. But this will complicate many things:

Here, type polymorphism can handle this problem for us, help to really reduce some code, and, I think, make it clearer (I agree with you that good documentation will be mandatory). During benchmarks I don't need to force the type of a Graph; I only need it to be coherent with the functions I want to benchmark on it.

I think the whole problem is the generalisation; mesh-n-by-n alone is not very difficult, as you pointed out. Anyway, I will write it as soon as I have some time, so that we have, as you say, a starting point (maybe jumping directly to a generalised thing wasn't very wise).

snowleopard commented 6 years ago

I'll just leave this here: https://github.com/haskell-perf and https://github.com/haskell-perf/checklist

anfelor commented 6 years ago

@snowleopard I had exactly the same thought over the weekend. I modified the dictionaries repository to test some simple graph properties. So far my findings are as follows:

Good about this approach:

Bad about this approach:

nobrakal commented 6 years ago

haskell-perf looks great! (Its checklist is very interesting ^^).

@anfelor The next idea is to make an independent tool that also benchmarks other graph libraries, as described in this Haskell Summer of Code idea. So maybe importing it into alga is not necessary...

snowleopard commented 6 years ago

@anfelor Awesome, thanks for giving haskell-perf a try!

I agree with @nobrakal: we probably don't need to make graph benchmarking code a part of Alga -- it may be more discoverable/useful if it is a part of a general Haskell performance benchmarking suite.

anfelor commented 6 years ago

I agree we don't need to import it into the main alga repository, but I believe that we (or at least I :)) need a (possibly separate) repository just for alga benchmarks. I have wondered repeatedly over the last couple of days how one implementation would compare against another, especially in connection with functional linear maps.

I would propose the following: extend the graphs-benchmarks repository to include the other competing libraries, to present an unbiased overview of the performance, and clone it to create more specialised benchmarks testing more alga-specific functions. For example, I believe that tests concerning clique shouldn't be in the former but in the latter.

Let's say, however, that this effort progresses relatively far before the start of the official GSoC and the first task becomes trivial. What would happen to the proposal? I would hate to see my application rejected on the grounds that the problem has essentially been solved. Have you also applied, @nobrakal? I don't want to gain an unfair advantage over you by preempting all design decisions. What do you think @snowleopard?

snowleopard commented 6 years ago

@anfelor I think we could have a general graph benchmark suite in haskell-perf and an Alga-specific benchmark suite in this repository, which would be tightly coupled with the development of Alga. This sounds like a good balance to me.

I think there are three proposals for this project: from @nobrakal, from you, and from somebody else. From a brief look, all three are very strong, so it would be a difficult choice :(

anfelor commented 6 years ago

@snowleopard But shouldn't the alga-specific benchmark suite be based on the haskell-perf one? I would couple the alga-specific suite to the general one (but not the other way around) to allow quick testing of standard use cases, and therefore keep the alga-specific suite in its own repository (at least until the license question is resolved; I have opened an issue for this: https://github.com/haskell-perf/dictionaries/issues/10)

snowleopard commented 6 years ago

but not the other way around

Why not? It seems quite logical to take an Alga-specific benchmark suite, kept in this repository, and then extend it by adding benchmarks of other graph libraries.

I think having three repositories (Alga repo, Alga-specific benchmarks repo, and general benchmarks in haskell-perf) would be just too confusing. Let's stick to two: one here, for tightly coupling it with Alga development, and one in haskell-perf so that it can be easily discovered by the community.

nobrakal commented 6 years ago

@anfelor Your idea is great !

@snowleopard Maybe a git submodule would fit here? Then your tests could easily extend the ones in the haskell-perf repository.

Concerning GSoC, I don't know what to do! As @anfelor says, if we complete one of the principal objectives before the program starts, I don't think the Haskell Committee will approve the project...

anfelor commented 6 years ago

It seems quite logical to take an Alga-specific benchmark suite, kept in this repository, and then extend it by adding more benchmarks of other graphs libraries.

Because that could bias the benchmarks towards graphs with a lot of structure, like cliques or trees (i.e. tasks for which alga is well suited). I would propose making the benchmarks that compare alga with other libraries completely independent of alga's main development. Some code sharing should be okay, though, as long as it doesn't influence the range of benchmarks too much.

I would try to build it this way:

I think having three repositories (Alga repo, Alga-specific benchmarks repo, and general benchmarks in haskell-perf) would be just too confusing.

I agree, it could become that way. But I don't think one can embed a clone of one repository inside another? Or only as a git submodule, as @nobrakal proposed?

anfelor commented 6 years ago

Or maybe alga-perf could be independent of haskell-perf/graphs and just use the same infrastructure?

anfelor commented 6 years ago

By the way, Chris Done just added an MIT license to haskell-perf/dictionaries, so we could merge anfelor/graph-benchmarks with this repository.

nobrakal commented 6 years ago

Or maybe alga-perf could be independent of haskell-perf/graphs and just use the same infrastructure?

@anfelor To my mind, if something is made in haskell-perf/graphs, allowing it to be easily extended by any of the target libraries is a good idea. One could then simply clone/submodule the repo and compare the traditional features with the others, as well as add "personal-feature" benchmarks. This makes it easier to backport enhancements and bugfixes from the parent haskell-perf/graphs. It will also fit any graph library, not only alga.

snowleopard commented 6 years ago

@anfelor @nobrakal You convinced me that haskell-perf/graphs should be the main benchmark repo :)

What will the Alga-specific benchmark suite look like then? Will it have haskell-perf/graphs as a submodule? Or maybe haskell-perf/graphs could somehow be made into a Hackage library on which we could simply depend?

nobrakal commented 6 years ago

I hadn't thought about a library, but I think it is a good idea. The general graph benchmarking suite can expose a standard API for writing tests (on generic lists of edges, as described above) and a set of standard tests. It can also provide an executable that runs these standard tests on the target libraries, through the library provided.

It will remove a lot of code repetition and allow great modularity!

Also, having separate code for the "engine" behind the benchmarking suite, the interface between the engine and the libraries, and the benchmark code itself is, I think, a very good choice.

snowleopard commented 6 years ago

@nobrakal Sounds like a good plan! Let's see how it works out.

nobrakal commented 6 years ago

Great!

I want to do some things, but as @anfelor says, I don't want to do too much before GSoC: starting to work on the project would give me an unfair advantage, and if too much is done, there is also little chance of the proposal being accepted...

anfelor commented 6 years ago

@nobrakal Is that roughly what you had in mind?

class GraphImplementation g where
  someDenseGraph :: g -- according to some specification
  someSparseGraph :: g

testInsertEdge :: GraphImplementation g => (Int -> g -> g) -> SingleTest
testRemoveEdge :: GraphImplementation g => (Int -> g -> g) -> SingleTest
createReport :: [SingleTest] -> IO ()

I am not yet convinced that this is the right approach, but you should definitely build a prototype!

I don't think that prototypes will really be an issue, as there is still a lot to be done afterwards (communication, reading up on other graph libraries, visualisation, etc.), but maybe, @snowleopard, you could write more about your experience with this? Did you have a similar situation in past GSoC projects?

The main reason I didn't want to contribute too much to this issue, @nobrakal, is that I had already thought a bit about edge-labelled graphs and felt this was your area. So if you want to work on this, please do; it's only unfair to the third applicant, but since he (or she?) has apparently not worked on the project at all so far, I don't feel too guilty.

snowleopard commented 6 years ago

@nobrakal @anfelor I suggest you don't think about GSoC. Just work/collaborate as you would normally do on any open-source project. I don't think there is anything unfair about just being yourself :)

The discussions on GSoC have already started. Regardless of the outcome, I'd like to say that I am very grateful to you both for the contributions you have already made. I will get in touch with every candidate personally after the GSoC decision is made.

nobrakal commented 6 years ago

@snowleopard Ok, let's do some work then :smile: About GSoC, it is our role to be grateful for your time!

nobrakal commented 6 years ago

@anfelor Here is what I thought (it is the opposite of your example: tests are not provided, but graphs are):

You can view the "very prototypal" version here (in a fork of alga, but it works alone).

You can read the idea in the types:

-- Generic graph
type Edges = [(Int,Int)]

-- We pass the generic graph in, to create a corresponding function to test
type ToFuncToBench a = Edges -> FuncToBench a

-- Type used to group both types of functions
data FuncToBench a = forall b. Consummer String (a -> b)
                   | forall b c. FuncToBenchWithArgs String (a -> b -> c) (b -> String) [b]

-- An interface between our generic graphs and the others
class GraphImpl g where
  mkGraph :: Edges -> g

-- Applies the ToFuncToBench to each Edges to get a FuncToBench, then calls benchFunc'
benchFunc :: GraphImpl g => ToFuncToBench g -> [Edges] -> [Benchmark]

-- Principal function
benchFunc' :: GraphImpl g => FuncToBench g -> Edges -> Benchmark

So the user will create a set of ToFuncToBench, and will be able to test them over standard graphs (of type Edges).

For example, a path will look like:

mkPath :: Int -> Edges
mkPath n = take n $ iterate (\(x, y) -> (x + 1, y + 1)) (0, 1)

and a simple Consummer:

isEmpty' :: ToFuncToBench (Graph Int)
isEmpty' = const $ Consummer "IsEmpty" isEmpty

At the end, you will only map over a set of ToFuncToBench.

It looks like my previous attempt, but without the graph-creation part, and I think it is clearer.

Any thoughts? Maybe the types aren't necessary and the code could be clearer, but the idea is there.

nobrakal commented 6 years ago

Maybe I wrote something too generic; this seems not to work for functions like hasEdge, which need two arguments.

So please read:

-- Type used to group different types of functions
data FuncToBench a = forall b. Consummer String (a -> b) 
  | forall b c. FuncWithOneArg String (b -> a -> c) (b -> String) [b]
  | forall b c d. FuncWithTwoArgs String (b -> c -> a -> d) (b -> String) (c -> String) [(b,c)]

This allows something like:

edgesNotInPath :: Edges -> Edges
edgesNotInPath = map (\(x,y)-> (x-1,y+1))

pathHasEdge :: ToFuncToBench (Graph Int)
pathHasEdge = FuncWithTwoArgs "hasEdge" hasEdge show show . edgesNotInPath

Note that we can easily compose FuncWithTwoArgs "hasEdge" hasEdge show show with anything else.

snowleopard commented 6 years ago

Came across this PR to Data.Graph: https://github.com/haskell/containers/pull/549. A good example of real-life graph benchmarking. See also the discussion on the Libraries mailing list: https://mail.haskell.org/pipermail/libraries/2018-March/028689.html

nobrakal commented 6 years ago

Great to see a real-life benchmark!

In the PR, the benchmark is about graph creation, which is something very specific: the problem lies much more with the NFData instance, and with the need to do many almost-identical things without copy-pasting hundreds of lines of code, than with criterion itself :smile:. Also, I just tested: alga doesn't have any problem fully evaluating a graph built with circuit, whereas Data.Graph seems to have a problem with its NFData instance...

The use of uncurry is perfect for functions that need extra arguments (so you can ignore my previous comment). I made a better example here: https://github.com/nobrakal/bench-graph
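For illustration, a hedged sketch reusing the FuncWithOneArg shape from my previous comment (only the two-argument constructor becomes unnecessary):

-- uncurry collapses hasEdge's two vertex arguments into a single pair,
-- so the one-argument wrapper suffices: (Int, Int) -> Graph Int -> Bool.
hasEdge' :: ToFuncToBench (Graph Int)
hasEdge' = const $ FuncWithOneArg "hasEdge" (uncurry hasEdge) show [(-1, 2)]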

snowleopard commented 6 years ago

The use of uncurry is perfect for functions that need extra arguments

@nobrakal Indeed, this seems like a much simpler solution :)

meanwhile Data.Graph seems to have problem with their NFData instance...

By the way, I don't understand how/why Data.Graph has a broken rnf. It does not use anything like a 'tying the knot' construction: it is simply an edge list represented by an array. Could anyone explain this to me?
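For reference, Data.Graph's actual representation in containers is an adjacency array:

import Data.Array (Array)

-- Reproduced from Data.Graph for reference: adjacency lists indexed by vertex.
type Vertex = Int
type Graph  = Array Vertex [Vertex]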

OlivierSohn commented 6 years ago

Hello! Note that in haskell/containers#549 the benchmark only works because the graph is acyclic (I first wrote a cyclic graph, but there was an infinite loop, maybe linked to what @snowleopard says about rnf being broken in Data.Graph?).

snowleopard commented 6 years ago

Hey @OlivierSohn! Thanks, that's good to know. Could you test Data.Graph on cyclic graphs by writing a custom function for fully evaluating it instead of relying on rnf?

OlivierSohn commented 6 years ago

I won't have the time to do that but I'll add the suggestion in https://github.com/haskell/containers/issues/550. Thanks!

snowleopard commented 6 years ago

@nobrakal @anfelor By the way, today, while waiting for what felt like an hour to build Alga because of Criterion's huge list of dependencies, I realised that it's not such a bad idea to have alga-benchmarks as a separate repository ;-) I might split the current benchmark suite off into a separate repo this weekend.

nobrakal commented 6 years ago

Haha, yes, criterion does not come alone :) (On my Arch Linux, every single update of a haskell-* package comes with a rebuild of criterion ^^)

We can also move the discussion in this issue over to the new repo, because it is much more benchmark-related than alga-related!

nobrakal commented 6 years ago

I followed my path here: I built the library and the generic graph examples, and tried them on alga. I think it produces understandable results with (almost) clean syntax.

I am facing a little problem, though: criterion benchmarks a function by applying an argument to it and fully evaluating the result. Here are the results for hasEdge:

benchmarking hasEdge/cycle1/(-1,2)
time                 1.104 μs   (1.100 μs .. 1.108 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.105 μs   (1.102 μs .. 1.107 μs)
std dev              8.781 ns   (7.687 ns .. 10.50 ns)

benchmarking hasEdge/cycle10/(-1,2)
time                 5.855 μs   (5.831 μs .. 5.879 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 5.869 μs   (5.842 μs .. 5.914 μs)
std dev              116.3 ns   (76.04 ns .. 206.5 ns)
variance introduced by outliers: 20% (moderately inflated)

benchmarking hasEdge/cycle100/(-1,2)
time                 44.34 μs   (44.28 μs .. 44.41 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 44.30 μs   (44.16 μs .. 44.40 μs)
std dev              400.7 ns   (311.8 ns .. 552.4 ns)

benchmarking hasEdge/cycle1000/(-1,2)
time                 469.9 μs   (468.5 μs .. 471.0 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 469.5 μs   (468.4 μs .. 470.6 μs)
std dev              3.682 μs   (3.115 μs .. 4.507 μs)

benchmarking hasEdge/cycle10000/(-1,2)
time                 6.625 ms   (6.579 ms .. 6.667 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 6.694 ms   (6.656 ms .. 6.761 ms)
std dev              141.9 μs   (76.95 μs .. 208.4 μs)

I don't know if this is very representative of hasEdge. It certainly has to be compared with the full evaluation of the graph alone.
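One way to separate the two costs is criterion's env, which fully evaluates its environment before any timing starts (this assumes an NFData instance for Graph, as discussed above):

import Criterion.Main (Benchmark, bench, env, whnf)
import Algebra.Graph (circuit, hasEdge)

-- env forces the graph to normal form once, outside the timed region,
-- so only hasEdge itself is measured.
benchHasEdge :: Int -> Benchmark
benchHasEdge n = env (return (circuit [1 .. n])) $ \g ->
  bench ("hasEdge/cycle" ++ show n) $ whnf (hasEdge (-1) 2) g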

OlivierSohn commented 6 years ago

Maybe whnf or whnfIO can solve the problem?

https://hackage.haskell.org/package/criterion-1.4.0.0/docs/Criterion-Main.html#v:whnf

nobrakal commented 6 years ago

I am quoting http://www.serpentine.com/criterion/tutorial.html:

Guidelines for thinking about when to use nf or whnf:

  • If a result is a lazy structure (or a mix of strict and lazy, such as a balanced tree with lazy leaves), how much of it would a real-world caller use? You should be trying to evaluate as much of the result as a realistic consumer would. Blindly using nf could cause way too much unnecessary computation.
  • If a result is something simple like an Int, you’re probably safe using whnf—but then again, there should be no additional cost to using nf in these cases.

I agree that choosing between nf and whnf is a choice to be made, but I don't think it changes the problem (in fact, I have just tested, and the output is almost exactly the same).