sourcecred / research

Repository for research-related items on SourceCred's agenda

Unit Testing for graph_generators.py #17

Closed · mzargham closed this issue 5 years ago

mzargham commented 5 years ago

The graph_generators.py file contains useful tools for generating simple reference graphs, to be used for exploration and later for requirements testing of the SourceCred PageRank algorithm.

[An example of a requirement test: imposing fairness requirements on how the algorithm distributes cred along a directed line graph.]

https://github.com/sourcecred/research/blob/mz-exploratory/exploratory/mZargham/pre-infra/graph_generators.py
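To make the bracketed example concrete, here is a minimal sketch of what such a requirement test might look like. It uses networkx's pagerank as a stand-in for SourceCred's algorithm and assumes lineGraphGen is importable from the file linked above; on a directed line graph, cred should accumulate monotonically toward the sink:

import networkx as nx

from graph_generators import lineGraphGen

# On the directed line 0 -> 1 -> 2 -> 3, mass flows toward the sink, so
# PageRank scores should be strictly increasing along the line.
g = lineGraphGen(4)
scores = nx.pagerank(g)
assert all(scores[i] < scores[i + 1] for i in range(3))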

@mzargham is requesting help from @BrianLitwin to set up proper unit tests prior to submitting a pull request to master as 'infra'.

BrianLitwin commented 5 years ago

I wanted to throw some code up to give y'all a chance to stop me if I'm going in the wrong direction.

I'm trying to avoid testing things that are already tested in the networkx library, e.g. the node outputs of the networkx graph generators.

My intuition is that we should be doing a whole lot of parameter validation testing, but I'm not sure how to write that in clean Python at the moment.


import unittest

import networkx as nx

# the generators under test live in graph_generators.py
from graph_generators import lineGraphGen, starGraphGen, treeGraphGen


class GraphGenerators_LineGraph(unittest.TestCase):
    def test_lineGraphGen_sanity(self):
        g = lineGraphGen(4)
        self.assertEqual(list(g.edges()), [(0, 1), (1, 2), (2, 3)])

    def test_lineGraphGen_bidir(self):
        g = lineGraphGen(4, bidir=True)
        # (node, neighbor)
        self.assertEqual(list(g.edges()), [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)])

class GraphGenerators_StarGraph(unittest.TestCase):

# how to validate that 'kind' is either 'source', 'sink', or 'bidir' ?? 

    def test_starGraph_sink(self):
        g = starGraphGen(4)
        self.assertEqual(list(g.edges()), [(1, 0), (2, 0), (3, 0), (4, 0)])

    def test_starGraph_source(self):
        g = starGraphGen(4, kind='source')
        self.assertEqual(list(g.edges()), [(0, 1), (0, 2), (0, 3), (0, 4)])

    def test_starGraph_bidir(self):
        g = starGraphGen(4, kind='bidir')
        self.assertEqual(list(g.edges()), [(0, 1), (0, 2), (0, 3), (0, 4), (1, 0), (2, 0), (3, 0), (4, 0)])

class GraphGenerators_TreeGraph(unittest.TestCase):
    # note: our default differs from networkx's default in this case (sink vs. source)
    def test_treeGraph_source_sink(self):
        g = treeGraphGen(2,2)
        self.assertEqual(list(g.edges()), [(1, 0), (2, 0), (3, 1), (4, 1), (5, 2), (6, 2)])

    def test_treeGraph_source(self):
        g = treeGraphGen(2,2, kind='source')
        self.assertEqual(list(g.edges()), [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)])

    def test_treeGraph_bidir(self):
        g = treeGraphGen(2,2, kind='bidir')
        self.assertEqual(list(g.edges()), [
            (1, 0), (1, 3), (1, 4), (0, 1), (0, 2), (2, 0),
            (2, 5), (2, 6), (3, 1), (4, 1), (5, 2), (6, 2)
        ])

# this is the way I might test that the types assigned in Z's functions are correct 
# I think it would be a good idea to abstract out that logic into a helper function 
# so that the code isn't duplicated in every generator 

class GraphGenerators_SetType(unittest.TestCase):
    def test_lineGraphGen_setType(self):
        g = lineGraphGen(4, nodeTypeName='foo',edgeTypeName='bar')

        # each node and edge should have the correct 'type' value
        for k, v in nx.get_node_attributes(g, 'type').items():
            self.assertEqual(v, 'foo')

        for k, v in nx.get_edge_attributes(g, 'type').items():
            self.assertEqual(v, 'bar')
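
One shape the helper suggested above could take (a sketch; the name assert_type_attrs is hypothetical):

def assert_type_attrs(testcase, g, nodeTypeName, edgeTypeName):
    # iterate over every node and edge, not just those returned by
    # nx.get_node_attributes, so a missing 'type' attribute fails the
    # test instead of passing vacuously
    for _, data in g.nodes(data=True):
        testcase.assertEqual(data.get('type'), nodeTypeName)
    for _, _, data in g.edges(data=True):
        testcase.assertEqual(data.get('type'), edgeTypeName)

Each generator's type test could then be a one-liner: assert_type_attrs(self, g, 'foo', 'bar').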


Since it came up in our recent Office Hours, I thought I'd articulate my general feelings about testing. I try to:

1. Enforce consistency, so that if important code changes we get a flag (a failing test).
2. Test for accuracy of inputs/outputs.
3. Anticipate and prevent bugs; in a lot of cases this means covering all parameter permutations.
4. Use tests to think about where to refactor code, make inputs and outputs cleaner, and use purer functions.
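
To make point 3 concrete, and to answer the open question above about validating 'kind': one clean pattern is a subTest loop over the valid kinds plus assertRaises for an invalid one. This is only a sketch, and it assumes the generators raise ValueError on an unrecognized kind; if they don't yet, a test like this pins down the behavior we'd want.

class GraphGenerators_ParamValidation(unittest.TestCase):
    def test_starGraph_valid_kinds(self):
        # every supported kind should build a non-empty graph
        for kind in ('source', 'sink', 'bidir'):
            with self.subTest(kind=kind):
                g = starGraphGen(4, kind=kind)
                self.assertGreater(g.number_of_edges(), 0)

    def test_starGraph_invalid_kind(self):
        # assumption: an unrecognized kind raises ValueError
        with self.assertRaises(ValueError):
            starGraphGen(4, kind='not-a-kind')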

mzargham commented 5 years ago

@BrianLitwin, this makes sense to me. These unit tests are effectively automated versions of the kinds of spot tests I was running in my notebooks, right? If this is just so that when we edit the generators later we can quickly check that they weren't broken, by checking consistency on some small graphs of each type, then I imagine this is sufficient. Did they pass the tests?

teamdandelion commented 5 years ago

Let's please move discussions like this into a pull request. That way, it's possible to add comments, run tests, etc. Doing code review in an issue is just the wrong use of GitHub's UI.

@mzargham I recognize that this adds a bit more overhead for you in terms of learning Git. But it will be worth it! Here's an approach we could take for getting the file above into its own pull request:

# fetch latest copy of code upstream
$ git fetch

# go to current state of the code upstream. often people will do "git checkout
# master" followed by "git pull", but this isn't as good because it is
# possible that your local branch of master will become de-synchronized from
# upstream (e.g. because you accidentally added a commit) and it gets
# confusing. best to just always start from origin's copy of master.
$ git checkout origin/master

# grab the file graph_generators.py from the mz-exploratory branch
$ git checkout -p mz-exploratory exploratory/mZargham/pre-infra/graph_generators.py

# once this exploratory code is "ready", it will get merged into infra. So
# prepare ourselves for success by moving it into infra pre-emptively. Then
# when the pull merges, it'll be in the right place.

# (if the infra/ directory doesn't exist yet, create it first: mkdir -p infra)
$ git mv exploratory/mZargham/pre-infra/graph_generators.py infra/graph_generators.py
# commit it
$ git commit -m "graph_generators for initial review"

# you are now in a "detached HEAD" state. Basically, the commit that was just
# made isn't pointed to by any branch. that's no problem though, time to make a
# new branch.
$ git checkout -b graph-generators-for-testing

# now we need to make an upstream branch corresponding to this branch
$ git push --set-upstream origin graph-generators-for-testing

# now it's possible to create a pull request, from the GitHub UI, containing
# this code!

Once the branch is done, @BrianLitwin can add unit tests directly to the branch by adding another commit:

# on fetch, Brian's machine will learn about the new graph-generators-for-testing branch
$ git fetch
# on checkout, since there is no local branch called graph-generators-for-testing,
# git will create a new local branch with that name, tracking the upstream version that
# Z created
$ git checkout graph-generators-for-testing

# at this point Brian can write the tests and commit them directly to the same branch
$ git add infra/graph_generators_test.py
$ git commit -m "add tests"

# since this branch is already tracking an upstream branch, 
# a simple "git push" will suffice.
$ git push

The upshot of doing it this way is that comments, code review, and test runs all happen in the pull request itself.

cc @wchargin in case you want to comment on the git workflows

mzargham commented 5 years ago

For posterity, here is the git workflow that ended up working:

$ git fetch
$ git checkout origin/mz-exploratory

# this should not be required again, but it was needed the first time
$ mkdir -p infra

$ cp exploratory/mZargham/pre-infra/graph_generators.py infra/
$ git checkout origin/master
$ git add infra/graph_generators.py
$ git commit

$ git checkout -b graph-generators-for-testing

# now we need to make an upstream branch corresponding to this branch
$ git push --set-upstream origin graph-generators-for-testing

Thanks to @decentralion for helping troubleshoot; I am going to try to repeat this again.

teamdandelion commented 5 years ago

@mzargham don't forget to make the pull request. :)

mzargham commented 5 years ago

Working with @BrianLitwin, we moved this code into the repo.