openml / openml-python

Python module to interface with OpenML
https://openml.github.io/openml-python/main/

make tests run quicker #330

Open amueller opened 6 years ago

amueller commented 6 years ago

Right now there are some really slow tests. We should have nose list the top 10 slowest tests using the nose timing module.

PGijsbers commented 5 years ago

(Leaving a note so I don't forget) - PR #644 introduces lazy loading for datasets, and tests should likely be updated to use lazy loading whenever possible.

mfeurer commented 3 years ago

Printing the 20 slowest tests was added in #548. What would be the next step?

PGijsbers commented 3 years ago

The best thing to do is probably to systematically consider where we want integration tests (i.e. tests which actually perform calls to the test server), and where we instead verify that the query is correct without actually querying the server. We also have to look at which calls slow down tests and whether we should solve this in openml-python or on the server. Hopefully we can move away from using the server as a default, and only use it just enough to ensure our communication layer is still sound. We could mark those specific tests (see the sketch below), so it is easy to avoid running them during development (and then just run them before a PR).
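
As a sketch of what such marking could look like (the marker name `server` and the test classes are made up, and the marker would need to be registered in the pytest configuration), server-bound tests could be tagged and then deselected during development with `pytest -m "not server"`:

    import unittest

    import pytest

    # Hypothetical marker; pytest markers can be applied to unittest.TestCase
    # classes and filtered with `-m` on the command line.
    @pytest.mark.server
    class TestFlowRoundtrip(unittest.TestCase):
        def test_publish_and_retrieve_flow(self):
            ...  # integration test that actually talks to the test server

    class TestFlowSerialization(unittest.TestCase):
        def test_model_to_flow(self):
            ...  # pure unit test, no server calls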

The other part just involves looking at individual tests (currently the 20 slowest tests) and seeing if we can speed them up, either by removing redundant server calls or by refactoring some of the code to make it more easily testable.

For instance, the slowest test, test_get_flow_id, clocks in at 167 seconds (included at the bottom of this post for reference) just to test get_flow_id.

I see a couple of things here that I would consider changing or discussing. The unit test calls openml.flows.get_flow_id(model=clf, exact_version=True) as well as openml.flows.get_flow_id(name=flow.name, exact_version=True), and the same again with exact_version=False. The resulting call to flow_exists or list_flows is then the same (edit: using name overrides exact_version to False), and those calls account for well over 95% of the total time taken by the test.

I would consider:

PGijsbers commented 3 years ago

For the example above, I would consider something like this instead:

    def test_get_flow_id(self):
        # === new stuff ===
        def mock_list_flows(**kwargs):
            print("Mock!")
            return pd.DataFrame(
                data=[
                    [1, "weka.ZeroR(1)", "weka.ZeroR", 1, "WEKA==2.0", 16],
                    [2, "weka.ZeroR(2)", "weka.ZeroR", 1, "WEKA==3.0", 16],
                    [41010, "sklearn.tree._classes.DecisionTreeClassifier(1)", "sklearn.tree._classes.DecisionTreeClassifier", 1, "openml==0.11.0dev,sklearn==0.22.dev0", 1159],
                    [358739, "sklearn.tree._classes.DecisionTreeClassifier(2)", "sklearn.tree._classes.DecisionTreeClassifier", 1, "openml==0.11.0dev,sklearn==0.23.2", 1160],
                    [358738, "sklearn.tree._classes.DecisionTreeClassifier(3)", "sklearn.tree._classes.DecisionTreeClassifier", 1, "openml==0.11.0dev,sklearn==0.21.2", 1160],
                ],
                columns=["id", "full_name", "name", "version", "external_version", "uploader"]
            )

        openml.utils._list_all = mock_list_flows

        # === end new stuff ===

        clf = sklearn.tree.DecisionTreeClassifier()
        flow = openml.extensions.get_extension_by_model(clf).model_to_flow(clf).publish()
        self.assertEqual(openml.flows.get_flow_id(model=clf, exact_version=True), flow.flow_id)
        flow_ids = openml.flows.get_flow_id(model=clf, exact_version=False)
        self.assertIn(flow.flow_id, flow_ids)
        self.assertGreater(len(flow_ids), 2)

        # Check that the output of get_flow_id is identical if only the name is given, no matter
        # whether exact_version is set to True or False.
        flow_ids_exact_version_True = openml.flows.get_flow_id(name=flow.name, exact_version=True)
        flow_ids_exact_version_False = openml.flows.get_flow_id(
            name=flow.name, exact_version=False,
        )
        self.assertEqual(flow_ids_exact_version_True, flow_ids_exact_version_False)
        self.assertIn(flow.flow_id, flow_ids_exact_version_True)
        self.assertGreater(len(flow_ids_exact_version_True), 2)

Notes:

mfeurer commented 3 years ago

Thanks for your suggestions. Let me briefly ask about the motivation. Is it:

  • online tests are too slow despite running in parallel?
  • local tests are too slow despite running in parallel?
  • too much load on the server?

I agree that we can improve the unit tests by making them more intelligent, but I'm not sure how much we should focus on this. Extreme cases such as test_get_flow_id need to be fixed, but probably not a lot more, as the tests are nicely executed in parallel.

I couldn't get the pytest monkeypatch fixture to work in the class method

I don't think fixtures work on methods at all, only on functions.

You would complement this with a test which verifies the list_flows call actually returns the dataframe correctly. This would be executed once, and each other test relying on it would just mock the response instead.

Would it be possible to cache the API GET requests as a first step? This would be pretty similar to what you're proposing, and could be disabled to get the original behavior. As a plus, we could achieve this with regular unittest mocking.
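
As a rough sketch of what regular unittest mocking could look like here (patching openml.utils._list_all, as in the example above; the test name, canned listing, and assertions are illustrative only and depend on how get_flow_id consumes the listing internally):

    import unittest
    from unittest import mock

    import pandas as pd
    import openml

    class TestGetFlowIdMocked(unittest.TestCase):
        def test_get_flow_id_by_name(self):
            # Canned listing in the same shape as the mock above.
            fake_flows = pd.DataFrame(
                data=[
                    [1, "weka.ZeroR(1)", "weka.ZeroR", 1, "WEKA==2.0", 16],
                    [2, "weka.ZeroR(2)", "weka.ZeroR", 1, "WEKA==3.0", 16],
                ],
                columns=["id", "full_name", "name", "version", "external_version", "uploader"],
            )
            # The patch is undone when the `with` block exits, so other tests
            # still exercise the real listing code.
            with mock.patch("openml.utils._list_all", return_value=fake_flows) as fake_list:
                flow_ids = openml.flows.get_flow_id(name="weka.ZeroR")
            fake_list.assert_called()   # the listing never reached the (test) server
            self.assertIn(1, flow_ids)  # ids come from the canned frame above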

PGijsbers commented 3 years ago
  • online tests are too slow despite running in parallel?
  • local tests are too slow despite running in parallel?

Yes to both. No matter whether it is local or online, the total workload is quite big, even when running in parallel. Ideally you want to be able to run (at least the relevant part of) the test suite after each change you make, and that won't happen if you have individual tests that take minutes to complete. Let's also not forget that while we can easily run 8 or even 16 parallel tests, this is not an option for everyone.

Just making CI faster is also a great boon for productivity (in particular during hackathons).

  • too much load on the server

Lower priority, because it is (mostly) just the test server. In the most optimistic case it lowers the chance of server failures that lead to flaky tests, but I would assume this is not the case until proven otherwise.

I agree that we can improve the unit tests by making them more intelligent, but I'm not sure how much we should focus on this. Extreme cases such as test_get_flow_id need to be fixed, but probably not a lot more, as the tests are nicely executed in parallel.

It's always a balancing act. Better coverage, faster unit tests, and refactoring are all tools to make sure the project stays maintainable. I agree there is no sense in optimizing tests which take a few seconds or less, at least in the near future. There are more important things to do.

I don't think fixtures work on methods at all, only on functions.

It seems you are right (except for autouse fixtures).

Would it be possible that we cache the API GET requests as a first step?

You can easily cache at any level of the API, e.g. start the test with:

        import functools
        openml.utils._list_all = functools.lru_cache()(openml.utils._list_all)

reduces the time from roughly 120s down to 40s on my setup (as a list_flows call takes 40 seconds). The above is functional, but I am not sure what the nicest/proper way to do it would be. You also can't blindly use the same cache across all functions (e.g. I can imagine one method might check if a flow exists, then when it doesn't, publish the flow, and finally make sure it exists in the updated server list).
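
If we go that route, one way to keep the cache scoped to a single test (so that, e.g., a publish-then-check test is not served stale results from an earlier test) would be to wrap and restore the helper in setUp/tearDown. A sketch only, assuming we keep patching the internal openml.utils._list_all and the base class name is made up:

    import functools
    import unittest

    import openml

    class CachedListingTestCase(unittest.TestCase):
        """Hypothetical base class for tests that may reuse listing results."""

        def setUp(self):
            super().setUp()
            # Remember the original and wrap it in a per-test in-memory cache.
            self._original_list_all = openml.utils._list_all
            openml.utils._list_all = functools.lru_cache(maxsize=None)(openml.utils._list_all)

        def tearDown(self):
            # Restore the uncached implementation so the cache never leaks
            # into tests that rely on fresh server state.
            openml.utils._list_all = self._original_list_all
            super().tearDown()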

... we could achieve this with regular unittest mocking.

I think it remains confusing that we aim to use two different test frameworks.

mfeurer commented 3 years ago

Just making CI faster is also a great boon for productivity (in particular during hackathons).

Yeah, that would be the biggest improvement for me.

Above is functional, but I am not sure what the nicest/proper way should be.

One could use a fixture that adds a cache to the GET calls (if helpful) and we otherwise keep the tests as they are? And also go through the most expensive tests and see whether we can speed them up a bit.
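
If the autouse exception mentioned above is acceptable, such a fixture could live in conftest.py; a sketch along the same lines as the lru_cache snippet (the fixture name is made up):

    # conftest.py (sketch)
    import functools

    import pytest

    import openml

    @pytest.fixture(autouse=True)
    def cached_listing():
        # Autouse fixtures are also applied to unittest.TestCase methods,
        # even though those methods cannot request fixtures explicitly.
        original = openml.utils._list_all
        openml.utils._list_all = functools.lru_cache(maxsize=None)(original)
        yield
        openml.utils._list_all = original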

You also can't blindly use the same cache across all functions (e.g. I can imagine one method might check if a flow exists, then when it doesn't, publish the flow, and finally make sure it exists in the updated server list).

Yeah, you're right. I was hoping this would do the trick, but that won't help here...

I think it remains confusing that we aim to use two different test frameworks.

I guess the reason I suggested this is that I'm familiar with unittest.mock, but not with pytest (yet).

PGijsbers commented 3 years ago

One could use a fixture that adds a cache to the GET calls (if helpful) and we otherwise keep the tests as they are?

Like you pointed out, fixtures won't work unless we rework the code to use pytest proper instead of the current mostly-unittest style. Just adding it to setUp is probably going to be too broad in general.