openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

[Proposal] Add test coverage #2734

Open kir0ul opened 2 years ago

kir0ul commented 2 years ago

Proposal

It would be nice to publish a report of the test coverage somewhere.

Motivation

Currently there's no indication of the test coverage in Gym. This is a useful metric to get a sense of the code quality.

Pitch

Not sure what's the best way to implement this yet. As a bare minimum I would want a badge in the README showing the percentage of test coverage. We could also publish the HTML pages produced by Coverage.py/pytest-cov with GH Pages, to be able to quickly see the details of which lines of code are covered by testing and which are not. There are also online services that do this for you, but they're usually not open source and I'm not a big fan of those vendor lock-in solutions. Ideally I would like a report on each PR showing the trend of the test coverage after a new commit, so that developers know whether the code quality is improving or deteriorating. A minimal workflow sketch follows the list below.

Possible solutions:

  1. Code Coverage Report
  2. Coverage monitor
  3. coverage.py badge
  4. Pytest Coverage Commentator
  5. Publish code coverage report with GitLab Pages (this is for GitLab but the principle is the same for GitHub)
  6. Pytest Coverage Comment
  7. Python Cov: Python Coverage Reporter GitHub Action
  8. genbadge
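
To make the GH Pages idea concrete, a minimal workflow sketch could look like the following (a sketch assuming pytest-cov; the job, branch, and publishing details are illustrative, not an existing Gym workflow):

```yaml
# .github/workflows/coverage.yml -- illustrative sketch only
name: coverage

on: [push]

jobs:
  coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.9"
      - name: Install Gym with test dependencies
        run: pip install -e .[nomujoco] pytest pytest-cov
      - name: Run the test suite with coverage
        # --cov-report=html writes a browsable report to ./htmlcov
        run: pytest --cov=gym --cov-report=html tests/
      - name: Publish the HTML report to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./htmlcov
```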

kir0ul commented 2 years ago

To give an idea of point n°5 above, the HTML pages of the code coverage report would look something like the following, showing a breakdown of how much each Python file in the repo is tested:

[Screenshot: coverage report index showing per-file coverage percentages]

Clicking on a file would give the detail of which lines of code are run during testing:

[Screenshot: per-file view highlighting which lines are run during testing]
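
The same pages can be generated locally to preview them (assuming pytest-cov is installed):

```sh
pip install pytest-cov
pytest --cov=gym --cov-report=html tests/
# then open htmlcov/index.html in a browser to drill into per-file line coverage
```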

Related issue: https://github.com/openai/gym/issues/2729.

pseudo-rnd-thoughts commented 2 years ago

One of the goals of gym 1.1 (https://github.com/openai/gym/issues/2524) is to have full type hinting. I saw that mypy has the ability to generate a type coverage document in a similar way; however, I couldn't find any information for pyright. I'm just mentioning it because this proposal sparked the idea: adding typing coverage would help with the goal of full type hinting.

kir0ul commented 2 years ago

> I saw that mypy has the ability to generate a type coverage document in a similar way; however, I couldn't find any information for pyright. I'm just mentioning it because this proposal sparked the idea: adding typing coverage would help with the goal of full type hinting.

Yes, Mypy is nice for that; it can produce useful reports like the one below (not sure about Pyright either):

```
Name                    Lines  Precise  Imprecise  Any  Empty  Unanalyzed
-------------------------------------------------------------------------
calimag                     4        2          0    0      2           0
calimag.cli                39        7          0   17     15           0
calimag.config             86       15         11   23     37           0
calimag.constants           3        1          0    0      2           0
calimag.nwb_converter     316       16          7   48    245           0
calimag.parsers           572      232         20   41    278           1
calimag.suite2p_parser    229        5          0  126     98           0
```
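
For reference, a table like this can be generated with one of mypy's report flags (a sketch; the package name is taken from the example above):

```sh
pip install mypy
mypy --lineprecision-report mypy-report calimag/
# writes mypy-report/lineprecision.txt with the per-module
# Lines / Precise / Imprecise / Any / Empty / Unanalyzed breakdown shown above
```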

kir0ul commented 2 years ago

I'm starting to lean towards Pytest Coverage Comment as a good candidate to use in our PRs (solution n°6 in the first post). If there are no strong opinions against it I'll start a PR.
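
For reference, the relevant workflow steps would look roughly like this (a sketch following that action's README; the exact input names may have changed since):

```yaml
- name: Run tests with coverage
  run: |
    pytest --junitxml=pytest.xml --cov=gym tests/ | tee pytest-coverage.txt

- name: Post the coverage comment on the PR
  uses: MishaKav/pytest-coverage-comment@main
  with:
    pytest-coverage-path: ./pytest-coverage.txt
    junitxml-path: ./pytest.xml
```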

pseudo-rnd-thoughts commented 2 years ago

Looking at the action, I noticed that it can report the number of warnings that pytest-cov emits. Thinking about the original plan of https://github.com/openai/gym/pull/2707, could we have the output include the number of warnings produced so we can keep track of them?

kir0ul commented 2 years ago

> Looking at the action, I noticed that it can report the number of warnings that pytest-cov emits. Thinking about the original plan of #2707, could we have the output include the number of warnings produced so we can keep track of them?

Good catch, we should definitely add this if that's available!

kir0ul commented 2 years ago

I'm having some issues getting the coverage working with the current setup; for some reason the coverage number doesn't get computed and stays at 0%. But it seems I can get it to work either by installing Gym in editable mode with pip install -e .[nomujoco], or by running Pytest without the --import-mode=append option. Both workarounds look related to https://github.com/openai/gym/pull/2429. Would anyone know if namespace packages are still an issue for setuptools in editable mode? Maybe @JesseFarebro?
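
For anyone trying to reproduce this, the two workarounds amount to the following (sketch):

```sh
# Workaround 1: editable install, then run the suite as usual
pip install -e .[nomujoco]
pytest --cov=gym tests/

# Workaround 2: regular install, but invoke pytest without --import-mode=append
pip install .[nomujoco]
pytest --cov=gym tests/
```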

JesseFarebro commented 2 years ago

@kir0ul namespace packages still don't work in editable mode, but this won't impact the Gym tests. All the ALE-specific tests are in our repository, not Gym.

kir0ul commented 2 years ago

@JesseFarebro So does it mean we could get rid of the --import-mode=append option in Pytest or add back the editable install option with Pip?

JesseFarebro commented 2 years ago

Yes, I don't see the issue there. This has become more of an issue for me when I'm testing the ALE. If you don't want to perform any ALE-related tests (which you shouldn't), then you don't need to make any compromises regarding namespace packages. The ALE should be the only namespace package that Gym supports, and this is just for backwards compatibility, as people relied on the module path of gym.envs.atari.

pseudo-rnd-thoughts commented 2 years ago

@JesseFarebro Could removing this backwards compatibility be a change for 1.0? Would this help both projects?

JesseFarebro commented 2 years ago

@pseudo-rnd-thoughts It could. I can keep backwards compatibility while deprecating the namespace package by conditionally registering the entry point based on the Gym version.
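
Purely as an illustration of that idea (the registration helpers below are hypothetical names, not actual ale-py code), the plugin hook could gate the legacy path on the installed Gym version:

```python
# Hypothetical sketch of version-gated entry-point registration for ale-py;
# the register_* helpers are illustrative stubs, not real ale-py functions.
from packaging import version

import gym


def register_atari_envs():
    """Register ALE environments under the modern namespace (stub)."""


def register_legacy_atari_envs():
    """Register the backwards-compatible gym.envs.atari path (stub)."""


def register_gym_envs():
    """Plugin hook that Gym's entry-point mechanism would invoke."""
    register_atari_envs()
    # Only keep the legacy module path for Gym releases that still support it.
    if version.parse(gym.__version__) < version.parse("1.0.0"):
        register_legacy_atari_envs()
```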

kir0ul commented 2 years ago

> Looking at the action, I noticed that it can report the number of warnings that pytest-cov emits. Thinking about the original plan of #2707, could we have the output include the number of warnings produced so we can keep track of them?

@pseudo-rnd-thoughts Apparently displaying the number of warnings in the report summary is not possible as it's not part of the JUnit XML format: https://github.com/MishaKav/pytest-coverage-comment/issues/31#issuecomment-955597218.

kir0ul commented 2 years ago

Related discussion: https://github.com/Farama-Foundation/Jumpy/issues/7.

jkterry1 commented 1 year ago

Hey, we just launched Gymnasium, a fork of Gym by the team that has maintained Gym for the past 18 months, where all maintenance and improvements will happen moving forward.

If you'd like to read more about the backstory behind this and our plans going forward, click here.