vincentarelbundock / pycountrycode

GNU General Public License v3.0
3 stars 4 forks source link

Begin property based testing #4

Open frank113 opened 1 year ago

frank113 commented 1 year ago

Summary

We can use the hypothesis package to make unit testing more robust. Property-based testing will allow us to generate examples and edge cases either through random strings or strategies that can be based on codelist.

For example the test_invalid_iso3c_to_country_name function in tests/test_basic.py could be rewritten as follows:

  1. Build a strategy to generate any three letter uppercase string that does not appear in codelist's iso3c column.
  2. Assert that the result of countrycode(code, "iso3c", "country.name") is always None for all inputs generated with the above strategy

Considerations

References

Other

As someone raised with Legos, specifically Lego Star Wars, I appreciate your profile picture on GitHub and Twitter.

vincentarelbundock commented 1 year ago

This looks very cool. I'm not super familiar with these concepts but will read up when I have time.

FYI, I'll be traveling for most of the next 3 weeks, so I won't be able to do much beyond clicking "merge" from my phone on non-controversial PRs. But I am interested in this and will definitely look into it when possible.

As someone raised with Legos, specifically Lego Star Wars, I appreciate your profile picture on GitHub and Twitter.

Hehe!

vincentarelbundock commented 1 year ago

@LamAdr, once you're done with the plot functions, maybe you could take a look at this?

The idea is to make it so that the countrycode() function returns the same type of output as the input it gets: plain list, pandas series, or polars series. Then, we make the pandas and polars dependencies optional, so that users don't have to install all that if they only want to convert lists of codes.

Here's what the AI assistant has to say about making dependencies optional (this makes sense to me, but maybe you can find something better?):

To deal with optional dependencies for optional features in a Python package managed by Poetry, you can declare a dependency group as optional in the pyproject.toml file. This makes sense when you have a group of dependencies that are only required in a particular environment or for a specific purpose. You can do this by adding a [tool.poetry.group.<group>] section where <group> is the name of your dependency group, and setting optional = true. Then, you can add dependencies to this group using the [tool.poetry.group.<group>.dependencies] section.

For example, to add the optional dependency redis to a package, you can run the command poetry add --optional redis, which will add the following configuration to your pyproject.toml file:

[tool.poetry.dependencies]
python = "^3.8"

[tool.poetry.dependencies.redis]
version = "^3.4.1"
optional = true

Then, to install the optional dependencies, you can run poetry install --extras "<group>", where <group> is the name of the optional dependency group. In the case of the redis example, you would run poetry install --extras "redis".

Here's an example of a function that uses an optional dependency if it is available:

def my_function():
    try:
        import optional_dependency
    except ImportError:
        print("Optional dependency not found, continuing without it.")
    else:
        # Use the optional dependency
        print("Optional dependency found, using it.")
LamAdr commented 1 year ago

Sure!

vincentarelbundock commented 1 year ago

@LamAdr if you have more time, I think it would be great to port some of the R tests to Python. I don't think we need to replicate the full test suite, because we already run it regularly in R. But testing some of the corner cases in particular would be worth it, just because the regex engines differ somewhat.

I would maybe start with these two files:

https://github.com/vincentarelbundock/countrycode/blob/main/tests/testthat/data-known-name-variations.R

https://github.com/vincentarelbundock/countrycode/blob/main/tests/testthat/test-corner-cases.R

LamAdr commented 1 year ago

Sure @vincentarelbundock. Would you prioritize that or the book? In any case, I hope you are not in a hurry because I'm getting examed for a couple weeks, and not sure how much I can get done.

vincentarelbundock commented 1 year ago

No rush at all. And no priority. You can choose what you work on.

frank113 commented 1 year ago

@vincentarelbundock Documenting this discovery as a comment as it falls under the umbrella of testing and impacts how we are doing property-based testing:

Pytest has a number of fixtures available for repeated use across tests. We are currently loading in codelist from countrycode for non-Polars tests and directly for tests involving Polars. As an aside, see comment in issue #8 for an update on enhancements that came out of getting the CI passing. My thoughts regarding these fixtures are to make stylistic changes to the tests to enable re-usability in terms of loaded data and helper functions. These can be scoped to either the entire test suite or to a group of tests by file. In my view the priority of using fixtures is low and can occur after more tests are ported from R.

vincentarelbundock commented 1 year ago

Yeah, that all sounds good to me. I don't have much experience with unit tests in Python, so I'm perfectly happy to defer to you on almost all of this. Thanks!