iobis / pyobis

OBIS Python client
https://iobis.github.io/pyobis
MIT License

[pre-commit.ci] pre-commit autoupdate #120

Closed pre-commit-ci[bot] closed 1 year ago

pre-commit-ci[bot] commented 1 year ago

updates:

7yl4r commented 1 year ago

These broken tests appear unrelated to the proposed changes. PyOBIS's unit tests make calls to the OBIS API, and it appears that something broke in the actual data from OBIS. The following query is returning a 400 where it previously was not: https://api.obis.org/v3/occurrence/00003cf7-f2fc-4c53-98a6-7d846e70f5d1

I suggest we wait a week or two to see if the OBIS backend issue gets resolved and investigate further if not.

ocefpaf commented 1 year ago

@7yl4r that makes me think that we should break the tests into two classes.

  1. tests that are recorded with pytest-vcr so we can test whether the Python code works;
  2. tests that hit the live data and check for breakages like this one.

1 would be mandatory to merge, but 2 could be optional pending further investigation.
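
For concreteness, a minimal sketch of what that split could look like, assuming pytest-vcr's `@pytest.mark.vcr` decorator, a made-up `live_api` marker, and the occurrence endpoint mentioned above (not the project's actual test layout):

```python
import pytest
import requests

OBIS_OCCURRENCE_URL = "https://api.obis.org/v3/occurrence"


@pytest.mark.vcr  # pytest-vcr records the first run to a YAML cassette, then replays it
def test_occurrence_parsing_recorded():
    # class 1: exercises the Python code against a recorded response
    res = requests.get(OBIS_OCCURRENCE_URL, params={"size": 1})
    assert res.status_code == 200
    assert "results" in res.json()


@pytest.mark.live_api  # hypothetical marker for tests that must reach the real server
def test_occurrence_endpoint_live():
    # class 2: hits the live API to catch upstream breakages like the 400 above
    res = requests.get(OBIS_OCCURRENCE_URL, params={"size": 1})
    assert res.status_code == 200
```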

ayushanand18 commented 1 year ago

@7yl4r that makes me think that we should break the tests into two classes.

  1. tests that are recorded with pytest-vcr so we can test whether the Python code works;
  2. tests that hit the live data and check for breakages like this one.

1 would be mandatory to merge, but 2 could be optional pending further investigation.

Actually, at the moment almost all (if not all) tests ping the API to check both for response consistency and that the transformations this package applies to the data work correctly. However, one way of doing the first class could be to use mock HTTP responses in place of real data.

ocefpaf commented 1 year ago

However, one way of doing the first class could be to use mock HTTP responses in place of real data

Mocking is probably a better alternative to what I suggested above, but pytest-vcr is "easier": it records the network interaction and then plays it back to the test. It is, in a way, a mock of the real interaction. My only issue with Python mocking is that it is super easy to create meaningless tests.
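
To make the "meaningless tests" point concrete, here is a hand-rolled sketch (not from the pyobis suite; `fetch_total` is a toy stand-in for a pyobis helper): a plain mock passes no matter what the real API does, whereas a VCR cassette at least pins down a response that was actually observed once.

```python
from unittest import mock

import requests


def fetch_total(url):
    # toy stand-in for a pyobis helper that calls the OBIS API
    return requests.get(url).json()["total"]


def test_fetch_total_with_mock():
    fake = mock.Mock()
    fake.json.return_value = {"total": 42}
    with mock.patch("requests.get", return_value=fake):
        # passes forever, even if the real API renames "total" or starts
        # returning 400s: the assertion only checks data we invented ourselves
        assert fetch_total("https://api.obis.org/v3/occurrence") == 42
```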

7yl4r commented 1 year ago

we should break the tests into two classes.

The tests could easily be broken into two classes using pytest marks. I agree that we should, but as Ayush says, the vast majority of tests currently hit the API, so it is a bit pointless right now.

one way of doing it could be using mock https responses in place of real data for the first one

I played with this idea and then gave up because I felt that there must be a tool out there to help do this. It looks like pytest-vcr is exactly that tool. Thank you @ocefpaf for sharing yet another awesome tool in the Python ecosystem.

We should definitely pursue using pytest-vcr on some of these tests. Let's get two branches going:

  1. [ ] tag existing tests with a pytest mark that indicates "live" API usage, and deselect them in GHA CI with something like pytest -m "not live_api" (see the sketch after this list)
  2. [ ] use pytest-vcr to record & mock API requests
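
A minimal sketch of item 1, assuming a custom `live_api` marker (the name is illustrative):

```python
# conftest.py -- register the custom marker so pytest does not warn about it
def pytest_configure(config):
    config.addinivalue_line("markers", "live_api: test hits the live OBIS API")


# test_occurrences.py -- tag a test that must reach the real server
import pytest
import requests


@pytest.mark.live_api
def test_search_against_live_api():
    res = requests.get("https://api.obis.org/v3/occurrence", params={"size": 1})
    assert res.status_code == 200
```

GHA CI could then deselect these tests with `pytest -m "not live_api"`, while a separate workflow could run only them with `pytest -m live_api`.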

ayushanand18 commented 1 year ago

I read some docs on this discussion, and I made some observations:

  1. When we record the network behavior and the returned responses, they are stored in cassettes as YAML files.
  2. If the API changes, we need to flush these cassettes and re-record them. But if we do not know that the API has changed, we could end up with code that no longer works while the tests keep passing (see the sketch after this list for one way to control re-recording).
  3. Recording tests is great because it makes execution many times faster, but it would be even better if we could detect API changes without users having to feel the pain first.
  4. I suggest we run all automatic workflows on recorded cassettes, plus a test that hits the live API and only triggers when the workflow is approved to run. That makes it easier to check whether the API has really changed while keeping the unit tests fast to execute.
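
One way to keep point 2 under control, assuming pytest-vcr's `vcr_config` fixture and a made-up `REFRESH_CASSETTES` environment variable:

```python
# conftest.py -- sketch only; cassette handling details are an assumption
import os

import pytest


@pytest.fixture(scope="module")
def vcr_config():
    # "none" refuses to touch the network, so a missing or stale cassette fails
    # loudly instead of silently passing; "all" re-records every interaction so
    # cassettes can be refreshed locally when the API changes
    record_mode = "all" if os.environ.get("REFRESH_CASSETTES") else "none"
    return {"record_mode": record_mode}
```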

Thoughts?

ocefpaf commented 1 year ago

Makes sense. We can keep a small subset of integration tests that hit a server too while keeping most of the unit ones recorded with vcr for speed.

ayushanand18 commented 1 year ago

We can keep a small subset of integration tests that hit a server too while keeping most of the unit ones recorded with vcr for speed.

So, we can partition out the requests that take 10s or longer over the network to use VCR, and then run them in a workflow that pings the live API and awaits approval. Per this, most tests would still qualify for pinging the live API, but some in occurrences and checklist (especially those involving pagination) would use the recorded cassettes.

ocefpaf commented 1 year ago

Even fast tests can add up. I'm not familiar enough with the API to say whether a smaller subset would cover enough to catch API changes, but I would keep the tests that hit the API to a bare minimum. Or we could create another GHA workflow that runs as a weekly cron job instead of on every PR. What do you think?

ayushanand18 commented 1 year ago

Or we could create another GHA workflow that runs as a weekly cron job instead of on every PR. What do you think?

I second that. We can make every test on a PR use VCR, and have a weekly cron job hit the live API.
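
A rough sketch of how that gate could look on the Python side, assuming the weekly cron workflow exports a hypothetical `RUN_LIVE_API` variable (marker-based selection with `pytest -m`, as above, would work just as well):

```python
import os

import pytest
import requests

# skip live tests unless the weekly cron workflow sets this (name is illustrative)
requires_live_api = pytest.mark.skipif(
    not os.environ.get("RUN_LIVE_API"),
    reason="live OBIS API tests only run in the weekly cron job",
)


@requires_live_api
def test_weekly_live_check():
    res = requests.get("https://api.obis.org/v3/occurrence", params={"size": 1})
    assert res.status_code == 200
```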

ayushanand18 commented 1 year ago

What should we do with this PR? Should we rebase and re-run jobs or close this?

ocefpaf commented 1 year ago

What should we do with this PR? Should we rebase and re-run jobs or close this?

Let's close this. I'll move the linting here to ruff in a future PR.