wrike / pytest-hoverfly

Hoverfly integration for pytest
MIT License

Explain why service virtualization is a good idea in Readme #3

Open WouldYouKindly opened 3 years ago

gc-ss commented 3 years ago

I would like to work on this with you.

Some input:

service virtualization helps in multiple ways:

We can then parameterize the test to run first with Hoverfly and then without it. This way the same test runs against both cached and live data.

If the external input/data source changes slightly, the test that does not use Hoverfly now fails while the test that does use Hoverfly continues to pass - making it clear that our code has no regression or new bug, but that the external input/data source has changed.

Now we can diff the cached response in Hoverfly against the one that comes from the external input/data source to figure out what changed.

If we did not have this, we could go down a wild goose chase to figure out what happened (did the test fail because our code has a regression or bug, or did the external input/data source change?)
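
To make the first point concrete, here is a minimal sketch of such a parameterized test. It is not pytest-hoverfly's actual API: the endpoint is a placeholder, and routing via proxy env vars plus Hoverfly's default proxy port (8500) are assumptions for illustration.

```python
import pytest
import requests


@pytest.mark.parametrize("virtualized", [True, False], ids=["recorded", "live"])
def test_get_user(virtualized, monkeypatch):
    if virtualized:
        # Route the request through a locally running Hoverfly proxy so the
        # cached (recorded) response is served. 8500 is Hoverfly's default
        # proxy port; requests honours these env vars by default.
        monkeypatch.setenv("HTTP_PROXY", "http://localhost:8500")
        monkeypatch.setenv("HTTPS_PROXY", "http://localhost:8500")

    # Placeholder endpoint - the same assertions run against cached and live data.
    resp = requests.get("https://api.example.com/users/42")
    assert resp.status_code == 200
    assert "id" in resp.json()
```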

WouldYouKindly commented 3 years ago

@gc-ss hello! Thanks for the input. There's certainly value in this, but the same can be just as easily achieved with plain old mocks: run your tests against mocks, and then against the real service. If the second run fails, you know it's due to the data source change.

In my experience, the value comes mostly from these things:

  • You don't spend time writing mocks
  • The data is 100% real, there's no chance that you messed up your mocks
  • It is a real network interaction. If you change your HTTP library, your tests will still work - or they will fail due to the differences in HTTP library, giving you valuable information. This also gives you the ability to test that you correctly work with your HTTP library (handle results and exceptions correctly)
  • You can play with the recorded data without calling the real service. E.g. you need to use a json field that you haven't used before - you can write the code and test it against the recorded data. If you wrote your mocks by hand, you would have probably included only the fields that you need, and then you have to update the mock

The biggest issues for me are:

  • People have to be taught
  • It's hard to rerecord simulations. E.g. you're testing your k8s library. Then you need to create k8s objects by hand, record the simulations, then delete the objects. Sometimes setup can be quite complex compared to mocks
  • There is no way to check that all matchers in a simulation have been used

gc-ss commented 3 years ago

@gc-ss hello! Thanks for the input. There's certainly value in this, but the same can be just as easily achieved with plain old mocks: run your tests against mocks, and then against the real service. If the second run fails, you know it's due to the data source change.

In my experience, the value comes mostly from these things:

  • You don't spend time writing mocks
  • The data is 100% real, there's no chance that you messed up your mocks
  • It is a real network interaction. If you change your HTTP library, your tests will still work - or they will fail due to the differences in HTTP library, giving you valuable information. This also gives you the ability to test that you correctly work with your HTTP library (handle results and exceptions correctly)

Exactly - the above reasons are very valuable and help make the case for WHY it's worth investing time and effort in service virtualization.

  • You can play with the recorded data without calling the real service. E.g. you need to use a json field that you haven't used before - you can write the code and test it against the recorded data.

… and this not only saves time and effort but also exposes our code to real-world data.

If you wrote your mocks by hand, you would have probably included only the fields that you need, and then you have to update the mock

Exactly. Another risk when the developer who wrote the code also writes the mock is that they are often biased by their own assumptions and write the mock according to their own understanding.

Now as we know - bugs happen when understanding differs from reality.

With service virtualization based on real-world data, the understanding of the developer who wrote the code gets tested - not against some version of mock data the developer thought there should be, but against what actually is.

The biggest issues for me are:

  • People have to be taught
  • It's hard to rerecord simulations. E.g. you're testing your k8s library. Then you need to create k8s objects by hand, record the simulations, then delete the objects. Sometimes setup can be quite complex compared to mocks
  • There is no way to check that all matchers in a simulation have been used

I understand. Here are my thoughts on how you and I can help address these concerns:

  • People have to be taught

What if we make demos, Python test samples and docs?

  • There is no way to check that all matchers in a simulation have been used

This could be hard, but I have a few ideas involving property-based testing.

  • It's hard to rerecord simulations. E.g. you're testing your k8s library. Then you need to create k8s objects by hand, record the simulations, then delete the objects. Sometimes setup can be quite complex compared to mocks

This is very true. This is the biggest challenge - and the answer comes down to how much the business really values correctness and how much money they are willing to spend to ensure it. The best you and I can do is make recommendations, so we make demos, Python test samples and docs to highlight what service virtualization brings to the table.

WouldYouKindly commented 3 years ago

w.r.t. rerecording, I think the best approach is to use pytest fixtures to create/destroy resources. But currently the network requests from fixtures will be intercepted by Hoverfly. And there should be a way to tell the resource fixtures that they are in a simulation and don't need to do anything. But this is a separate issue.

Also, maybe the @hoverfly decorator should accept an optional setup_instructions parameter, to encourage people to write down the things that have to be done to re-record. I found it to be a huge problem when people other than the original author have to rerecord tests.
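
For illustration only, here is a hedged sketch of what such a resource fixture could look like. The env var convention, the endpoints, and api.example.com are all made up for the example; none of this is part of the plugin.

```python
import os

import pytest
import requests

# Assumed convention for this sketch: an env var tells fixtures that we are
# re-recording and therefore need the real resource to exist.
RERECORDING = os.environ.get("RERECORD_SIMULATIONS") == "1"


@pytest.fixture
def test_user():
    if not RERECORDING:
        # Running against a recorded simulation: the recorded responses
        # already describe the resource, so there is nothing to create.
        yield {"id": 42}
        return

    # Re-recording: create the real resource, then clean it up afterwards.
    resp = requests.post("https://api.example.com/users", json={"name": "fixture-user"})
    user = resp.json()
    try:
        yield user
    finally:
        requests.delete(f"https://api.example.com/users/{user['id']}")
```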

What if we make demos, Python test samples and docs?

I found that the most annoying part is debugging the tests. Currently, if an HTTP request fails to match, the plugin spits out Hoverfly's error logs. These are hard to parse for a person who does not have experience with Hoverfly. This can be fixed by parsing the error logs and outputting a more informative diff, but I haven't had time to do this yet.

Anyway, if you'd like to work on this, PRs are welcome :)

gc-ss commented 3 years ago

w.r.t. rerecording, I think the best approach is to use pytest fixtures to create/destroy resources.

Agreed.

But currently the network requests from fixtures will be intercepted by Hoverfly. And there should be a way to tell the resource fixtures that they are in a simulation and don't need to do anything. But this is a separate issue.

This makes sense only if you're monkeypatching the proxies for the underlying libraries without explicitly asking the user.

I need to read your code but if you are doing that - don't.

It might feel like making things easy for the user, but then the user will have to fight/undo the monkeypatching if they have well-designed software where they pass proxies to their underlying library explicitly, as one should - e.g., for requests: https://docs.python-requests.org/en/latest/api/#requests.request

Let the person writing the test do that. If they don't care, we can show them how to monkeypatch using fixtures themselves. That way they can turn it on/off explicitly.

Consider that the test writer needs to fire off a few requests in the test without making them go through the monkeypatch - how would we allow them to do that?

For example, if requests is being used, we can easily pass the proxy host and port information, as I point out to the tavern testing framework author here: https://github.com/taverntesting/tavern/issues/673#issuecomment-822082204
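
A minimal sketch of that explicit style, assuming Hoverfly's default proxy port (8500) and a placeholder endpoint:

```python
import requests

HOVERFLY_PROXY = "http://localhost:8500"  # Hoverfly's default proxy port


def test_through_hoverfly():
    # The test opts in to the proxy explicitly instead of relying on
    # monkeypatched environment variables.
    resp = requests.get(
        "https://api.example.com/health",  # placeholder endpoint
        proxies={"http": HOVERFLY_PROXY, "https": HOVERFLY_PROXY},
        verify=False,  # Hoverfly re-signs HTTPS traffic with its own certificate
    )
    assert resp.status_code == 200


def test_direct():
    # The same call without `proxies` hits the real service, so turning the
    # proxy on/off stays entirely under the test writer's control.
    resp = requests.get("https://api.example.com/health")
    assert resp.status_code == 200
```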

What do you think?

gc-ss commented 3 years ago

I found it to be a huge problem when people other than the original author have to rerecord tests.

Can you explain a bit why it's a huge problem?

From my POV, rerecording tests should be as simple as deleting the existing saved simulation, turning recording on, saving the new simulation, and turning recording off.

This is because the test environment is already set up correctly, including the credentials/tokens/resource paths etc. needed to make the request.

What am I missing?
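
(For context, the cycle I describe above can be scripted against Hoverfly's admin API, independently of this plugin. This is a hedged sketch, assuming the v2 admin endpoints and Hoverfly's default admin port 8888:)

```python
import json

import requests

ADMIN = "http://localhost:8888"  # Hoverfly's default admin port


def rerecord(simulation_path: str) -> None:
    # 1. Delete the existing saved simulation.
    requests.delete(f"{ADMIN}/api/v2/simulation")
    # 2. Turn recording on.
    requests.put(f"{ADMIN}/api/v2/hoverfly/mode", json={"mode": "capture"})

    # ... run the requests that should be captured here ...

    # 3. Save the new simulation to disk.
    simulation = requests.get(f"{ADMIN}/api/v2/simulation").json()
    with open(simulation_path, "w") as f:
        json.dump(simulation, f, indent=2)

    # 4. Turn recording off again.
    requests.put(f"{ADMIN}/api/v2/hoverfly/mode", json={"mode": "simulate"})
```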

I found that the most annoying part is debugging the tests. Currently, if an HTTP request fails to match, the plugin spits out Hoverfly's error logs. These are hard to parse for a person who does not have experience with Hoverfly. This can be fixed by parsing the error logs and outputting a more informative diff, but I haven't had time to do this yet.

I would need to have a look at some of the real world tests you're writing to understand the context.

The way I use Hoverfly is to act as a cache and then turn off networking in the container that runs the test.

As a result, requests that are cached by Hoverfly return the cached response, but if my tests make an unexpected query that is not already cached by Hoverfly, it finds no match, passes the request through, and the request fails because it cannot reach the actual server.

If you show me your test I can explain better. Otherwise, I can write a sample test with the Dockerfiles if you would like to check them out, but this will take a week or so to get to you.

Anyway, if you'd like to work on this, PRs are welcome :)

Sure! I do need to understand from you what your criteria/requirements are for merging PRs, but we can talk over email about that.

WouldYouKindly commented 3 years ago

@gc-ss You are right, there should be a way to turn off patching of the env vars. I think we should still patch by default, though, as it lowers the entry barrier.

Can you explain a bit why it's a huge problem?

Exactly because there may be no info on how to prepare the required resources. It is not always possible or optimal to automate such creation. So in a lot of cases there is a need for a textual instruction, hence my idea about the setup_instructions param.

I would need to have a look at some of the real world tests you're writing to understand the context.

Yes, the request fails if no match is found. But it is not immediately obvious why it fails.

This plugin detects that the test has failed, and prints the error logs from Hoverfly if there are any. But these logs can be made more informative.