tristanls opened this issue 4 years ago
When writing tests involving a complex external dependency, the external dependency itself should be used (if technically feasible) in order to be as close as possible to the real environment. With this, we can test against the real state of the DB. The DB will complain if there is an issue with the statement (bad syntax, null values, etc.) or with the usage of the API. Issues are visible immediately because they break the test. The drawback of this approach is that the tests are slower.
The second approach is to use spies to test for communication. This way we can only answer questions like "does the DB statement have this structure?" or "have these options been passed to the DB?". However, we don't know whether the passed statements/values are valid in terms of the DB. This can lead to illusory confidence: the tests are green (forever, even if the DB is switched to a new version, etc.) but the software actually does not work. These tests are fast but, in my opinion, not that valuable.
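To make the second approach concrete, here is a minimal sketch of a spy-based test with sinon against an AWS SDK v2 DocumentClient (the `acquireLock` function and the table name are made up for illustration):

```js
const AWS = require("aws-sdk");
const sinon = require("sinon");
const assert = require("assert");

const documentClient = new AWS.DynamoDB.DocumentClient();
const putStub = sinon.stub(documentClient, "put")
    .returns({ promise: () => Promise.resolve({}) });

// Hypothetical code under test: writes a lock record.
const acquireLock = (client, id) =>
    client.put({ TableName: "locks", Item: { id } }).promise();

acquireLock(documentClient, "lock-1").then(() => {
    // We can only assert the *shape* of the call; whether DynamoDB
    // would actually accept these parameters is never checked.
    assert(putStub.calledOnce);
    assert.strictEqual(putStub.firstCall.args[0].TableName, "locks");
});
```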
@jepetko Hello and thanks for your response!
At some point in my career I would probably have agreed with you. I'm hoping to have a good discussion with you here about why I no longer do.
The design pattern I was alluding to is very common when working with external APIs, and we are treating DynamoDB as such. The goal of such tests is to make them very easy to re-record (ideally by just deleting the cassettes).
The nature of external APIs is such that it is very hard to simulate edge cases. E.g., in the case of DynamoDB: how do you test for throttling and how this module behaves in that case? It is possible to get DynamoDB to throttle and to record the request and response; however, getting that to happen consistently in a test is going to be hard.
We do not need to continuously test DynamoDB itself. Instead, we re-record our tests if (1) the call signature from this module changes or (2) the external API changes its signature.
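As a sketch of what I mean (using nock's back mode here purely for illustration; the fixture name, directory, and mode switch are placeholders):

```js
const nock = require("nock");
const path = require("path");

nock.back.fixtures = path.join(__dirname, "cassettes");
// "record" captures real HTTP traffic into a cassette on the first run;
// "lockdown" replays the cassette and fails on any unexpected request.
nock.back.setMode(process.env.RECORD ? "record" : "lockdown");

nock.back("acquire-lock.json").then(({ nockDone }) => {
    // ... exercise the code that talks to DynamoDB over HTTP here ...
    nockDone(); // persists the cassette when recording
});
```

Re-recording is then literally deleting `cassettes/acquire-lock.json` and re-running with recording enabled.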
Our tests being green while the library is not working is something that can temporarily happen (e.g., when Amazon is down). That doesn't mean that the tests should fail.
I've worked with these kinds of tests for several years now and am practically breathing them. They work really well and give you the best of both worlds. A bonus is that you don't need AWS creds for running tests in CI.
Oh and if you needed proof that they work, they just caught this regression: https://github.com/tristanls/dynamodb-lock-client/issues/29
I'm happy with having some tests run against an in-memory version of DynamoDB and recording the above-mentioned mock tests against a real DynamoDB version. However, I'm not a fan of running test suites continuously against AWS DynamoDB. They will randomly fail if you do (again from personal experience, AWS returns 500s quite often).
I'm thinking of DynamoDB not as a local "RDS" instance that you can just spin up/down, but rather as a SaaS API that needs to be tested as such. If it breaks, it needs to be communicated by AWS. It is not our responsibility to do continuous regression testing against it.
Looking forward to hearing your side / opinion!
(looks like @simlu commented while I was drafting this)
In this module's case, the external dependency is a distributed system behind an API. The in-memory DB approximates this external dependency. This approximation is itself a system, separate from the external dependency, which can contain bugs and be out of sync. So, while I feel I understand the lineage of standing up a database locally (because it was the same database running in production), I am less optimistic that the method is as valid in the current distributed-systems environment. For example, here is one difference that may be relevant for this module, from the differences section of downloadable DynamoDB (amongst others):
- Read operations are eventually consistent. However, due to the speed of DynamoDB running on your computer, most reads appear to be strongly consistent.
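To make that difference concrete: a test like this sketch (table, key, and the surrounding async wrapper are made up) can pass reliably against the local DB and still flake against the real service, because the default read there is only eventually consistent:

```js
const AWS = require("aws-sdk");
const documentClient = new AWS.DynamoDB.DocumentClient();

(async () => {
    await documentClient.put({
        TableName: "locks",
        Item: { id: "lock-1" }
    }).promise();

    // Default reads are eventually consistent. Against DynamoDB running
    // locally this read "appears" strongly consistent, so the check below
    // passes; against the real service the item may not be visible yet.
    // (Setting ConsistentRead: true forces a strongly consistent read.)
    const { Item } = await documentClient.get({
        TableName: "locks",
        Key: { id: "lock-1" }
    }).promise();
    if (!Item) throw new Error("read-after-write saw no item");
})();
```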
Hi guys,
Yes, I basically agree with everything that was said. I apologize for not considering the third option, which is basically using the real AWS API (unfortunately, this is not always possible in a corporate environment for security reasons). Yes, I would also record the requests/responses (for example, using yakbak) and use that as a sustainable solution in CI/CD (no failed tests on a network outage), because in the case of DynamoDB, HTTP requests are made towards AWS.
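For reference, yakbak is essentially a recording proxy you point the SDK's endpoint at (a sketch following the shape of yakbak's README; region, port, and tape directory are placeholders, and note that SigV4-signed requests need extra care because the signature covers the Host header):

```js
const http = require("http");
const path = require("path");
const yakbak = require("yakbak");

// Proxy that records unknown requests to ./tapes and replays known ones.
// Tests would point the DynamoDB client's endpoint at http://localhost:3000.
http.createServer(yakbak("https://dynamodb.us-east-1.amazonaws.com", {
    dirname: path.join(__dirname, "tapes")
})).listen(3000);
```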
But let's go back to using the `dynamodb-local` dependency for testing. Recently, I implemented some delete logic and ended up with the error `Too many items requested for the BatchWriteItem call`. With a simply stubbed DynamoDB I would have completely missed that.
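(That error exists because `BatchWriteItem` accepts at most 25 items per call, a limit a plain stub would never enforce. The fix is to chunk the requests, roughly like this sketch with a made-up table and key:)

```js
const AWS = require("aws-sdk");
const documentClient = new AWS.DynamoDB.DocumentClient();

const chunk = (arr, size) =>
    Array.from({ length: Math.ceil(arr.length / size) },
        (_, i) => arr.slice(i * size, (i + 1) * size));

async function deleteAll(ids) {
    for (const batch of chunk(ids, 25)) { // 25 = BatchWriteItem hard limit
        await documentClient.batchWrite({
            RequestItems: {
                locks: batch.map(id => ({ DeleteRequest: { Key: { id } } }))
            }
        }).promise();
        // A production version should also retry any UnprocessedItems
        // returned in the response.
    }
}
```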
It would be great to have the best of both worlds (as you said, @simlu): using the real DB (preferably AWS DynamoDB, with dynamodb-local as a fallback) to record the behavior, and then running those tests in each and every environment without any restrictions. Only as a last option would I use stubbing/mocking.
I have used https://www.npmjs.com/package/dynalite for local testing. Its goal is to replicate DynamoDB's real errors for this exact purpose.
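Getting it into a test is just a few lines (a sketch; the port and dummy credentials are arbitrary, dynalite doesn't verify them):

```js
const dynalite = require("dynalite");
const AWS = require("aws-sdk");

// Start an in-process DynamoDB implementation and point the SDK at it.
const server = dynalite({ createTableMs: 0 });
server.listen(4567, () => {
    const dynamodb = new AWS.DynamoDB({
        endpoint: "http://localhost:4567",
        region: "us-east-1",
        accessKeyId: "fake",
        secretAccessKey: "fake"
    });
    // ... create tables and run assertions against `dynamodb` here ...
});
```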