move-coop / parsons

A python library of connectors for the progressive community.
https://www.parsonsproject.org/

Change approach to fake test data #835

Open shaunagm opened 1 year ago

shaunagm commented 1 year ago

Currently, our connector tests involve large amounts of fake data, usually in JSON format (but occasionally stored as Python dicts, CSVs, or other formats). Sometimes this data is incorporated into the tests themselves, making them hard to read. Sometimes it's put in separate files, which is better, but it's still not ideal to have, say, a 400-line test data file to test just one connector.

Are there other approaches that might be more readable, easier to maintain, and easier to write? (I know generating the test data is often the most annoying part of writing tests for connectors.)

I'm aware of tools like Factory Boy, but that's for Python objects, not really for data. There's also Faker, which seems more promising.

Another option might be making use of JSON Schemas, although "validate the schema" isn't a huge part of the tests we're doing.

(I don't love that any of these approaches would involve adding another dependency - maybe it's time to separate out the handful of dev dependencies, like we do the docs dependencies?)
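
For comparison, here is a rough sketch of what a dependency-free option could look like, using only the standard library: a small builder function that returns a default record and lets each test override just the fields it cares about. The name `make_contact` and the field layout are hypothetical, made up for illustration rather than taken from any Parsons connector.

```python
import itertools

# Hypothetical counter so each fake record gets a unique id.
_ids = itertools.count(1)


def make_contact(**overrides):
    """Return a fake contact dict; keyword arguments replace the defaults."""
    n = next(_ids)
    contact = {
        "id": n,
        "first_name": f"First{n}",
        "last_name": f"Last{n}",
        "email": f"person{n}@example.org",
    }
    contact.update(overrides)
    return contact


# A test states only the field it cares about; everything else is defaulted.
unsubscribed = make_contact(email=None)
batch = [make_contact() for _ in range(3)]
```

The tradeoff is that defaults like these are less realistic than what Faker would produce, but the test itself stays short and shows exactly which field matters.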

Whatever we do, we should make sure to document it really well so that it makes the lives of people writing Parsons tests easier rather than harder and more confusing.

What do folks think?

corasaurus-hex commented 1 year ago

What do you think about something like hypothesis-jsonschema? The plus side to using something like this is that you can default to running just one example per test but can, in CI or otherwise, use more examples to stress test.

shaunagm commented 1 year ago

@corasaurus-hex great suggestion. I haven't used hypothesis-jsonschema before, so my main concern is around usability for people who don't have engineering backgrounds. Also just general time to implement vs other solutions. But this definitely deserves consideration!

corasaurus-hex commented 1 year ago

@shaunagm that's an extremely fair take. It's definitely more challenging and time-consuming to implement, and maybe a little confusing if people find a test fails in one instance and not in another because the data is all generated. So, consider that suggestion retracted.
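
On the "fails in one instance and not in another" concern: one common mitigation with generated data is to fix the random seed so that a failing input can be replayed exactly. A minimal standard-library sketch, where `fake_record` and its fields are hypothetical:

```python
import random
import string


def fake_record(seed):
    """Build a pseudo-random record; the same seed always yields the same data."""
    rng = random.Random(seed)  # local generator, leaves global random state alone
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    return {"name": name, "amount": rng.randint(1, 500)}


# Two calls with the same seed produce identical records.
first = fake_record(seed=835)
second = fake_record(seed=835)
```

This only addresses reproducibility; it does not make generated data any easier to read in a failing test.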

As a side note, it looks like Factory Boy can create dicts, which I wasn't aware of.