[Question] Dependent data for correct database insertion

bmabey commented 5 years ago

Thanks for this library! It was exactly what I needed and thought I'd have to write it till I saw this. :)

One question though, how have you handled tables with foreign keys? More generally, how have you approached generating records that are dependent on other records. The hypothesis docs seem to suggest using composite or using flatmap as in this Django example. Having to roll these one-off functions is unsatisfactory since the FK information has enough information to create such composites. How do you approach this in your apps?

I'm thinking about porting this clojure library to hypothesis. It has a general way of thinking about declaring dependent data with graphs and then generating example data from the graph in a succinct way. I'm thinking something like this paired with hypothesis-sqlalchemy could provide a slick way of generating complex DB data. Thoughts?

lycantropos commented 5 years ago

The short answer is: no, we have no support for constraints yet, and I think generating pseudo-random records for tables with foreign keys will require new function that will handle them simultaneously like

# not so sure about name/path
tabular.data.factory(table, other_table, ...) -> {table.name: [table_records...], other_table.name: [other_table_records...], ...}
# dictionary of lists looks like the "simplest" structure here

and inside it will create values for "independent" columns (without foreign keys) and after that create foreign keys values based on previously generated ones. If you have any ideas -- will be glad to discuss.

I know nothing about linked library, can you provide an example of usage (input-output) and how it can be paired? Should it be added as dependency?

bmabey commented 5 years ago

The linked library is in a different language but it has some good design aspects that we could take some ideas from. The big idea though is to only have to make a single library that deals with this sort of dependent data and that way libraries like hypothesis-sqlalchemy could add it as a dependency instead of having to rewrite it.

Mec-iS commented 4 years ago

Thanks for this library!

This is relevant because it is quite difficult at the moment to run more than 3/4 test rounds without running into some sort of constraint violation like psycopg2.errors.UniqueViolation (even for single models with a unique field like an email for example)

lycantropos commented 4 years ago

@Mec-iS: unique columns should be supported, that's odd, can you please provide a minimal example to reproduce?

Mec-iS commented 4 years ago

My current experience, I may have misunderstood the usage: I have been using .example() on the tabular data factory (I know that it is suggested to user given but there is not a way my code can generate a single example that way, even with max_examples=1 in the settings). So in the tests I managed to wipe the previous record in the model to avoid the conflict. I understand that in the case of unit tests the hypothesis approach of picking from past tests iterations make sense, but in the case of integration tests this could be a limitation as every ~4 record generated an already generated constraint is produced; I would like to generate multiple instances of the same model, do I need to use composite to generate multiple examples for the same test case?

lycantropos commented 4 years ago

you can always generate a list of records with unique/primary key constraints taken into consideration, but this obviously won't work by default for non-empty tables (and it probably shouldn't)

from my perspective test may consist of the following steps:

create required tables (setup),
add records/do some modifications, assertions, checking properties (test body)
clear tables (teardown) in my experience it worked pretty well without composite/.example

Mec-iS commented 4 years ago

I agree that the tests should be done on empty tables,

you can always generate a list of records with unique/primary key constraints taken into consideration,

then I missed the code example to do this. What is the suggested way to generate multiple records from the same factory to be used by a test case and then tear down?

lycantropos commented 4 years ago

@Mec-iS: please take a look at second option in the README:

>>> from hypothesis_sqlalchemy import tabular
>>> records_lists = tabular.records.lists_factory(user_table,
...                                               min_size=2,
...                                               max_size=5, 
...                                               email_address=strategies.emails())

lycantropos commented 4 years ago

dropping tables is outside of this library's context, but there is Table.drop method

lycantropos / hypothesis_sqlalchemy

[Question] Dependent data for correct database insertion #19