Closed jsmassa closed 1 year ago
The strategy makes sense and normalizing the tests like this must be the key strategy to avoid duplication and arbitrary symmetry breaking (i.e. having different tests for different configurations). Thanks for addressing this @jsmassa !
I am not familiar enough with the attribute-ref mismatches. For schema settings I would suggest to always assume the stricter requirements (schema-on-write
) and then potentially skip it automatically for schema-on-read
. This can be done with a function in the test namespaces that dispatches on the schema configuration. Do you think this would work? Would this also be possible for the attribute-refs?
Also, is there still a reason to have attribute-refs only optionally? Edit: Yes, of course, because they depend on schema-on-write. But does this mean that when we always provide a schema all tests can be run with attribute-refs?
A related serious problem I just stumbled over in #332 is that many tests use the same database, which worked so far for the memory database because the connections were not (properly) shared. It would be good if the db :id
for the memory database could be automatically retrieved from the test name. Can we address this here as well?
Edit: I have solved this there for now by manually duplicating the connections for the tests.
The schema isn't really a problem. Yes, we can always add a schema, but we could also only add it when required. That doesn't make much of a difference, I think.
The most problematic thing is still the entity ID which is being checked in a lot of the tests, so there is heaps of rewriting to do. Entity IDs for user entities are shifted using attribute references because system attributes like :db/ident
or :db/valueType
also need an entity ID and those are added to the database before the first user entities can be transacted.
The schema isn't really a problem. Yes, we can always add a schema, but we could also only add it when required. That doesn't make much of a difference, I think.
The most problematic thing is still the entity ID which is being checked in a lot of the tests, so there is heaps of rewriting to do. Entity IDs for user entities are shifted using attribute references because system attributes like :db/ident
or :db/valueType
also need an entity ID and those are added to the database before the first user entities can be transacted.
Could we just grab max-eid at the beginning of the tests and use this as the offset?
Could we just grab max-eid at the beginning of the tests and use this as the offset?
This is approximately how I did it for the attribute reference tests. Some of the tests used specific input datoms though in addition to the expected output datoms, so the input had to be manipulated as well. This lead me to write some helper functions to shift collections of datoms according to the amount of the system entity datoms. So far, there are also separate functions to shift the value as well which is needed for attributes with reference value type. Of course, this can be made simpler with a more complex helper function detecting when the value is a reference and has to be shifted.
In some instances there might not be a way around using functions to shift the entity IDs to the correct values, but I would say that in the end, a database user will not care about which ID an entity gets allocated inside the database or if the attributes are saved as references or keywords. This is why I think that, ideally, we should avoid checking those details for high-level functions like transact
, q
or pull
.
I would say that the strategies I used for the attribute reference tests in test/datahike/test/attribute_refs/
are ok and - with some improvements - can be used similarly to refactor most tests to work with all configurations. I would hope though that we can find a reformulation of a lot of the tests that make them more focused on the actual fact they are challenging and better readable than they would be with a habitual shifting of numbers.
Could we just grab max-eid at the beginning of the tests and use this as the offset?
This is approximately how I did it for the attribute reference tests. Some of the tests used specific input datoms though in addition to the expected output datoms, so the input had to be manipulated as well. This lead me to write some helper functions to shift collections of datoms according to the amount of the system entity datoms. So far, there are also separate functions to shift the value as well which is needed for attributes with reference value type. Of course, this can be made simpler with a more complex helper function detecting when the value is a reference and has to be shifted.
In some instances there might not be a way around using functions to shift the entity IDs to the correct values, but I would say that in the end, a database user will not care about which ID an entity gets allocated inside the database or if the attributes are saved as references or keywords. This is why I think that, ideally, we should avoid checking those details for high-level functions like
transact
,q
orpull
.I would say that the strategies I used for the attribute reference tests in
test/datahike/test/attribute_refs/
are ok and - with some improvements - can be used similarly to refactor most tests to work with all configurations. I would hope though that we can find a reformulation of a lot of the tests that make them more focused on the actual fact they are challenging and better readable than they would be with a habitual shifting of numbers.
I agree that we should not overfit to specific eids and rewrite tests to not rely on them in general is the right way forward.
Addresses #520
Turns all default values for a database into dynamic vars, so they can be set in
tests.edn
to test specific information.The idea is to use the default configuration for all tests, so each test has to be written only once.
Problematic with this strategy are
:attribute-refs?
because the entity IDs are shifted and depending on the function used either keyword or eids are returned, and:schema-on-write
because it needs a schema to be transacted, which means that here entity IDs are also shiftedFor these cases, we would need some kind of translation, similar to the way it is done in the test/datahike/test/attribute-refs test namespaces. If possible, I would like to get rid of these test namespaces as they only contain a subset of the tests and are cumbersome to maintain. Also, we would need to rewrite a lot of the tests to not check for the entity id but for example run queries as it is already done in some updated tests
Unproblematic, however, are config options such like
:store-backend
and:keep-history?
Please, tell me if this strategy makes sense or if there is some problem with it that I have overlooked.
I am also interested to hear which config options you think should be tested, or which combinations you think would make sense.