Open aditbhartia opened 6 days ago
PR #2027 will fix the issue by checking whether the `Iterable` is also a `Collection`; if it is not (so there is no `size()` method by contract), it will be treated the same as an unknown iterator.
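For readers who want to picture the check described above, here is a minimal sketch in plain Java. It is not the code from the PR; the class name `IterationEstimator`, the method `estimate`, and the `UNKNOWN` sentinel are made up for illustration. The idea is only that a size is reported when the provider is a `Collection`, and everything else is treated as unknown without touching `iterator()`.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical sketch, not Spock's actual implementation.
final class IterationEstimator {
    // Sentinel for "size cannot be known without consuming the provider".
    static final int UNKNOWN = -1;

    static int estimate(Object dataProvider) {
        // Only a Collection promises a size() method by contract.
        if (dataProvider instanceof Collection) {
            return ((Collection<?>) dataProvider).size();
        }
        // Plain Iterables and Iterators are never iterated just to count them.
        return UNKNOWN;
    }

    public static void main(String[] args) {
        // A lambda-backed Iterable that is not a Collection.
        Iterable<Integer> lazy = () -> List.of(1, 2, 3).iterator();
        System.out.println(estimate(List.of(1, 2, 3))); // prints 3
        System.out.println(estimate(lazy));             // prints -1
    }
}
```

With a check of this shape, a custom `Iterable` backed by an expensive source would have its `iterator()` called only when the data is actually needed.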
I don't think anything should be changed here. At most maybe adding some more clarifying documentation. There is no way to know whether a data provider is expensive or not.
There are multiple ways to handle this on the user side.
The user can pass in an `Iterator`, as iterators can only be consumed once.
Or they can make sure that the `Iterable` caches the expensive operation.
The only way would be a white-list of things that are definitely safe and non-expensive and that would take away much flexibility from Spock users.
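As an illustration of the caching workaround mentioned above, here is a minimal sketch in plain Java. `CachingIterable` and its `loader` parameter are hypothetical names, not part of Spock. The wrapper runs the expensive load once and serves every later `iterator()` call from memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical wrapper: performs the expensive load at most once.
final class CachingIterable<T> implements Iterable<T> {
    private final Supplier<List<T>> loader;
    private List<T> cache; // null until the first iterator() call

    CachingIterable(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    @Override
    public synchronized Iterator<T> iterator() {
        if (cache == null) {
            // The first call pays for the load; later calls reuse the copy.
            cache = new ArrayList<>(loader.get());
        }
        return cache.iterator();
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        CachingIterable<String> files = new CachingIterable<>(() -> {
            loads.incrementAndGet(); // stands in for the external DB request
            return List.of("a.txt", "b.txt");
        });
        for (String f : files) { /* first pass */ }
        for (String f : files) { /* second pass */ }
        System.out.println(loads.get()); // prints 1
    }
}
```

The trade-off is that the whole result stays in memory, which is what the reporter wanted to avoid. When that matters, passing the one-shot `Iterator` directly is probably the better fit.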
Describe the bug
Hi,
I've been using Spock's data driven testing features to pass in a custom Iterable that requests files from an external DB and iterates over each file returned. I'm using the custom Iterable approach documented here as opposed to getting all files initially and using the results to avoid keeping all input data across test cases in memory. While the code works, it makes extra calls to the DB that aren't required to execute the test. These calls extend test runtime and make unnecessary requests to the data store on every execution. However, if I call `.iterator` on the Iterable, and pass that in as the data provider, the test works as expected, making only the necessary calls to execute the test.
After digging through the code, I found that Spock has this logic that estimates the number of iterations on an Iterable, but does not do so for the Iterator. This is the only reason I could find why the custom Iterator is being constructed multiple times and the unnecessary calls are being made.
This is confusing to the user since the Iterator is being run through multiple times without any indication to the user or documentation why. In my opinion, we should make this behavior more clear to the end user and/or provide a way to specify not to estimate the number of iterations, since it may lead to requesting data multiple times in a way that's not transparent to the user. I've attached a simplified example.
While passing in an iterator worked for me, it's non-intuitive and required going through the Spock code. Other users might also face this issue in the future and more documentation combined with a way to turn this feature off would be extremely helpful. Thanks!
To Reproduce
The first test case prints:
The first test case with the second Iterator constructor prints:
and the second test case (with the first Iterator constructor) prints:
Expected behavior
I expect that when passing in an object implementing the Iterable contract as a data provider, Spock only constructs a single Iterator and requests only the data necessary to complete the execution by default. The last output (where the Iterator is run through only once) should be what happens in every scenario.
Actual behavior
When passing in an Iterable as a data provider, the framework constructs the Iterator 3 times and fetches the items from the Iterator 3 times instead of once. In my use case, this led to making 3x the amount of requests required and a slower test runtime.
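To measure this kind of behavior without reading framework code, a data provider can be wrapped in a counting decorator. The sketch below is plain Java and does not reproduce the attached example. `CountingIterable` is a hypothetical helper that records how often `iterator()` is called and how many items are fetched.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical diagnostic wrapper around any Iterable.
final class CountingIterable<T> implements Iterable<T> {
    private final Iterable<T> delegate;
    private final AtomicInteger iteratorCalls = new AtomicInteger();
    private final AtomicInteger itemsFetched = new AtomicInteger();

    CountingIterable(Iterable<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Iterator<T> iterator() {
        iteratorCalls.incrementAndGet(); // one more Iterator constructed
        Iterator<T> inner = delegate.iterator();
        return new Iterator<T>() {
            @Override public boolean hasNext() { return inner.hasNext(); }
            @Override public T next() {
                T item = inner.next();
                itemsFetched.incrementAndGet(); // one more item requested
                return item;
            }
        };
    }

    int iteratorCalls() { return iteratorCalls.get(); }
    int itemsFetched() { return itemsFetched.get(); }

    public static void main(String[] args) {
        CountingIterable<String> files = new CountingIterable<>(List.of("a", "b"));
        for (String f : files) { /* a single consumer pass */ }
        System.out.println(files.iteratorCalls()); // prints 1
        System.out.println(files.itemsFetched());  // prints 2
    }
}
```

Passing such a wrapper as the data provider and printing the counters after the run should show whether the source was traversed once or several times.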
Java version
Java 17
Buildtool version
Gradle 8.10.2
What operating system are you using
Mac
Dependencies
Additional context
Please let me know if you need any more information. My personal suggestion would be to provide both documentation and a required flag indicating whether the Iterable is safe to call `.size()` on. Custom data providers are one of Spock's biggest value-adds and I think the experience could be made even better with this change.