spockframework / spock

The Enterprise-ready testing and specification framework.
https://spockframework.org
Apache License 2.0
3.54k stars 467 forks source link

Passing in a custom Iterable as a data provider causes Spock to iterate over each item multiple times #2022

Open aditbhartia opened 6 days ago

aditbhartia commented 6 days ago

Describe the bug

Hi,

I've been using Spock's data driven testing features to pass in a custom Iterable that requests files from an external DB and iterates over each file returned. I'm using the custom Iterable approach documented here as opposed to getting all files initially and using the results to avoid keeping all input data across test cases in memory. While the code works, it makes extra calls to the DB that aren't required to execute the test. These calls extend test runtime and make unnecessary requests to the data store on every execution. However, if I call .iterator on the Iterable, and pass that in as the data provider, the test works as expected, making only the necessary calls to execute the test.

After digging through the code, I found that Spock has this logic that estimates the number of iterations on an Iterable, but does not do so for the Iterator. This is the only reason I could find why the custom Iterator is being constructed multiple times and the unnecessary calls are being made.

This is confusing to the user since the Iterator is being run through multiple times without any indication to the user or documentation why. In my opinion, we should make this behavior more clear to the end user and/or provide a way to specify not to estimate the number of iterations, since it may lead to requesting data multiple times in a way that's not transparent to the user. I've attached a simplified example.

While passing in an iterator worked for me, it's non-intuitive and required going through the Spock code. Other users might also face this issue in the future and more documentation combined with a way to turn this feature off would be extremely helpful. Thanks!

To Reproduce

package org.spockframework.util

import spock.lang.Specification

class IterableRunTwiceSpec extends Specification {

  // calls iterator constructor multiple times
  def "iterable_run_twice"(String data) {

    expect:
    data == "item from DB"

    where:
    data << new ExpensiveIterable();
  }

  // only makes the minimum number of queries required
  def "iterable_as_iterator_run_once"() {

    expect:
    data == "item from DB"

    where:
    data << new ExpensiveIterable().iterator()
  }
}

class DataStore {

  int queriesMadeToDB = 0;

  DataStore() {
    System.out.println("Making connection to data store")
  }

  String getNext() {
    System.out.println("fetching item " + queriesMadeToDB++ + " from DB");
    return "item from DB";
  }
}

class ExpensiveIterable implements Iterable<String> {

  DataStore dataStore;

  ExpensiveIterable() {
    System.out.println("Constructing expensive iterable");
    dataStore = new DataStore();
  }

  @Override
  Iterator<String> iterator() {
    return new ExpensiveIterator(dataStore);
  }

  // get first 5 items from DB, only makes request to DB when next item required
  class ExpensiveIterator implements Iterator<String> {

    int queries = 0;
    DataStore dataStore;

    // Makes calls to same data store repeatedly
    ExpensiveIterator(final DataStore dataStore) {
      this.dataStore = dataStore;
      System.out.println("Constructing expensive iterator");
    }

    // Other option: constructs connection to data store multiple times (expensive when constructor performs some work)
    ExpensiveIterator() {
      dataStore = new DataStore();
      System.out.println("Constructing expensive iterator")
    }

    @Override
    boolean hasNext() {
      return queries != 5;
    }

    @Override
    String next() {
      queries = queries + 1
      return dataStore.next;
    }
  }
}

The first test case prints:

Constructing expensive iterable
Making connection to data store
Constructing expensive iterator
fetching item 0 from DB
fetching item 1 from DB
fetching item 2 from DB
fetching item 3 from DB
fetching item 4 from DB
Constructing expensive iterator
fetching item 5 from DB
fetching item 6 from DB
fetching item 7 from DB
fetching item 8 from DB
fetching item 9 from DB
Constructing expensive iterator
fetching item 10 from DB
fetching item 11 from DB
fetching item 12 from DB
fetching item 13 from DB
fetching item 14 from DB

the first test case with the second Iterator constructor prints:

Constructing expensive iterable
Making connection to data store
Making connection to data store
Constructing expensive iterator
fetching item 0 from DB
fetching item 1 from DB
fetching item 2 from DB
fetching item 3 from DB
fetching item 4 from DB
Making connection to data store
Constructing expensive iterator
fetching item 0 from DB
fetching item 1 from DB
fetching item 2 from DB
fetching item 3 from DB
fetching item 4 from DB
Making connection to data store
Constructing expensive iterator
fetching item 0 from DB
fetching item 1 from DB
fetching item 2 from DB
fetching item 3 from DB
fetching item 4 from DB

and the second test case (with the first Iterator constructor) prints:

Constructing expensive iterable
Making connection to data store
Constructing expensive iterator
fetching item 0 from DB
fetching item 1 from DB
fetching item 2 from DB
fetching item 3 from DB
fetching item 4 from DB

Expected behavior

I expect that when passing in an object implementing the Iterable contract as a data provider, Spock only constructs a single Iterator and requests only the data necessary to complete the execution by default. The last output (where the Iterator is run through only once) should be what happens in every scenario.

Actual behavior

When passing in an Iterable as a data provider, the framework constructs the Iterator 3 times and fetches the items from the Iterator 3 times instead of once. In my use case, this led to making 3x the amount of requests required and a slower test runtime.

Java version

Java 17

Buildtool version

Gradle 8.10.2

What operating system are you using

Mac

Dependencies

__$$asciidoctorj$$___d (n)
+--- org.asciidoctor:asciidoctorj:2.5.7 (n)
\--- org.asciidoctor:asciidoctorj-diagram:2.2.3 (n)

__$$asciidoctorj$$___r
+--- org.asciidoctor:asciidoctorj:2.5.7 -> 2.5.13
|    +--- com.beust:jcommander:1.82
|    +--- org.asciidoctor:asciidoctorj-api:2.5.13
|    \--- org.jruby:jruby:9.4.7.0 -> org.jruby:jruby-complete:9.4.7.0
\--- org.asciidoctor:asciidoctorj-diagram:2.2.3
     +--- org.asciidoctor:asciidoctorj-diagram-plantuml:1.2022.5
     \--- org.asciidoctor:asciidoctorj-diagram-ditaamini:1.0.3

asciidoctorExtensions
\--- spockbuild:asciidoc-extensions -> project :build-logic:asciidoc-extensions

checkstyle - The Checkstyle libraries to be used for this project.
+--- com.puppycrawl.tools:checkstyle:9.3
|    +--- info.picocli:picocli:4.6.2
|    +--- org.antlr:antlr4-runtime:4.9.3
|    +--- commons-beanutils:commons-beanutils:1.9.4
|    |    \--- commons-collections:commons-collections:3.2.2
|    +--- com.google.guava:guava:31.0.1-jre
|    |    +--- com.google.guava:failureaccess:1.0.1
|    |    +--- com.google.guava:listenablefuture:9999.0-empty-to-avoid-conflict-with-guava
|    |    +--- com.google.code.findbugs:jsr305:3.0.2
|    |    +--- org.checkerframework:checker-qual:3.12.0
|    |    +--- com.google.errorprone:error_prone_annotations:2.7.1
|    |    \--- com.google.j2objc:j2objc-annotations:1.3
|    +--- org.reflections:reflections:0.10.2
|    |    +--- org.javassist:javassist:3.28.0-GA
|    |    \--- com.google.code.findbugs:jsr305:3.0.2
|    \--- net.sf.saxon:Saxon-HE:10.6
\--- io.spring.nohttp:nohttp-checkstyle:0.0.11
     +--- com.puppycrawl.tools:checkstyle:8.33 -> 9.3 (*)
     \--- io.spring.nohttp:nohttp:0.0.11
          \--- ch.qos.logback:logback-classic:1.2.3
               \--- ch.qos.logback:logback-core:1.2.3

jacocoAgent - The Jacoco agent to use to get coverage data.
\--- org.jacoco:org.jacoco.agent:0.8.12

jacocoAnt - The Jacoco ant tasks to use to get execute Gradle tasks.
\--- org.jacoco:org.jacoco.ant:0.8.12
     +--- org.jacoco:org.jacoco.core:0.8.12
     |    +--- org.ow2.asm:asm:9.7
     |    +--- org.ow2.asm:asm-commons:9.7
     |    |    +--- org.ow2.asm:asm:9.7
     |    |    \--- org.ow2.asm:asm-tree:9.7
     |    |         \--- org.ow2.asm:asm:9.7
     |    \--- org.ow2.asm:asm-tree:9.7 (*)
     +--- org.jacoco:org.jacoco.report:0.8.12
     |    \--- org.jacoco:org.jacoco.core:0.8.12 (*)
     \--- org.jacoco:org.jacoco.agent:0.8.12

nohttp
\--- io.spring.nohttp:nohttp-checkstyle:0.0.11
     +--- com.puppycrawl.tools:checkstyle:8.33
     |    +--- info.picocli:picocli:4.3.1
     |    +--- antlr:antlr:2.7.7
     |    +--- org.antlr:antlr4-runtime:4.8-1
     |    +--- commons-beanutils:commons-beanutils:1.9.4
     |    |    +--- commons-logging:commons-logging:1.2
     |    |    \--- commons-collections:commons-collections:3.2.2
     |    +--- com.google.guava:guava:29.0-jre
     |    |    +--- com.google.guava:failureaccess:1.0.1
     |    |    +--- com.google.guava:listenablefuture:9999.0-empty-to-avoid-conflict-with-guava
     |    |    +--- com.google.code.findbugs:jsr305:3.0.2
     |    |    +--- org.checkerframework:checker-qual:2.11.1
     |    |    +--- com.google.errorprone:error_prone_annotations:2.3.4
     |    |    \--- com.google.j2objc:j2objc-annotations:1.3
     |    \--- net.sf.saxon:Saxon-HE:9.9.1-7
     \--- io.spring.nohttp:nohttp:0.0.11
          +--- ch.qos.logback:logback-classic:1.2.3
          |    +--- ch.qos.logback:logback-core:1.2.3
          |    \--- org.slf4j:slf4j-api:1.7.25 -> 1.7.26
          \--- org.slf4j:slf4j-api:1.7.26

nohttp-cli
\--- io.spring.nohttp:nohttp-cli:0.0.11
     +--- info.picocli:picocli:3.9.5
     \--- io.spring.nohttp:nohttp:0.0.11
          +--- ch.qos.logback:logback-classic:1.2.3
          |    +--- ch.qos.logback:logback-core:1.2.3
          |    \--- org.slf4j:slf4j-api:1.7.25 -> 1.7.26
          \--- org.slf4j:slf4j-api:1.7.26

Additional context

Please let me know if you need any more information. My personal suggestion would be to provide both documentation and a required flag whether the Iterable is safe to call .size() on. Custom data providers are one of Spock's biggest value-adds and I think the experience could be made even better with this change.

AndreasTu commented 2 days ago

PR #2027 will fix the issue, by checking if the Iterable is also a Collection and if not (so no size() method by contract), it will treat it the same as an unknown iterator.

Vampire commented 2 days ago

I don't think anything should be changed here. At most maybe adding some more clarifying documentation. There is no way to know whether a data provider is expensive or not.

There are multiple ways to care for this from user-side. The user can give in an Iteratror as they are only one-time consumable. Or they can make sure that the iterable caches the expensive operation.

The only way would be a white-list of things that are definitely safe and non-expensive and that would take away much flexibility from Spock users.