Concept of generating cypher from GraphQL query seem to be wrong if using custom Data Fetchers

Krejtcha commented 2 years ago

Describe the bug

Right now the library generates the cypher query based on the provided GraphQL query only fetching the attributes that were requested from neo4j. This might seem like a good idea when it comes to performance (why fetch the whole node if just a couple of attributes were requested). The problem occurs when using additional custom Data Fetchers. These Data Fetchers can be used to run custom cypher queries that would not be supported by the @cypher directive or running any computation code, accessing other databases ... but in order to do that they need context. This context should be provided by the env.getSource() method where all the parent data should be present. In most cases you will need the id of the parent to be able to do any additional computation. But in this case the parent contains only the data that were requested from neo4j which is wrong.

In the example below the javaData.name relies on the title and if this is not requested it will not be in the parent thus javaData.name will have the value "test null" which is wrong.

Test Case

Modify the content of the class: https://github.com/neo4j-graphql/neo4j-graphql-java/blob/master/examples/dgs-spring-boot/src/test/kotlin/org/neo4j/graphql/examples/dgsspringboot/datafetcher/AdditionalDataFetcherTest.kt to the following code:

package org.neo4j.graphql.examples.dgsspringboot.datafetcher

import com.jayway.jsonpath.TypeRef
import com.netflix.graphql.dgs.DgsQueryExecutor
import com.netflix.graphql.dgs.client.codegen.GraphQLQueryRequest
import org.assertj.core.api.Assertions
import org.junit.jupiter.api.AfterEach
import org.junit.jupiter.api.BeforeEach
import org.junit.jupiter.api.Test
import org.neo4j.driver.Driver
import org.neo4j.driver.springframework.boot.autoconfigure.Neo4jDriverProperties
import org.neo4j.graphql.examples.dgsspringboot.types.DgsConstants
import org.neo4j.graphql.examples.dgsspringboot.types.client.MoviesGraphQLQuery
import org.neo4j.graphql.examples.dgsspringboot.types.client.MoviesProjectionRoot
import org.neo4j.graphql.examples.dgsspringboot.types.types.Movie
import org.neo4j.graphql.examples.dgsspringboot.types.types.MovieOptions
import org.neo4j.graphql.examples.dgsspringboot.types.types.MovieSort
import org.neo4j.graphql.examples.dgsspringboot.types.types.SortDirection
import org.skyscreamer.jsonassert.JSONAssert
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.boot.autoconfigure.EnableAutoConfiguration
import org.springframework.boot.context.properties.ConfigurationProperties
import org.springframework.boot.test.context.SpringBootTest
import org.springframework.boot.test.context.TestConfiguration
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Primary
import org.testcontainers.containers.Neo4jContainer
import org.testcontainers.junit.jupiter.Container
import org.testcontainers.junit.jupiter.Testcontainers
import java.net.URI

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.NONE, properties = ["database=neo4j"])
@EnableAutoConfiguration
@Testcontainers
internal class AdditionalDataFetcherTest(
        @Autowired
        val dgsQueryExecutor: DgsQueryExecutor,
        @Autowired
        val driver: Driver
) {

    @BeforeEach
    fun setup() {
        driver.session().use {
            it.run("""
            CREATE (:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
            CREATE (:Movie {title:'The Matrix Reloaded', released:2003, tagline:'Free your mind'})
            CREATE (:Movie {title:'The Matrix Revolutions', released:2003, tagline:'Everything that has a beginning has an end'})
            """.trimIndent())
        }
    }

    @AfterEach
    fun tearDown() {
        driver.session().use { it.run("MATCH (n) DETACH DELETE n") }
    }

    @Test
    fun testHybridDataFetcher() {

        val graphQLQueryRequest = GraphQLQueryRequest(
                MoviesGraphQLQuery.newRequest()
                    .options(MovieOptions(sort = listOf(MovieSort(title = SortDirection.DESC))))
                    .build(),
                MoviesProjectionRoot().also { movie ->
//                    movie.title() // do not query for title
                    movie.bar()
                    movie.javaData().also { javaData ->
                        javaData.name()
                    }
                }
        )

        val request = graphQLQueryRequest.serialize()
//        Assertions.assertThat(request).isEqualTo("query {movies(options: {sort:[{title:DESC }] }){ title bar javaData   { name } } }")
        Assertions.assertThat(request).isEqualTo("query {movies(options: {sort:[{title:DESC }] }){ bar javaData   { name } } }")

        val response = dgsQueryExecutor.executeAndGetDocumentContext(request)

        //language=JSON
//        JSONAssert.assertEquals("""
//        {
//          "data": {
//            "movies": [
//              {
//                "title": "The Matrix Revolutions",
//                "bar": "foo",
//                "javaData": [
//                  {
//                    "name": "test The Matrix Revolutions"
//                  }
//                ]
//              },
//              {
//                "title": "The Matrix Reloaded",
//                "bar": "foo",
//                "javaData": [
//                  {
//                    "name": "test The Matrix Reloaded"
//                  }
//                ]
//              },
//              {
//                "title": "The Matrix",
//                "bar": "foo",
//                "javaData": [
//                  {
//                    "name": "test The Matrix"
//                  }
//                ]
//              }
//            ]
//          }
//        }
//      """.trimIndent(), response.jsonString(), true)
        // since title was not queried it was not present in the parent while the additional data fetcher for javaData was executed
        // therefore javaData.name will be: "test null"
        //language=JSON
        JSONAssert.assertEquals("""
        {
          "data": {
            "movies": [
              {
                "bar": "foo",
                "javaData": [
                  {
                    "name": "test The Matrix Revolutions"
                  }
                ]
              },
              {
                "bar": "foo",
                "javaData": [
                  {
                    "name": "test The Matrix Reloaded"
                  }
                ]
              },
              {
                "bar": "foo",
                "javaData": [
                  {
                    "name": "test The Matrix"
                  }
                ]
              }
            ]
          }
        }
      """.trimIndent(), response.jsonString(), true)

        val list = response.read("data.${DgsConstants.QUERY.Movies}", object : TypeRef<List<Movie>>() {})
        Assertions.assertThat(list).hasSize(3)
    }

    @TestConfiguration
    open class Config {

        @Bean
        @ConfigurationProperties(prefix = "ignore")
        @Primary
        open fun properties(): Neo4jDriverProperties {
            val properties = Neo4jDriverProperties()
            properties.uri = URI.create(neo4jServer.boltUrl)
            properties.authentication = Neo4jDriverProperties.Authentication()
            properties.authentication.username = "neo4j"
            properties.authentication.password = neo4jServer.adminPassword
            return properties
        }
    }

    companion object {
        @Container
        private val neo4jServer = Neo4jContainer<Nothing>("neo4j:4.4.1")
    }
}

and run the test.

Additional context

I believe that it should be configurable for each GraphQL type do define what data will be always fetched from neo4j to provide sufficient context for additional custom Data Fetchers. This way the necessary data will be always fetched and the additional custom Data Fetchers will be satisfied and will be able to do their computation and the rest of the data will be fetched only if requested by the QraphQL query for better performance. The graphql-java library should then remove data that was not requested from the result returning only that the user asked for with the benefit that the additional custom Data Fetchers were able to do their computation and return the correct result.

Note that this modification will break the Translator approach where there is no postprocessing of the fetched data thus the result will contain also the data needed for satisfaction of the additional custom Data Fetchers and not only what the user asked for.

Krejtcha commented 2 years ago

Hi, just wanted to ask if somebody had a look on this issue? I can answer your questions if you have any.

Andy2003 commented 2 years ago

The example you provided is an edge case that shows the limits of the chosen approach. I think, to get the example working, you have to collect all the required information by yourself, if they are not provided in the context.

Andy2003 commented 2 years ago

I believe that it should be configurable for each GraphQL type do define what data will be always fetched from neo4j to provide sufficient context for additional custom Data Fetchers.

This is nothing we will add in the near future. The current focus is on harmonizing the API with the one from the Javascript version.

Krejtcha commented 2 years ago

I understand that this is no priority, but I do not think it is an edge case and I would like to understand what you mean by "you have to collect all the required information by yourself, if they are not provided in the context". When using custom data fetchers you fetch the data based on the parent context (you need at least the id in order to be able to calculate something), but here the parent context is created by this library which creates it based on the GraphQL query provided by the user. This means that if the user does not query for the needed context, the cypher provided by the library will not fetch the context either and the custom data fetcher therefore does not have any context to work with. I do not know how I should collect the required context by myself. The only thing I can think of is to augment the user provided GraphQL query to contain the fields needed for the context (and later remove those fields from the result) or to augment the cypher generated by the library to include the fields needed for the context (which I do not think is the way to go). Maybe I am missing something, could you explain how you meant it, how do I collect the needed context by myself?

Andy2003 commented 2 years ago

There are two use cases:

You query a top level field (with custom resolver) and want to use some data from neo4j to compute the result. Here you need query everything by your own. (this was the use case I was thinking of above)
You want to extend an augmented type (neo4j node) with some custom field resolver. This is the use case, you were referring to. And your are totally right, in case the client is not querying the relevant fields you may not be able to handle the node correctly. I think to solve this issue, we need a new directive (something like @context, feel free to suggest a better name)

Andy2003 commented 2 years ago

In the Javascript-API a new @computed directive is used to solve this kind of issue. So this will be fixed after the API-Alignement is implemented (which I'm currently working on).

neo4j-graphql / neo4j-graphql-java

Concept of generating cypher from GraphQL query seem to be wrong if using custom Data Fetchers #268