opsmill / infrahub

Infrahub - A new approach to Infrastructure Management
https://opsmill.com/
GNU Affero General Public License v3.0

bug: neo4j raises neo4j.exceptions.DatabaseError exception when storing a large GraphQL query #4399

Open wvandeun opened 1 day ago

wvandeun commented 1 day ago

Component

API Server / GraphQL

Infrahub version

0.16.0

Current Behavior

When you try to store a large GraphQL query in Infrahub (a query used for a transformation synced into Infrahub, or a user-created CoreGraphQLQuery object), you get a neo4j.exceptions.DatabaseError exception.

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/graphql/execution/execute.py", line 530, in await_result
    return_type, field_nodes, info, path, await result
                                          ^^^^^^^^^^^^
  File "/source/backend/infrahub/graphql/mutations/main.py", line 68, in mutate
    obj, mutation = await cls.mutate_create(
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/source/backend/infrahub/graphql/mutations/graphql_query.py", line 74, in mutate_create
    obj, result = await super().mutate_create(root=root, info=info, data=data, branch=branch, at=at)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/source/backend/infrahub/graphql/mutations/main.py", line 159, in mutate_create
    obj = await cls.mutate_create_object(data=data, db=db, branch=branch, at=at)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/source/backend/infrahub/database/__init__.py", line 406, in wrapper
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/source/backend/infrahub/graphql/mutations/main.py", line 187, in mutate_create_object
    await obj.save(db=dbt)
  File "/source/backend/infrahub/core/node/__init__.py", line 456, in save
    await self._create(at=save_at, db=db)
  File "/source/backend/infrahub/core/node/__init__.py", line 410, in _create
    await query.execute(db=db)
  File "/source/backend/infrahub/core/query/__init__.py", line 540, in execute
    results, metadata = await db.execute_query_with_metadata(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/source/backend/infrahub/database/__init__.py", line 295, in execute_query_with_metadata
    results = [item async for item in response]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/neo4j/_async/work/result.py", line 378, in __aiter__
    await self._connection.fetch_message()
  File "/usr/local/lib/python3.12/site-packages/neo4j/_async/io/_common.py", line 188, in inner
    await coroutine_func(*args, **kwargs)
  File "/usr/local/lib/python3.12/site-packages/neo4j/_async/io/_bolt.py", line 860, in fetch_message
    res = await self._process_message(tag, fields)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/neo4j/_async/io/_bolt5.py", line 370, in _process_message
    await response.on_failure(summary_metadata or {})
  File "/usr/local/lib/python3.12/site-packages/neo4j/_async/io/_common.py", line 245, in on_failure
    raise Neo4jError.hydrate(**metadata)

neo4j.exceptions.DatabaseError: {code: Neo.DatabaseError.Statement.ExecutionFailed} {message: Property value is too large to index, please see index documentation for limitations. Index: Index( id=7, name='node_range_attr_value_value', type='RANGE', schema=(:AttributeValue {value}), indexProvider='range-1.0' ), entity id: 40115, property size: 13876.}

Expected Behavior

The CoreGraphQLQuery object gets created, or the user gets a proper error message.
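
As a rough illustration of what a "proper error message" could look like, the sketch below pulls the index name and property size out of the Neo4j message shown in the traceback and turns them into a readable sentence. The function name and the wording of the returned message are hypothetical and not part of Infrahub; the regular expression is only matched against the single message quoted in this issue, so other Neo4j versions may phrase it differently.

```python
import re
from typing import Optional

# Matches the "too large to index" message quoted in this issue.
# Assumption: the index name appears as name='...' and the size as "property size: N".
_INDEX_SIZE_ERROR = re.compile(
    r"Property value is too large to index.*?"
    r"name='(?P<index>[^']+)'.*?"
    r"property size: (?P<size>\d+)",
    re.DOTALL,
)


def describe_index_size_error(message: str) -> Optional[str]:
    """Return a user-facing sentence, or None if the message is a different error."""
    match = _INDEX_SIZE_ERROR.search(message)
    if match is None:
        return None
    return (
        f"The value is too large to be stored (property size {match['size']}): "
        f"it exceeds the key size limit of the database index '{match['index']}'."
    )
```

A caller could apply this to the text of the caught neo4j.exceptions.DatabaseError and fall back to re-raising the original exception when the function returns None.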

Steps to Reproduce

Additional Information

The error seems to come from the fact that we index the value of the "query" attribute. I'm not sure indexing makes much sense for the query text of a GraphQLQuery.

ogenstad commented 1 day ago

The problem is with this index:

SHOW INDEXES where name = "node_range_attr_value_value";

╒═══╤═════════════════════════════╤════════╤═════════════════╤═══════╤══════════╤══════════════════╤══════════╤═════════════╤════════════════╤════════╤═════════╕
│id │name                         │state   │populationPercent│type   │entityType│labelsOrTypes     │properties│indexProvider│owningConstraint│lastRead│readCount│
╞═══╪═════════════════════════════╪════════╪═════════════════╪═══════╪══════════╪══════════════════╪══════════╪═════════════╪════════════════╪════════╪═════════╡
│7  │"node_range_attr_value_value"│"ONLINE"│100.0            │"RANGE"│"NODE"    │["AttributeValue"]│["value"] │"range-1.0"  │null            │null    │0        │
└───┴─────────────────────────────┴────────┴─────────────────┴───────┴──────────┴──────────────────┴──────────┴─────────────┴────────────────┴────────┴─────────┘

This doc is for an earlier version of neo4j, but seems relevant: https://neo4j.com/developer/kb/index-limitations-and-workaround/

The native-btree-1.0 index provider has a key size limit of 8167 bytes.

While we're not using exactly the same index provider (ours is range-1.0), it seems to have the same size limit.
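
To make the limit concrete, here is a minimal sketch of a size check that could run before a value is written. It assumes the limit applies to the UTF-8 encoded size of the string and reuses the 8167-byte figure quoted above for native-btree-1.0; the real limit for range-1.0, and any per-key overhead Neo4j adds, are not confirmed in this thread. The constant and function names are hypothetical.

```python
# Assumption: 8167 bytes is the native-btree-1.0 figure from the linked article;
# the exact limit for the range-1.0 provider is not confirmed here.
ASSUMED_INDEX_KEY_LIMIT_BYTES = 8167


def encoded_size(value: str) -> int:
    """Size of the value in bytes once encoded as UTF-8."""
    return len(value.encode("utf-8"))


def fits_in_index(value: str, limit: int = ASSUMED_INDEX_KEY_LIMIT_BYTES) -> bool:
    """True if the encoded value is no larger than the assumed key size limit."""
    return encoded_size(value) <= limit
```

Measuring bytes rather than characters matters because non-ASCII text takes more than one byte per character in UTF-8.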

A quick workaround can be to remove the index from the database, but there will probably be some performance penalty to this.

DROP INDEX node_range_attr_value_value;

The problem is not specific to GraphQL query objects; it can occur with the value of any field that exceeds the index size limit.

Looking at the definition of these objects (shown below), one solution might be to set a size limit for kinds such as Text, and to store larger kinds such as TextArea and JSON under a different, unindexed label (for example LargeAttributeValue) instead of AttributeValue. I don't yet know what impact this would have on the query engine.

Alternatively, we reconsider the use of this index altogether.

        {
            "name": "GraphQLQuery",
            "namespace": "Core",
            "description": "A pre-defined GraphQL Query",
            "include_in_menu": False,
            "icon": "mdi:graphql",
            "label": "GraphQL Query",
            "default_filter": "name__value",
            "order_by": ["name__value"],
            "display_labels": ["name__value"],
            "generate_profile": False,
            "branch": BranchSupportType.AWARE.value,
            "uniqueness_constraints": [["name__value"]],
            "documentation": "/topics/graphql",
            "attributes": [
                {"name": "name", "kind": "Text", "unique": True},
                {"name": "description", "kind": "Text", "optional": True},
                {"name": "query", "kind": "TextArea"},
                {
                    "name": "variables",
                    "kind": "JSON",
                    "description": "variables in use in the query",
                    "optional": True,
                    "read_only": True,
                },

dgarros commented 1 day ago

Great summary @ogenstad

I'm leaning toward this solution:

use something other than the AttributeValue label for other field types such as TextArea and JSON such as LargeAttributeValue where the large one is unindexed.
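
A minimal sketch of how that option might look, assuming the label is chosen from the attribute kind and a size limit is enforced only for indexed kinds. None of this is Infrahub code: LargeAttributeValue is the label name suggested in the thread, while the function names, the set of "large" kinds and the 8167-byte limit are assumptions made for illustration.

```python
INDEXED_LABEL = "AttributeValue"
UNINDEXED_LABEL = "LargeAttributeValue"  # label suggested in this thread, not an existing one

# Assumption: only these kinds are routed to the unindexed label.
LARGE_KINDS = frozenset({"TextArea", "JSON"})

# Assumption: figure quoted for native-btree-1.0; the range-1.0 limit is unconfirmed.
ASSUMED_MAX_INDEXED_BYTES = 8167


def value_label_for_kind(kind: str) -> str:
    """Pick the node label used to store a value of the given attribute kind."""
    return UNINDEXED_LABEL if kind in LARGE_KINDS else INDEXED_LABEL


def validate_value(kind: str, value: str) -> None:
    """Reject values that would land in the indexed label but exceed the assumed limit."""
    if value_label_for_kind(kind) != INDEXED_LABEL:
        return  # unindexed values are not constrained by the index key size
    size = len(value.encode("utf-8"))
    if size > ASSUMED_MAX_INDEXED_BYTES:
        raise ValueError(
            f"Value of kind {kind} is {size} bytes; the limit is {ASSUMED_MAX_INDEXED_BYTES} bytes"
        )
```

With this split, the "query" attribute (kind TextArea) from the schema above would bypass the index, while "name" and "description" (kind Text) would stay indexed and size-checked. The impact on queries that filter on AttributeValue remains the open question raised earlier.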