stargate / data-api

JSON document API for Apache Cassandra (formerly known as JSON API)
https://stargate.io
Apache License 2.0

Return the schema of the projection for read commands #1584

Open amorton opened 3 days ago

amorton commented 3 days ago

Per the spec, return the schema in the status using the same format as we do for inserts.

Example from insert response below:

{
    "status": {
        "insertedIds": [
            [
                "aaron.morton@datastax.com"
            ],
            [
                "alice@datastax.com"
            ]
        ],
        "primaryKeySchema": {
            "email": {
                "type": "text"
            }
        }
    }
}

The pattern started with insert was to have a status key specific to the command type (one for reads, one for inserts), so maybe call it `projectionSchema`?

amorton commented 3 days ago

Work in progress

given this table

CREATE TABLE demo.users (
    email text PRIMARY KEY,
    age tinyint,
    country text,
    human boolean,
    name text
)

This read

{
  "findOne": {
      "filter" : {"email" : "henry.white@awesome.api"},
      "projection" : {} 
}}

returns

{
    "data": {
        "document": {
            "country": "NZ",
            "name": "Henry White",
            "human": true,
            "email": "henry.white@awesome.api",
            "age": 40
        }
    },
    "status": {
        "projectionSchema": {
            "country": {
                "type": "text"
            },
            "name": {
                "type": "text"
            },
            "human": {
                "type": "boolean"
            },
            "email": {
                "type": "text"
            },
            "age": {
                "type": "tinyint"
            }
        }
    }
}

amorton commented 3 days ago

Another example response:

NOTE: still returning nulls

Body:
{
    "find": {
        "filter": {

        },
        "projection": {
            "id": 1
        }
    }
}

HTTP/1.1 200 OK
content-length: 1180
Content-Type: application/json;charset=UTF-8

{
    "data": {
        "documents": [
            {
                "id": "row0"
            },
            {
                "id": "row1"
            },
            {
                "id": "row11"
            },
            {
                "id": "row14"
            },
            {
                "id": "row2"
            },
            {
                "id": "row12"
            },
            {
                "id": "row7"
            },
            {
                "id": "row8"
            },
            {
                "id": "row6"
            },
            {
                "id": "row10"
            },
            {
                "id": "row4"
            },
            {
                "id": "row5"
            },
            {
                "id": "row16"
            },
            {
                "id": "row-1"
            },
            {
                "id": "row15"
            },
            {
                "id": "row9"
            },
            {
                "id": "row13"
            },
            {
                "id": "row17"
            },
            {
                "id": "row3"
            }
        ],
        "nextPageState": null
    },
    "status": {
        "projectionSchema": {
            "id": {
                "type": "text"
            }
        },
        "warnings": [
            {
                "errorCode": "ZERO_FILTER_OPERATIONS",
                "message": "Zero filters were provided in the filer for this query. \n\nProviding zero filters will return all rows in the table, which may have poor performance when the table is large. For the best performance, include one or more filters using the primary key or indexes.\n\nThe table \"kslh38IyDxPibhxz1I\".\"projectionSchemaTable\" has the primary key: id(text).\nAnd has indexes on the columns: [None].\n\nThe query was executed without taking advantage of the primary key or indexes on the table, this can have performance implications on large tables.\n\nSee documentation at XXXX for best practices for filtering.",
                "family": "REQUEST",
                "scope": "WARNING",
                "title": "Zero operations provided in query filter",
                "id": "b001cc82-5227-46a5-af1d-eff030294d89"
            }
        ]
    }
}

and

Body:
{
    "find": {
        "filter": {

        },
        "projection": {
            "col_int": 1,
            "col_duration": 1,
            "col_text": 1
        }
    }
}

HTTP/1.1 200 OK
content-length: 2068
Content-Type: application/json;charset=UTF-8

{
    "data": {
        "documents": [
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": null
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": null,
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": null,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            },
            {
                "col_int": 1,
                "col_duration": "89h8m53s",
                "col_text": "text"
            }
        ],
        "nextPageState": null
    },
    "status": {
        "projectionSchema": {
            "col_int": {
                "type": "int"
            },
            "col_duration": {
                "type": "duration"
            },
            "col_text": {
                "type": "text"
            }
        },
        "warnings": [
            {
                "errorCode": "ZERO_FILTER_OPERATIONS",
                "message": "Zero filters were provided in the filer for this query. \n\nProviding zero filters will return all rows in the table, which may have poor performance when the table is large. For the best performance, include one or more filters using the primary key or indexes.\n\nThe table \"kslh38IyDxPibhxz1I\".\"projectionSchemaTable\" has the primary key: id(text).\nAnd has indexes on the columns: [None].\n\nThe query was executed without taking advantage of the primary key or indexes on the table, this can have performance implications on large tables.\n\nSee documentation at XXXX for best practices for filtering.",
                "family": "REQUEST",
                "scope": "WARNING",
                "title": "Zero operations provided in query filter",
                "id": "a4fa1755-7f93-422f-9aaf-ceab60c3e98b"
            }
        ]
    }
}

amorton commented 2 days ago

Hit a bump that is stopping me from finishing this; need to sync with @tatu-at-datastax and @maheshrajamani.

my code is in this branch https://github.com/stargate/data-api/tree/refs/heads/ajm/schema-for-read

When we rebuild the schema from the driver result set, I call this on ApiDataTypeDefs:

  public static Optional<ApiDataTypeDef> from(DataType dataType) {
    return Optional.ofNullable(PRIMITIVE_TYPES_BY_CQL_TYPE.get(dataType));
  }

It is failing in the ITs, for the tests with collection types:

this table

    assertNamespaceCommand(keyspaceName)
        .templated()
        .createTable(
            TABLE_WITH_LIST_COLUMNS,
            Map.of(
                "id",
                "text",
                "stringList",
                Map.of("type", "list", "valueType", "text"),
                "intList",
                Map.of("type", "list", "valueType", "int"),
                "doubleList",
                Map.of("type", "list", "valueType", "double")),
            "id")
        .wasSuccessful();

when I read from it my code in ReadAttempt builds the schema like this:

    var apiColumns =
        new OrderedApiColumnDefContainer(readResult.resultSet.getColumnDefinitions().size());
    for (var columnDef : readResult.resultSet.getColumnDefinitions()) {
      try {
        apiColumns.put(ApiColumnDef.from(columnDef.getName(), columnDef.getType()));
      } catch (UnsupportedCqlTypeForDML e) {
        throw ServerException.Code.UNEXPECTED_SERVER_ERROR.get(errVars(e));
      }
    }

and I end up with this error

Unsupported column type: List(TEXT, not frozen) for column: "stringList"

tatu-at-datastax commented 2 days ago

I think that if this:

  public static Optional<ApiDataTypeDef> from(DataType dataType) {
    return Optional.ofNullable(PRIMITIVE_TYPES_BY_CQL_TYPE.get(dataType));
  }

is called, it will fail for non-primitive/non-scalar types like Lists, Sets, Maps, Vectors as they are handled separately.

amorton commented 2 days ago

Yup, spoke to @maheshrajamani: there is some existing code we can refactor, improving that function so it becomes the one-stop shop for CQL Type -> API Type.
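
To make the "one stop shop" idea concrete, here is a minimal, self-contained sketch of a single `from` function that tries the primitive lookup first and then falls back to collection handling. It does not use the real driver `DataType` or the project's `ApiDataTypeDef` classes: `CqlToApiTypeSketch`, `CqlType`, `Primitive`, `ListOf`, `SetOf` and `MapOf` are hypothetical stand-ins, and the `keyType` field name for maps is an assumption (only `type` and `valueType` for lists appear in the createTable call above). Vectors and nested collections are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class CqlToApiTypeSketch {

  // Hypothetical stand-ins for the driver's DataType hierarchy.
  public interface CqlType {}
  public record Primitive(String name) implements CqlType {}
  public record ListOf(CqlType element, boolean frozen) implements CqlType {}
  public record SetOf(CqlType element, boolean frozen) implements CqlType {}
  public record MapOf(CqlType key, CqlType value, boolean frozen) implements CqlType {}

  // Plays the role of PRIMITIVE_TYPES_BY_CQL_TYPE: scalar types only.
  private static final Map<CqlType, String> PRIMITIVE_API_NAMES =
      Map.of(
          new Primitive("text"), "text",
          new Primitive("int"), "int",
          new Primitive("double"), "double");

  /** Returns the schema description for a CQL type, or empty if it is unsupported. */
  public static Optional<Map<String, Object>> from(CqlType type) {
    // 1. Scalars: the lookup the original from() already performs.
    Optional<String> primitive = primitiveName(type);
    if (primitive.isPresent()) {
      return Optional.of(describe(primitive.get()));
    }
    // 2. Collections: never present in the primitive map, so handle them here
    //    instead of in a separate code path.
    if (type instanceof ListOf list) {
      return primitiveName(list.element()).map(v -> describeCollection("list", v));
    }
    if (type instanceof SetOf set) {
      return primitiveName(set.element()).map(v -> describeCollection("set", v));
    }
    if (type instanceof MapOf map) {
      Optional<String> key = primitiveName(map.key());
      Optional<String> value = primitiveName(map.value());
      if (key.isPresent() && value.isPresent()) {
        Map<String, Object> desc = describeCollection("map", value.get());
        desc.put("keyType", key.get()); // assumed field name
        return Optional.of(desc);
      }
    }
    // 3. Anything else (including nested collections) is unsupported in this sketch.
    return Optional.empty();
  }

  private static Optional<String> primitiveName(CqlType type) {
    return Optional.ofNullable(PRIMITIVE_API_NAMES.get(type));
  }

  private static Map<String, Object> describe(String typeName) {
    Map<String, Object> desc = new LinkedHashMap<>();
    desc.put("type", typeName);
    return desc;
  }

  private static Map<String, Object> describeCollection(String typeName, String valueType) {
    Map<String, Object> desc = describe(typeName);
    desc.put("valueType", valueType);
    return desc;
  }

  public static void main(String[] args) {
    // The column from the failing IT: a non-frozen list of text.
    System.out.println(from(new ListOf(new Primitive("text"), false)));
  }
}
```

With a single entry point like this, the loop in ReadAttempt would not need to know whether a column is a scalar or a collection; it only has to handle the empty case for types that are genuinely unsupported.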