Closed ErikTheBerik closed 1 week ago
You can directly modify _schema
collection likes https://trino.io/docs/current/connector/mongodb.html#table-definition when the schema inference doesn't work well.
I would recommend writing a small script for translating the Typebox definitions into MongoDB connector definitions.
Hello @ebyhr,
Thank you for your help with this! I have read the documentation you referenced on Trino's MongoDB type mapping and am trying to adjust the mapping for ObjectId
fields to be read directly as VARCHAR
. As suggested, I'd like the _id
field (and other ObjectId
s) to render as readable values without needing a cast to VARCHAR
, making it more compatible with Superset, which I’m using for data visualization.
I’m integrating Superset to visualize data from Trino, and here’s how it reads the data types:
userId: ObjectId('652c4a258c13925a1e97572d')
65 2c 4a 25 8c 13 92 5a 1e 97 57 2b
ZSxKJYwTkloel1ct
I’ve managed to get it working by manually updating the schema:
db.getCollection("_schema").updateMany(
{ "fields.type": { $in: ["ObjectId", "array(ObjectId)"] } },
{ $set: { "fields.$[elem].type": "varchar" } },
{ arrayFilters: [{ "elem.type": { $in: ["ObjectId", "array(ObjectId)"] } }] }
);
This works fine, but is there an automated way to apply this conversion? I tried adding the parameter mongodb.object-id-as-varchar=true
in mongodb.properties
, but it seems invalid.
is there an automated way to apply this conversion?
Nope.
I am working with Trino to query data from a MongoDB database, and I’ve encountered some challenges around schema handling.
Issue
The documents in my MongoDB collections are complex, with many possible document types within a collection. Currently, Trino appears to generate schemas by inspecting the first document it encounters. This approach results in incomplete or incorrect schemas because Trino.
I already have schema definitions for these collections written in TypeScript, using Typebox. I can convert these Typebox definitions into JSON Schemas, but currently, from what I've found, there doesn’t seem to be a way to feed these JSON Schemas into Trinos MongoDB connector to ensure consistent schema interpretation.
Potential Feature Request
Would it be possible to allow user-defined JSON Schemas to be applied to MongoDB collections within Trino? Or is there an existing approach to the problem I'm facing? Ideally, we could supply a JSON Schema configuration that Trino would use to parse each MongoDB collection, bypassing the automatic schema inference or enhancing it. This feature would greatly simplify working with diverse document structures and eliminate the need for extensive manual schema correction.