hydromatic / morel

Standard ML interpreter, with relational extensions, implemented in Java
Apache License 2.0
291 stars 15 forks source link

How to instantiate a Morel context from an existing Calcite `rootSchema`? #145

Closed GavinRay97 closed 2 years ago

GavinRay97 commented 2 years ago

Heya Julian,

I'm working on a web playground for Calcite, that I plan on hosting for free so that people can experiment with it. It'll essentially be a web REPL with some pre-loaded data, and the ability to render visualizations of query plans + schema diagrams.

I figured that while I'm at it, it could be nice to add a second input form, which takes Morel expressions, and is bound to the same Calcite context/rootSchema.

Having a bit of difficulty figuring out how to do this though.

I have a singleton object that holds the master schema like below -- I think I need to pass this as a foreignValue to Morel or similar?

// Holds the master schema that contains all datasource schemas and their sub-schemas
object CalciteSchemaService {

    val connection: CalciteConnection = initCalciteConnection()
    val rootSchema: SchemaPlus = connection.rootSchema

    val frameworkConfig: FrameworkConfig = Frameworks.newConfigBuilder()
        .defaultSchema(connection.rootSchema)
        .parserConfig(SqlParser.config().withCaseSensitive(false))
        .build()

    val relBuilder: RelBuilder = RelBuilder.create(frameworkConfig)

    // <SNIPPED>
}

I see this but it's protected:

https://github.com/hydromatic/morel/blob/0b76feee27343da5e5dba84938035a576b4665d1/src/main/java/net/hydromatic/morel/foreign/Calcite.java#L64-L71

julianhyde commented 2 years ago

Yes, foreignValue is the way to do it. It associates a Calcite schema with a Morel variable. You could have a single variable named ‘calcite’ for the root schema (and all first-level schemas would appear as fields) or create a separate foreignValue for each first-level schema.

You will need some kind of shell. (Parses and executes commands, and prints the results, in the environment created by previous commands in the same session.) Maybe one of the existing shells in Morel will work for you, or could be adapted.

The existing shells are able to read commands from a file, including recursively reading files, then switch back to their original input stream, which might be useful to you.

I briefly looked at creating a Java wrapper so that Morel could run in Jupyter, and it looked like a pretty similar problem.

julianhyde commented 2 years ago

I’m not sure that Calcite’s constructor being protected is a problem. Especially if it is constructed by a shell.

One improvement might be for the list of foreignValue to be dynamic, so you can add and remove after the shell has been created.

GavinRay97 commented 2 years ago

Thanks for the pointers Julian. I think I've gotten closer with this, but getting an error about a table not being found.

My guess would be that I maybe need to pass in a custom Calcite context instead of just the ForeignValue map, but I'm not 100% sure to be honest:

The schema/table structure of my rootSchema is:

Schema: chinook
Tables:
    Schema: information_schema
    Tables: ...
    Schema: pg_catalog
    Tables: ...
    Schema: public
    Tables:
        Album
        Artist
        ...
Schema: metadata
Tables:
    COLUMNS
    TABLES
object CalciteSchemaService {

    init {
        // ...
        addDatabase("chinook", h2DataSource)
    }

    fun addDatabase(databaseName: String, ds: DataSource): SchemaPlus {
        // Inner helper func
        fun getSubSchemas(connection: Connection): List<String> {
            connection.metaData.schemas.use { rs ->
                val schemas = mutableListOf()
                while (rs.next()) {
                    schemas.add(rs.getString("TABLE_SCHEM"))
                }
                return schemas
            }
        }

        if (getSubSchemas(ds.connection).isNotEmpty()) {
            val catalogSchema = JdbcCatalogSchema.create(rootSchema, databaseName, ds, null)
            rootSchema.add(databaseName, catalogSchema)
        } else {
            val jdbcSchema = JdbcSchema.create(rootSchema, databaseName, ds, null, null)
            rootSchema.add(databaseName, jdbcSchema)
        }

        return rootSchema
    }
class CalciteMorelInterpreter {

    static String smlProgram = "chinook;";
    static InputStream input = new ByteArrayInputStream(smlProgram.getBytes());
    static PrintStream output = System.out;

    static SchemaPlus rootSchema = CalciteSchemaService.INSTANCE.getRootSchema();
    static SchemaPlus chinookSchema = rootSchema.getSubSchema("chinook").getSubSchema("public");

    static Map<String, ForeignValue> foreignValueMap = Calcite
            .withDataSets(
                    Map.of("chinook", (Calcite calcite) -> new CalciteForeignValue(calcite, chinookSchema, true))
            )
            .foreignValues();

    public static void main(String[] args) {
        var morelMain = new net.hydromatic.morel.Main(List.of(), input, output, foreignValueMap, new File(""));
        morelMain.run();
    }
}
Exception in thread "main" org.apache.calcite.runtime.CalciteException: Table 'chinook.public.Album' not found
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:505)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:599)
    at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:505)
    at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:599)
    at org.apache.calcite.tools.RelBuilder.scan(RelBuilder.java:1575)
    at org.apache.calcite.tools.RelBuilder.scan(RelBuilder.java:1575)
    at net.hydromatic.morel.foreign.CalciteForeignValue.lambda$value$4(CalciteForeignValue.java:91)
    at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422)
    at com.google.common.collect.RegularImmutableSortedSet.forEach(RegularImmutableSortedSet.java:89)
    at net.hydromatic.morel.foreign.CalciteForeignValue.value(CalciteForeignValue.java:89)
    at net.hydromatic.morel.compile.Environments.lambda$foreignBindings$2(Environments.java:104)
    at com.google.common.collect.SingletonImmutableBiMap.forEach(SingletonImmutableBiMap.java:68)
    at net.hydromatic.morel.foreign.CalciteForeignValue.lambda$value$4(CalciteForeignValue.java:91)
    at net.hydromatic.morel.compile.Environments.foreignBindings(Environments.java:101)
    at net.hydromatic.morel.compile.Environments.env(Environments.java:95)
    at net.hydromatic.morel.compile.Environments.env(Environments.java:63)
    at net.hydromatic.morel.Main.run(Main.java:122)
    at com.github.gavinraydev.CalciteMorelInterpreter.main(Morel.java:33)
    at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422)
    at com.google.common.collect.RegularImmutableSortedSet.forEach(RegularImmutableSortedSet.java:89)
    at net.hydromatic.morel.foreign.CalciteForeignValue.value(CalciteForeignValue.java:89)
    at net.hydromatic.morel.compile.Environments.lambda$foreignBindings$2(Environments.java:104)
    at com.google.common.collect.SingletonImmutableBiMap.forEach(SingletonImmutableBiMap.java:68)
    at net.hydromatic.morel.compile.Environments.foreignBindings(Environments.java:101)
    at net.hydromatic.morel.compile.Environments.env(Environments.java:95)
    at net.hydromatic.morel.compile.Environments.env(Environments.java:63)
    at net.hydromatic.morel.Main.run(Main.java:122)
    at com.github.gavinraydev.CalciteMorelInterpreter.main(Morel.java:33)
GavinRay97 commented 2 years ago

Ah okay, I figured it out. The logic for CalciteForeignValue only handles 1-level of Schema nesting, so you can't do chinook.public, you need to recursively visit and namespace nested schemas like chinook__public:

// Recursively copy sub-schemas from a rootSchema into a Morel Calcite ForeignValue context
private static Calcite createMorelCalciteForeignValueCtx(SchemaPlus rootSchema) {
    boolean toLowerCase = true;
    final ImmutableMap.Builder<String, DataSet> builder = ImmutableMap.builder();

    SchemaPlus currentSchema = rootSchema;
    while (true) {
        final Set<String> subSchemaNames = currentSchema.getSubSchemaNames();
        if (subSchemaNames.isEmpty()) {
            break;
        }
        for (String subSchemaName : subSchemaNames) {
            final SchemaPlus subSchema = currentSchema.getSubSchema(subSchemaName);
            String name = Schemas.path(subSchema).names().stream().reduce((a, b) -> a + "__" + b).get();
            builder.put(name, calcite -> {
                SchemaPlus newSchema = calcite.rootSchema.add(name, subSchema);
                return new CalciteForeignValue(calcite, newSchema, toLowerCase);
            });
        }
        currentSchema = currentSchema.getSubSchema(subSchemaNames.iterator().next());
    }

    return Calcite.withDataSets(builder.build());
}
static String smlProgram = "from a in chinook__public.artist yield {a.artistid, a.name};";

static Reader input = new InputStreamReader(new ByteArrayInputStream(smlProgram.getBytes()));
static Writer output = new StringBuilderWriter();

public static void main(String[] args) {
    Calcite calciteMorelCtx = createMorelCalciteForeignValueCtx(ROOT_SCHEMA);
    var morelMain = new Main(List.of(), input, output, calciteMorelCtx.foreignValues(), new File(""));
    morelMain.run();
}
Result:
val it =
  [{artistid=1,name="AC/DC"},{artistid=2,name="Accept"},
   {artistid=3,name="Aerosmith"},{artistid=4,name="Alanis Morissette"},
   {artistid=5,name="Alice In Chains"},{artistid=6,name="Ant?nio Carlos Jobim"},
   {artistid=7,name="Apocalyptica"},{artistid=8,name="Audioslave"},
   {artistid=9,name="BackBeat"},{artistid=10,name="Billy Cobham"},
   {artistid=11,name="Black Label Society"},{artistid=12,name="Black Sabbath"},
   ...] : {artistid:int, name:string} list
julianhyde commented 2 years ago

Is there a way to get this enhanced functionality into trunk? By which I mean, reframe it as a new feature, then submit a PR that implements the feature, including tests?

I certainly think that chinook would be a good data set to have built into Morel, alongside scott and foodmart.

Regarding the name mapping, to chinook__public. It would be preferable to address it via either chinook or chinook.public. The former would require the mapping to have something like a 'default schema' property; the latter would require CalciteForeignValue to surface sub-schemas as fields of the Morel record value, in addition to tables.

GavinRay97 commented 2 years ago

Is there a way to get this enhanced functionality into trunk? By which I mean, reframe it as a new feature, then submit a PR that implements the feature, including tests?

Sure, certainly not opposed to doing that.

the latter would require CalciteForeignValue to surface sub-schemas as fields of the Morel record value, in addition to tables.

I think I might try this route, extending the code that already exists

May see if there's a way to write a Shell that is more friendly for non-REPL uses, that allows you to instantiate a persistent foreign value context, where you can repeatedly call .run(scriptString, outBuffer)

julianhyde commented 2 years ago

May see if there's a way to write a Shell that is more friendly for non-REPL uses

Designing for reuse is hard. So I'd probably copy-paste the existing shell, customize it until I had something that works for non-REPL uses, then see whether there's any way to get rid of the copied code (by sub-classing, adding handlers, etc.) Or just start from scratch. If the code seems to be trending towards a shell, let it go in that direction. :)

Maybe take inspiration from https://github.com/SpencerPark/IJava (a Java kernel for Jupyter).

GavinRay97 commented 2 years ago

@julianhyde I have created an initial PR here, but it's throwing an error about RelMetadataQueryBase.getMetadataHandlerProvider being null:

https://github.com/hydromatic/morel/pull/146

java.lang.NullPointerException: metadataHandlerProvider

    at java.base/java.util.Objects.requireNonNull(Objects.java:233)
    at org.apache.calcite.rel.metadata.RelMetadataQueryBase.getMetadataHandlerProvider(RelMetadataQueryBase.java:122)
    at org.apache.calcite.rel.metadata.RelMetadataQueryBase.revise(RelMetadataQueryBase.java:118)
    at org.apache.calcite.rel.metadata.RelMetadataQuery.collations(RelMetadataQuery.java:604)
    at org.apache.calcite.rel.metadata.RelMdCollation.project(RelMdCollation.java:291)
    at org.apache.calcite.rel.logical.LogicalProject.lambda$create$0(LogicalProject.java:125)
    at org.apache.calcite.plan.RelTraitSet.replaceIfs(RelTraitSet.java:244)
    at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:124)
    at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:114)
    at org.apache.calcite.rel.core.RelFactories$ProjectFactoryImpl.createProject(RelFactories.java:178)
    at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:2025)
    at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1797)
    at net.hydromatic.morel.foreign.CalciteForeignValue.lambda$value$4(CalciteForeignValue.java:100)
    at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422)
    at com.google.common.collect.RegularImmutableSortedSet.forEach(RegularImmutableSortedSet.java:89)
    at net.hydromatic.morel.foreign.CalciteForeignValue.value(CalciteForeignValue.java:89)
    at net.hydromatic.morel.compile.Environments.lambda$foreignBindings$2(Environments.java:104)
    at com.google.common.collect.RegularImmutableMap.forEach(RegularImmutableMap.java:292)
    at net.hydromatic.morel.compile.Environments.foreignBindings(Environments.java:101)
    at net.hydromatic.morel.compile.Environments.env(Environments.java:95)
    at net.hydromatic.morel.compile.Environments.env(Environments.java:63)
    at net.hydromatic.morel.ProgrammaticShell.makeEnv(ProgrammaticShell.java:80)
    at net.hydromatic.morel.ProgrammaticShell.<init>(ProgrammaticShell.java:54)
    at net.hydromatic.morel.ProgrammaticShellTest.run(ProgrammaticShellTest.java:19)

Would appreciate any feedback if you see what I'm doing wrong

GavinRay97 commented 2 years ago

Ah wow wtf -- removing static from foreignValueMap makes it work:

// BROKEN
static Map<String, ForeignValue> foreignValueMap = Calcite.withDataSets(BuiltInDataSet.DICTIONARY).foreignValues();

// WORKS
Map<String, ForeignValue> foreignValueMap = Calcite.withDataSets(BuiltInDataSet.DICTIONARY).foreignValues();

I guess I don't understand Java? This is very confusing 🤔

julianhyde commented 1 year ago

Yes, foreignValue is the way to do it. It associates a Calcite schema with a Morel variable. You could have a single variable named ‘calcite’ for the root schema (and all first-level schemas would appear as fields) or create a separate foreignValue for each first-level schema.

You will need some kind of shell. (Parses and executes commands, and prints the results, in the environment created by previous commands in the same session.) Maybe one of the existing shells in Morel will work for you, or could be adapted.

The existing shells are able to read commands from a file, including recursively reading files, then switch back to their original input stream, which might be useful to you.

I briefly looked at creating a Java wrapper so that Morel could run in Jupyter, and it looked like a pretty similar problem.

Julian

On Apr 30, 2022, at 10:54 AM, Gavin Ray @.***> wrote:

 Heya Julian,

I'm working on a web playground for Calcite, that I plan on hosting for free so that people can experiment with it. It'll essentially be a web REPL with some pre-loaded data, and the ability to render visualizations of query plans + schema diagrams.

I figured that while I'm at it, it could be nice to add a second input form, which takes Morel expressions, and is bound to the same Calcite context/rootSchema.

Having a bit of difficulty figuring out how to do this though.

I have a singleton object that holds the master schema like below -- I think I need to pass this as a foreignValue to Morel or similar?

// Holds the master schema that contains all datasource schemas and their sub-schemas object CalciteSchemaService {

val connection: CalciteConnection = initCalciteConnection()
val rootSchema: SchemaPlus = connection.rootSchema

val frameworkConfig: FrameworkConfig = Frameworks.newConfigBuilder()
    .defaultSchema(connection.rootSchema)
    .parserConfig(SqlParser.config().withCaseSensitive(false))
    .build()

val relBuilder: RelBuilder = RelBuilder.create(frameworkConfig)

// <SNIPPED>

} — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.