Schema File Scope Issue: Generating Zod Schemas for All Tables Instead of Specified Schema

jascenc1 commented 2 months ago

When using the -f option to specify a schema file in surql-gen, the tool sometimes generates Zod schemas for all tables in the SurrealDB instance rather than limiting its scope to the tables defined in the specified schema file. This behavior is not consistent with the expected functionality, which should only focus on the schema provided in the specified file.

Description

Even when a specific schema file is provided with the -f option, the tool retrieves information for all tables in the SurrealDB instance using getAllTableInfo() and generates Zod schemas for all of them instead of only the subset of tables intended for processing (if applicable).

Expected Behavior

The tool should limit its schema generation to only the tables defined in the provided schema file when the -f option is used.

Potential Fix

Track Tables from the Schema File:

Parse the schema file to identify which tables are defined or updated.

const getTablesFromSchemaFile = (schemaContent: string): string[] => {
const tableRegex = /DEFINE\s+TABLE\s+(\w+)/g;
const tables = [];
let match;
while ((match = tableRegex.exec(schemaContent)) !== null) {
tables.push(match[1]);
}
return tables;
};

Filter getAllTableInfo():
- Modify the logic to filter the results of getAllTableInfo() to include only the tables that were defined or updated by the schema file.
- Use result from above to filter results of getAllTableInfo()

sebastianwessel commented 2 months ago

This is kind of expected behavior. The schema file will run against the database. In the next step, getAllTableInfo is called to get the information.

It should maybe better mentioned somewhere in the docs, that it is recommended to use a temporary in memory db or a db/ns especially for this.

jascenc1 commented 2 months ago

Thank you for the explanation. I understand that the current behavior is by design. However, do you think it could be more efficient if the tool generated Zod schemas only for the tables specified in the schema file (if applicable, of course), especially when working with large databases.

Processing all tables can be time-consuming and unnecessary when users only want to update schemas for specific tables. It would be great if the tool could focus on the relevant tables defined in the schema file.

If this isn’t feasible, a note in the documentation about this behavior might be helpful. I appreciate the work you’ve done on this project and hope this feedback is useful!

Thanks again!

hammo92 commented 2 months ago

A workaround for just generating the schemas for the provided schema file would be to provide the schema file and an empty surrealDB instance; as the way the tool works with a schema file is to insert the definitions from the file and then fetch the table definitions from the instance, it doesn't parse the schema file directly.

@sebastianwessel It may be possible to add a configuration option to use only the schema file. We could create an in memory instance of surrealDB instead of expecting one to already be available, I do something similar for running tests


export async function startSurrealServer(): Promise<void> {
    await killExistingSurrealProcesses();

    return new Promise((resolve, reject) => {
        surrealProcess = spawn('surreal', ['start', '--log', 'none', '--allow-all', '--no-banner', '--user', 'root', '--pass', 'root', '--bind', '127.0.0.1:7777', 'memory']);

        surrealProcess.stderr!.on('data', (data) => {
            console.error(`SurrealDB Error: ${data}`);
            if (data.includes('Address already in use')) {
                reject(new Error('SurrealDB port is already in use. Try again in a few moments.'));
            }
        });

        surrealProcess.on('error', (error) => {
            console.error(`Failed to start SurrealDB: ${error}`);
            reject(error);
        });

        const checkServer = setInterval(async () => {
            try {
                await fetch('http://127.0.0.1:7777/health');
                clearInterval(checkServer);
                resolve();
            } catch (error) {
            }
        }, 100);

        setTimeout(() => {
            clearInterval(checkServer);
            reject(new Error('SurrealDB server failed to start in time'));
        }, 30000)

    });
}```

sebastianwessel commented 2 months ago

Yes, that was the original way with the "old" nodejs lib, which had surrealdb as webassembly. This was the big pro - you didn't needed an extra docker or native instance.

With the switch to the current lib, this was no longer possible.

That is the root cause of this issue.

sebastianwessel commented 2 months ago

Maybe we should simply use the testcontainers lib, to start a db instance in memory, which cleans up automatically after shutdown?

hammo92 commented 2 months ago

It's not a library I've used before but from what I can see it looks like it should be possible. Can give it a try.

hammo92 commented 2 months ago

Got it working, was simple enough.

Need to make the decision on what configuration options we should have, should we just have the schema file option and clarify in the docs that this will be schema file only, or add another option that determines whether to use the generated instance? @sebastianwessel

sebastianwessel commented 2 months ago

Think clarification in docs is enough I guess.

jascenc1 commented 2 months ago

Thanks for these changes! I appreciate how quickly you made this happen, and the documentation updates are great.

Looking forward to using this - thanks again!

cc: @sebastianwessel @hammo92

sebastianwessel / surrealdb-client-generator