pressly / goose

A database migration tool. Supports SQL migrations and Go functions.
http://pressly.github.io/goose/

Semicolon Detection Fails for "--" in Strings in SQL Statements #699

Open Ynng opened 6 months ago

Ynng commented 6 months ago

When executing SQL migrations, the parser misses the terminating semicolon and merges adjacent statements if a string literal contains "--" preceded by a space. It treats the "--" as the start of a comment even though it sits inside a string. For example:

-- +goose Up
CREATE TABLE t1 (
    c1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    c2 VARCHAR(100))
ENGINE=InnoDB
COMMENT='Look at this cool arrow -->';

CREATE TABLE t2 (
    c1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    c2 VARCHAR(100))
ENGINE=InnoDB;

-- +goose Down
DROP TABLE t1;

This results in syntax errors when running migrations, as seen below:

partial migration error (type:sql,version:1): Error 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'CREATE TABLE t2 (
    c1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    c2 VARCHAR' at line 7 

The issue appears to be in the semicolon detection logic endsWithSemicolon(line string) bool: https://github.com/pressly/goose/blob/d4a4dc3aa17829adf9d66d53d7a534ea59ead6be/internal/sqlparser/parser.go#L302-L321
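To make the failure mode concrete, here is a simplified sketch of the kind of check described above. It is not the goose source: `naiveEndsWithSemicolon` is a hypothetical name, and the logic (split the line on whitespace, stop at the first word that begins with "--") is an assumption based on the behavior reported in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveEndsWithSemicolon is a hypothetical stand-in, not goose's actual code.
// It walks the whitespace-separated words of a line, stops at the first word
// that starts with "--", and reports whether the last word seen before that
// point ends with a semicolon.
func naiveEndsWithSemicolon(line string) bool {
	prev := ""
	for _, word := range strings.Fields(line) {
		if strings.HasPrefix(word, "--") {
			break // assumed to be a comment, even when inside a string literal
		}
		prev = word
	}
	return strings.HasSuffix(prev, ";")
}

func main() {
	fmt.Println(naiveEndsWithSemicolon("ENGINE=InnoDB;"))                         // true
	fmt.Println(naiveEndsWithSemicolon("COMMENT='Look at this cool arrow -->';")) // false
}
```

Under that assumption, the last word of the COMMENT line is `-->';`, which starts with "--", so the scan stops before it ever sees the semicolon and the statement is merged with the next one.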

mfridman commented 6 months ago

Yeah, this is a bit unfortunate. I suppose you could move the semicolon to a new line, or wrap that statement in a +goose StatementBegin / +goose StatementEnd.

The SQL parser is quite basic and does the bare minimum. Open to suggestions on how this could be improved.

Ynng commented 6 months ago

Certainly, I can manually adjust the SQL, but for my scenario, which involves programmatically generating migrations from mysqldump, the task becomes more challenging.

For now, my temporary workaround is to regex my arrows from --> to ->, but this is obviously not a universal fix for --.

I don't really see any solutions that don't require complicating ParseSQLMigration. Maybe we can track whether or not we are inside a string by looking for the ' character? But there are many edge cases...
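A rough sketch of that quote-tracking idea, for a single line, might look like the following. `quoteAwareEndsWithSemicolon` is a hypothetical helper, not part of goose, and it deliberately ignores several of those edge cases: backslash-escaped quotes, double-quoted or backtick-quoted text, string literals that span multiple lines, and dialect-specific comment rules.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteAwareEndsWithSemicolon is a hypothetical helper, not goose's API.
// It toggles an "inside string" flag on every single quote, so a doubled
// quote ('') toggles twice and leaves the state unchanged. A "--" only
// counts as a comment start when the flag is off.
func quoteAwareEndsWithSemicolon(line string) bool {
	inString := false
	end := len(line)
scan:
	for i := 0; i < len(line); i++ {
		switch {
		case line[i] == '\'':
			inString = !inString
		case !inString && strings.HasPrefix(line[i:], "--"):
			end = i // the rest of the line is a comment
			break scan
		}
	}
	return strings.HasSuffix(strings.TrimSpace(line[:end]), ";")
}

func main() {
	fmt.Println(quoteAwareEndsWithSemicolon("COMMENT='Look at this cool arrow -->';")) // true
	fmt.Println(quoteAwareEndsWithSemicolon("DROP TABLE t1; -- cleanup"))              // true
	fmt.Println(quoteAwareEndsWithSemicolon("SELECT 1 -- not done;"))                  // false
}
```

Because the flag is reset for every line, a string that opens on one line and closes on a later one would still be misread, which is one reason a per-line check like this is only a partial fix.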

mfridman commented 6 months ago

> Certainly, I can manually adjust the SQL, but for my scenario, which involves programmatically generating migrations from mysqldump, the task becomes more challenging.

Yep, that's an excellent example.

> I don't really see any solutions that don't require complicating ParseSQLMigration.

Pretty much. Which gets us into the territory of writing a full-blown SQL parser, otherwise we're always fighting a new edge case. To make matters worse, there's always some subtle dialect-specific difference.

I'll keep this issue open and continue to think this through in the background.

I wonder if you could wrap your entire dumped schema, with all of its statements, in:

-- +goose Up
-- +goose StatementBegin

... your entire schema here

-- +goose StatementEnd

This tells goose to send the entire set of queries as a single semicolon-separated query. This usually just works unless you have an extensive schema, exceed a database limit, or have a specific query that can't be run in the same transaction.

A bit more background on these annotations can be found here:

https://pressly.github.io/goose/blog/2022/overview-sql-file/#multiple-statements
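For migrations generated programmatically from a dump, the wrapping suggested above could be applied in the generator itself. The sketch below assumes the dump is already available as a string; `wrapDump` is a hypothetical helper name, and only the three annotation lines come from this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapDump is a hypothetical helper. It places the whole dumped schema
// between the StatementBegin and StatementEnd annotations so that goose
// does not try to split it on semicolons.
func wrapDump(schema string) string {
	var b strings.Builder
	b.WriteString("-- +goose Up\n")
	b.WriteString("-- +goose StatementBegin\n")
	b.WriteString(strings.TrimSpace(schema))
	b.WriteString("\n-- +goose StatementEnd\n")
	return b.String()
}

func main() {
	dump := "CREATE TABLE t1 (c1 INT);\nCREATE TABLE t2 (c1 INT);\n"
	fmt.Print(wrapDump(dump))
}
```

The same caveats apply as for wrapping by hand: the database has to accept the whole block as one query, and the block has to fit within its size and transaction limits.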