launchbadge / sqlx

🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, and SQLite.
Apache License 2.0
13.27k stars 1.26k forks source link

migration checksums different on unix/non-unix systems due to CRLF/LF line endings #2659

Open DawidPietrykowski opened 1 year ago

DawidPietrykowski commented 1 year ago

Bug Description

When a migration checksum is generated, it seems like it's putting the whole file as bytes through SHA384 generator as stated by this code:

impl Migration {
    pub fn new(
        version: i64,
        description: Cow<'static, str>,
        migration_type: MigrationType,
        sql: Cow<'static, str>,
    ) -> Self {
        let checksum = Cow::Owned(Vec::from(Sha384::digest(sql.as_bytes()).as_slice()));
        Migration {
            version,
            description,
            migration_type,
            sql,
            checksum,
        }
    }
}

What that means is when a migration is checked with version control (git in my case) it might manipulate the CRLF/LF line endings based on system. You could also run into this problem when creating/modifying the file on 2 different system even without version control I believe.

This was causing version mismatch error in my program, as I run migrate! in my main function to ensure that the db is set up correctly.

I work on 2 different systems and as of now it seems like I have to comment out my migrate! line if I'm sure it was run, don't know how to handle it in production though.

I'm not sure how to best handle this issue, but maybe conversion CRLF->LF before passing the text to sha would make sense?

Minimal Reproduction

  1. create a repository with a migration file on windows
  2. run migration
  3. check out with git while having core.autocrlf=true
  4. clone repository on mac/linux
  5. run migration

Info

YgorSouza commented 1 year ago

Maybe if it detects a mismatch, it could change the line endings and try again, so the change wouldn't affect anything that is currently working properly.

For example the following method could be added in this impl (I haven't tested this):

    pub fn matches(&self, checksum: &Cow<'static, [u8]>) -> bool {
        if self.checksum == *checksum {
            return true;
        }
        let alternate_line_ending = if self.sql.contains("\r\n") {
            "\n"
        } else {
            "\r\n"
        };
        let lines: Vec<_> = self.sql.lines().collect();
        let alternate_checksum = Cow::<'static, [u8]>::Owned(Vec::from(
            Sha384::digest(lines.join(alternate_line_ending).as_bytes()).as_slice(),
        ));
        alternate_checksum == *checksum
    }
rustdesk commented 1 year ago

I have the same issue.

Maybe if it detects a mismatch, it could change the line endings and try again, so the change wouldn't affect anything that is currently working properly.

Exactly the best practice.

https://github.com/rustdesk-org/sqlx/commit/50392d5a29edfeb2ba6c03b980547447736f8bee works for me.

Andrepuel commented 10 months ago

Perhaps we could have some quote-aware whitespace removal before hashing the SQL?

Like:

create
         table `two  spaces` (id int);

Becoming:

create table `two  spaces` (id int);

No \r or \n at all and replacing any sequence of continuous any-space character with one single space.

gilgabo commented 8 months ago

Hi, any update for this bug? Do you know if it will be fixed in the next version?

Nahuel-M commented 7 months ago

I believe replacing all CRLF with LF before the hashing will solve most issues. We're not editing the original SQL statement, just a clone that will be hashed.

Single issue that I can still see is that CRLF and LF cannot be distinguished in a hardcoded string, but this seems like an extreme edge case.

rustdesk commented 7 months ago

It is not extreme edge case for sqlite. Because we may copy it here and there. We may run it on Linux server, but back up on a Windows server.

For the time being, I have to use my fork, https://github.com/rustdesk-org/sqlx/commit/50392d5a29edfeb2ba6c03b980547447736f8bee

Nahuel-M commented 7 months ago

Perhaps I'm misunderstanding, but could you give me an example of a bug that might arise from replacing CRLF by LF before hashing? The functionality of the queries will not change depending on the newlines in the hash.

abonander commented 7 months ago

Because it will change the checksums for anyone deploying from/to Windows and break all their migrations.

I'm not against adding this, but it needs to be opt-in.

This is also really straightforward to fix yourself with a .gitattributes file.