salsita / node-pg-migrate

Node.js database migration management for PostgreSQL
https://salsita.github.io/node-pg-migrate
MIT License
1.29k stars 177 forks source link

We need to be able to use parameterised insert queries to populate a table during migration #1292

Open OracPrime opened 3 days ago

OracPrime commented 3 days ago

Description

It is currently quite messy to use parameterised inserts. If you naively use pgm.sql with $1 style parameters you get an error message related to regular expressions which is somewhat impenetrable if your not expecting mustache templating. If you use mustache templating, you are open to injection attacks. The only solution I can find is using pg-format for build the string (and I'm a bit worried if the result of that itself contains, for example $1.

Suggested solution

It would be nice to have access to an equivalent to pgm.sql() which doesn't do the mustaching. psm.sql_raw() for example. Either a new function, or an additional options argument on the existing one. However I prefer the separate function approach as it keeps the meaning simple and the option argument might be confused with an argument for the sql.

Alternative

pg-format is the only one I'm aware of.

Additional context

At minimum the regex error should be decorated with some explanation?

Shinigami92 commented 3 days ago

Please note that we don't really need to be careful/sanitize parameters, because you should not use node-pg-migrate at runtime, but only for migration progresses.

That aside, what would you propose as an API?

And how does a current use-case look like right now (rough example).

OracPrime commented 3 days ago

The current use case if for populating a static lookup table. I'd agree in theory you're only using your own data, so you should feel slightly safe, but in an open source environment an innocuous update of a data file could become an injection attack, so I want to be paranoid.

I think my proposed api is to have a function that looks exactly like sql but which doesn't mustache.

Current use case it this (previously insert with $1... $5)

    async up(pgm) {
        pgm.createTable("uk_stations", {
            station_id: { notNull: true, primaryKey: true, type: "bigserial" },
            station_name: { notNull: true, type: "text" },
            sixteen_character_name: { notNull: true, type: "text" },
            longitude: { notNull: true, type: "float8" },
            latitude: { notNull: true, type: "float8" },
            crs_code: { notNull: true, type: "char(3)" },
        });

        const filePath = path.join(__dirname, "../ukStations.csv");
        const fileContent = fs.readFileSync(filePath, "utf8");
        const records = parse(fileContent, {
            columns: true,
            skip_empty_lines: true,
        });

        // Prepare data for bulk insertion
        const values = records.map(
            ({
                station_name,
                sixteen_character_name,
                latitude,
                longitude,
                crs_code,
            }) => [
                station_name,
                sixteen_character_name,
                latitude,
                longitude,
                crs_code,
            ],
        );

        const query = format(
            `INSERT INTO uk_stations (station_name, sixteen_character_name, latitude, longitude, crs_code) VALUES %L`,
            values,
        );
        pgm.sql(query);
    },
Shinigami92 commented 2 days ago

Just me thinking out loud: Not sure if we should provide a insert (and then update and delete) statement, because these are not really anymore migrations but populations 🤔 Currently I would suggest to use node-pg-migration as a migration tool to have your DB versioned and use other tools (potentially knex) for population after the migration was executed.

However, I even know from myself in company work that there is best practice and real world scenario 🤣 ...

So maybe we could add just the absolutely simplest form of such functions that in the end will more or less just to what you already do yourself with the format function.

pgm.insert('uk_station', {
  station_name: '...',
  // data for one station...
});

pgm.insertBulk('uk_station', [
  {
    station_name: '...',
    // data for one station...
  },
  // ...
]);

I have not thought that fully out yet, but I would be okay with it if you explore an initial draft PR on your own with that info and then I can review and see where it is going.
Sounds good?