sfackler / rust-postgres

Native PostgreSQL driver for the Rust programming language
Apache License 2.0
3.49k stars 443 forks source link

Unnesting array data #977

Closed thomasmathews01 closed 1 year ago

thomasmathews01 commented 1 year ago

I am trying to bulk insert large amounts of data, which consists of small arrays, by using the UNNEST approach to cut down on the RTTs required.

Where my postgres table looks like so:

Source ID Timestamp Value
1 2023-01-04T11:15:37+00:00 {1, 2, 3, 4}
2 2023-01-04T11:15:37+00:00 {1, 2, 3, 4}
3 2023-01-04T11:15:37+00:00 {1, 2, 3, 4}
4 2023-01-04T11:15:37+00:00 {1, 2, 3, 4}

Using the below postgres function (because postgres otherwise unnests real[][] to real by default, and I need real[])

CREATE OR REPLACE FUNCTION unnest_float_arrays(a real[][], OUT a_1d real[][])
  RETURNS SETOF real[]
  LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
BEGIN
   FOREACH a_1d SLICE 1 IN ARRAY a LOOP
      RETURN NEXT;
   END LOOP;
END
$func$;

The below is valid postgresql, and can be executed directly with psql, with the array of values, which are themselves arrays, correctly unnested

INSERT INTO float_arrays (source_id, timestamp, value)
    SELECT unnest(ARRAY[1, 2, 3, 4]), unnest(ARRAY[current_timestamp, current_timestamp, current_timestamp, current_timestamp]), unnest_float_arrays(ARRAY[ARRAY[1, 2, 3, 4], ARRAY[1, 2, 3, 4], ARRAY[1, 2, 3, 4], ARRAY[1, 2, 3, 4]])

But when I try and execute this using the library dynamically, as so:

let sources = vec![1, 2, 3, 4];
let timestamps= vec![];
let values = vec![vec![1,2,3,4],vec![1,2,3,4],vec![1,2,3,4],vec![1,2,3,4]];

let insert_statement = "INSERT INTO {} (source_id, timestamp, value) SELECT unnest($1::int4[]), unnest($2::timestamptz[]), unnest_float_arrays($3::real[][])";
let mut statement = tx.prepare_cached(&insert_statement).await?;
let count = tx.execute(&statement, &[&sources, &timestamps, &values]).await?;

It fails to correctly understand the type and I get the error:

Error: error serializing parameter 2: cannot convert between the Rust type `alloc::vec::Vec<alloc::vec::Vec<f32>>` and the Postgres type `_float4`

Caused by:
    cannot convert between the Rust type `alloc::vec::Vec<alloc::vec::Vec<f32>>` and the Postgres type `_float4`

This was using version 0.7, and the features "with-serde_json-1" & "with-chrono-0_4".

sfackler commented 1 year ago

The dimensionality of a Postgres array is only a property of individual values, not their type. As the docs note, the ToSql implementation for Vec<T> only interoperates with single-dimensional Postgres arrays. You can use the postgres-array crate to work with multi-dimensional arrays.

However, a COPY query is a more standard approach for bulk insert workflows.