zzzprojects / Bulk-Operations

C# SQL Bulk Operations | High-performance C# bulk insert, update, delete and merge for SQL Server, SQL Azure, SQL Compact, MySQL, and SQLite.
https://bulk-operations.net

postgres bulk insertion seems to be slow #74

Closed rajkamal0610 closed 1 year ago

rajkamal0610 commented 1 year ago

Hi @JonathanMagnan,

We are facing slowness while inserting data into Postgres (it is basically time-series data, and inserts run continuously against the table on a schedule). I have included the sample code and table structure below. We need your help.

-- Note: uuid_generate_v4() requires the uuid-ossp extension.
CREATE TABLE public.td_00001Test
(
    "UnqSerialNo" uuid NOT NULL DEFAULT uuid_generate_v4(),
    "Time" timestamp with time zone NOT NULL,
    "Quality" character varying(255) COLLATE pg_catalog."default",
    "Value" text COLLATE pg_catalog."default",
    "MDTag_OID" integer
)

TABLESPACE pg_default;

ALTER TABLE public.td_00001Test
    OWNER to postgres;
using System.Data;
using System.Text;
using Z.BulkOperations;

Console.WriteLine("Hello, World!");

DataTable dt = new DataTable();
dt.TableName = "td_00001test";

dt.Columns.Add("Time", typeof(DateTime));
dt.Columns.Add("Quality", typeof(string));
dt.Columns.Add("Value", typeof(string));
dt.Columns.Add("MDTag_OID", typeof(int));

Random rnd = new Random();

for (int l = 0; l < 6; l++)
{
    DateTime dtime = DateTime.Now.AddMinutes(-l + 1);
    for (int i = 0; i < 1000; i++)
    {
        DataRow dr = dt.NewRow();
        dr["Time"] = dtime;
        dr["Quality"] = "GOOD";
        dr["Value"] = rnd.Next(6000).ToString(); // "Value" is a string column
        dr["MDTag_OID"] = i;

        dt.Rows.Add(dr);
    }
}

dt.DefaultView.Sort = "Time ASC";
dt = dt.DefaultView.ToTable();

Console.WriteLine(DateTime.Now.ToString("dd/MM/yyyy hh:mm:ss.fff"));
//StringBuilder logger = new StringBuilder();

using (Npgsql.NpgsqlConnection conn = new Npgsql.NpgsqlConnection())
{
    conn.ConnectionString = "Server=127.0.0.1;Port=5432;Database=Testing;Userid=postgres;Password=dDriven@2018;Pooling=false;";

    conn.Open();

    using (var bulk = new BulkOperation(conn))
    {
        // easy to use
        bulk.DestinationTableName = "td_00001test";
        //bulk.Log += s => logger.AppendLine(s);
        bulk.BatchSize = 100;
        //bulk.AutoMapKeyName = "MDTag_OID;Time";
        //bulk.AutoMapOutputDirection = false;
        bulk.UseTableLock = false;
        bulk.UseLogDump = true;

        bulk.BulkInsert(dt);

        var logDump = bulk.LogDump;

        // SHOW data
        Console.WriteLine(logDump.ToString());
    }

    conn.Close();
}
Console.WriteLine(DateTime.Now.ToString("dd/MM/yyyy hh:mm:ss.fff"));

//Console.WriteLine(logger.ToString());

Console.ReadKey();

Thread.Sleep(2000);
JonathanMagnan commented 1 year ago

Hello @rajkamal0610 ,

Is there a reason why you use a small batch size (BatchSize = 100;)? The default is 10,000 for PostgreSQL in our library.

Best Regards,

Jon

rajkamal0610 commented 1 year ago

Hi @JonathanMagnan ,

No specific reason. We tried batch sizes of 100, 1,000, 10,000, and 100,000, but the result was the same.

In the code we are generating 6,000 records; with or without a batch size, the processing time is about 4 seconds.

rajkamal0610 commented 1 year ago

Any update regarding this, @JonathanMagnan?

JonathanMagnan commented 1 year ago

Hello @rajkamal0610 ,

My developer did some performance tests, and the issue seems to be related to the uuid_generate_v4() function being slow. For example, here are some statistics on 6,000 inserted rows:

Obviously, one uses a GUID and the other an identity column, but you can already see there is a huge performance difference that, unfortunately, we don't control in this case.

One way to improve the performance is to no longer let the PostgreSQL database generate the GUID when using bulk operations, but instead generate it on the application side and insert the value directly.

Let me know if that correctly explains the cause and a possible solution.

Best Regards,

Jon
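A minimal sketch of that suggestion, reusing the DataTable shape from the sample code above: the "UnqSerialNo" column is added to the DataTable and filled with Guid.NewGuid() in the application, so the bulk insert supplies the value directly and the table's uuid_generate_v4() default is never invoked. The Npgsql connection and BulkOperation setup are omitted here; they stay exactly as in the original sample.

```csharp
using System;
using System.Data;

// Same shape as the sample table, plus the UnqSerialNo column so the
// GUID is generated client-side instead of by uuid_generate_v4().
DataTable dt = new DataTable("td_00001test");
dt.Columns.Add("UnqSerialNo", typeof(Guid));
dt.Columns.Add("Time", typeof(DateTime));
dt.Columns.Add("Quality", typeof(string));
dt.Columns.Add("Value", typeof(string));
dt.Columns.Add("MDTag_OID", typeof(int));

Random rnd = new Random();

for (int i = 0; i < 1000; i++)
{
    DataRow dr = dt.NewRow();
    dr["UnqSerialNo"] = Guid.NewGuid(); // generated in the application, not by the database
    dr["Time"] = DateTime.Now;
    dr["Quality"] = "GOOD";
    dr["Value"] = rnd.Next(6000).ToString();
    dr["MDTag_OID"] = i;
    dt.Rows.Add(dr);
}

// bulk.BulkInsert(dt) would now send the GUID values directly,
// bypassing the uuid_generate_v4() default on the table.
```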

rajkamal0610 commented 1 year ago

Hi @JonathanMagnan,

Thanks for your effort.

Yes, as you said, the cause of the slowness is the UUID generation. We will make the change by either generating it on the application side or removing that column.