wavded / ogr2ogr

An ogr2ogr wrapper library
MIT License
216 stars 46 forks source link

Load CSV to Postgres very slow #87

Closed xqin1 closed 2 years ago

xqin1 commented 2 years ago

When loading CSV file to Postgres, it seems that the stream of the file limits the number of records being inserted into the PG table each time.

the code:

         let {cmd, text, details} =   await ogr2ogr(filePath,
                {
                    format: 'PostgreSQL',
                    options: ['-nln', 'test', '-gt', 2000000],
                    destination: connection
                });

On Postgresql side, the following statement is executed multiple times and each time to insert small number of records:

COPY "test" ("wkb_geometry", "filer_id", "filer_desc") FROM STDIN;

This is for V3 only, I don't have the issue with V2. Is there any settings for Postgresql only?

xqin1 commented 2 years ago

I think the issue is with option -skipfailures being set by default. With this option, the -gt option is no longer valie. Here's from OGR2OGR documentment:

When writing into transactional DBMS (SQLite/PostgreSQL,MySQL, etc…), it might be beneficial to increase the number of INSERT statements executed between BEGIN TRANSACTION and COMMIT TRANSACTION statements. This number is specified with the -gt option. For example, for SQLite, explicitly defining -gt 65536 ensures optimal performance while populating some table containing many hundreds of thousands or millions of rows. However, note that -skipfailures overrides -gt and sets the size of transactions to 1.

Is it possible to remove -skipfailures as default option?

wavded commented 2 years ago

-skipfailures is very commonly used, in most cases so it is enabled by default due to historically that being the options most users want to get the results they want. That said, we don't have a way to turn it off and should for these cases. I wonder if we add a disableSkipFailures option.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 2 years ago

This issue was closed because it has been stalled for 5 days with no activity.

xqin1 commented 2 years ago

Sorry being late on this. An disableSkipFailures option would solve the issue, any chance to implement this feature?