webyog / sqlyog-community

Webyog provides monitoring and management tools for open source relational databases. We develop easy-to-use MySQL client tools for performance tuning and database management. Webyog's solutions include SQL Diagnostic Manager for MySQL performance optimization and SQLyog for MySQL administration. More than 35,000 companies (including Amazon, IBM, Salesforce, AT&T, eBay, and GE) and 2.5 million users rely on Webyog's solutions to provide valuable insights into their databases. Webyog is an Idera, Inc. company.
https://webyog.com/
GNU General Public License v2.0

Import external data and very large tables may exhaust memory. #1709

Open atulwy opened 9 years ago

atulwy commented 9 years ago

Original issue 1709 created by webyog on 2011-09-02T12:54:43.000Z:

Import external data and very large tables may exhaust memory or could be very slow due to swapping.

ODBC/ANSI SQL does not have a CHUNK control (similar to LIMIT in MySQL, TOP(n) in SQL Server, etc.). "SELECT {columns_list} FROM table" (with no CHUNK control) requires the 'building-up' result set to be stored somewhere. 'Somewhere' may be memory on either the server or the client machine, or it may be swapped. The swapping may happen to the swap space managed transparently by the (server or client) OS, or the server may have implemented its own swapping mechanism (as likely all significant database servers do - MySQL does, for instance).
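For illustration, these are the dialect-specific forms such a CHUNK control would have to translate to (table and column names are hypothetical; this is a sketch, not what SQLyog sends today):

```sql
-- MySQL / MariaDB: LIMIT with OFFSET (rows 20001-30000)
SELECT col1, col2 FROM src_table LIMIT 10000 OFFSET 20000;

-- SQL Server: TOP with a key predicate, seeded with the last key seen
SELECT TOP (10000) col1, col2 FROM src_table WHERE id > 20000 ORDER BY id;

-- Oracle: ROWNUM in a nested query (rows 20001-30000);
-- note that chunk boundaries are only stable with an ORDER BY
SELECT col1, col2 FROM (
  SELECT t.*, ROWNUM rn FROM src_table t WHERE ROWNUM <= 30000
) WHERE rn > 20000;
```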

So what exactly happens in each case depends on the OSes involved, the functioning of the source database, and how the ODBC driver handles large result sets.

We have one report from Oracle (IB # 30474) where the import hangs, or at least proceeds so slowly that it is practically (from the user's point of view) the same as a hang.

The problem is also reproduced when importing a large table (in this case around 20 GB on a 4 GB machine) from MySQL. The import fails with the error "MySQL client ran out of memory" (http://dev.mysql.com/doc/refman/5.0/en/out-of-memory.html). So in this case the 'building-up' result set is obviously stored in memory client-side (and swaps to virtual memory). The default virtual memory setting in Windows is a page file of 1.5× physical memory, which in this case sums up to 10 GB (4 GB RAM + 6 GB page file) - too little for a 20 GB result set.

Adding a CHUNK control to the tool would basically require these steps: 1) query "what kind of database is this?"; 2) send SELECT query/queries using whatever CHUNK control the actual source supports (and tables with no PK or auto-increment column would pose another problem that maybe cannot be solved for all sources).
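A sketch of how step 2 could work when the table does have an auto-increment PK (names hypothetical) - the largest key of each chunk seeds the next query, so no chunk needs the whole result set in memory:

```sql
-- chunk 1
SELECT id, col1, col2 FROM src_table ORDER BY id LIMIT 10000;

-- chunk n+1: 123456 stands for the largest id returned by chunk n
SELECT id, col1, col2 FROM src_table WHERE id > 123456 ORDER BY id LIMIT 10000;
```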

There is no decision on this. We cannot/will not handle all obscure ODBC sources, but at least Oracle and SQL Server could be investigated and considered. We could also decide not to prioritize this (cost/benefit), as with recent systems the table will have to be very large (and the situation is improving as hardware evolves), and the user may raise their virtual memory setting. The user also has other options (using CSV as an intermediate file format that is then imported with LOAD DATA is the obvious one).
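A minimal sketch of the CSV route on the MySQL side (file path and table name hypothetical; the export from the source happens by whatever means the source offers):

```sql
LOAD DATA LOCAL INFILE '/tmp/src_table.csv'
INTO TABLE dest_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
```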

atulwy commented 9 years ago

Comment #1 originally posted by webyog on 2011-10-13T20:57:04.000Z:

.. and since SQLyog is a 32-bit application, the 2 GB/3 GB per-process memory limit will/may(?) apply in cases where the result set is 'piled up' client-side. It depends on whether the memory used is owned by the ODBC driver or by SQLyog/SJA.

atulwy commented 9 years ago

Comment #2 originally posted by webyog on 2013-09-18T11:14:28.000Z:

atulwy commented 9 years ago

Comment #3 originally posted by webyog on 2014-02-27T11:31:40.000Z:

atulwy commented 9 years ago

Comment #4 originally posted by webyog on 2014-02-27T11:37:15.000Z:

Another workaround for the user is to use a WHERE clause in the import job, effectively 'splitting up' the job into multiple jobs executed sequentially (from a batch file, for instance).
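For example (ranges and key column hypothetical), each job selects one slice of the source table:

```sql
-- job 1
SELECT * FROM src_table WHERE id BETWEEN 1 AND 1000000;
-- job 2
SELECT * FROM src_table WHERE id BETWEEN 1000001 AND 2000000;
-- ... one job per range, run in sequence from the batch file
```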

peterwy commented 8 years ago

Ticket 145413043/138899/ is similar but not the same.

Here the user requests a separate transaction for each BULK INSERT, as MySQL rolls back the transaction when the pending transaction size reaches what the server/InnoDB can handle.
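In other words, instead of one transaction spanning the whole import, something like this (a sketch; table, columns, and values are hypothetical):

```sql
-- bulk 1: committed on its own, so the pending transaction stays small
START TRANSACTION;
INSERT INTO dest_table (id, col1) VALUES (1, 'a'), (2, 'b');
COMMIT;

-- bulk 2: a fresh transaction; a rollback here loses only this bulk
START TRANSACTION;
INSERT INTO dest_table (id, col1) VALUES (3, 'c'), (4, 'd');
COMMIT;
```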