sindicate / solidbase

SolidBase is a database change management and version control tool that uses annotated SQL
https://code.google.com/p/solidbase/
Apache License 2.0

Don't read the complete CSV data in memory #106

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Instead of reading the CSV data into memory and then parsing it, can we do it 
directly from the file? This would allow for massive imports.

The problem is the command delimiter: the CSV parser does not recognize it.

This is a big change. The command processor also needs to change, because it 
is the component that reads the complete command.

Maybe we should do it like this:

IMPORT CSV INTO ATABLE;
...CSV DATA...
<empty line>

instead of

IMPORT CSV INTO ATABLE DATA
...CSV DATA...;

This way we can keep both.

Original issue reported on code.google.com by rene.de....@gmail.com on 16 Oct 2010 at 9:42

GoogleCodeExporter commented 9 years ago
It seems that we can't let the batch grow too large; executeBatch() should be 
called at regular intervals. If the batch grows too large, we get the 
following strange error:

null: Ongeldige batchwaarde. (Dutch for "Invalid batch value.")

Should executeBatch() be called every 1,000 or every 10,000 records? The 
performance of both settings needs to be tested.

Original comment by rene.de....@gmail.com on 16 Oct 2010 at 10:51
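
The periodic flush described above could look roughly like the following. This is a minimal sketch, not SolidBase's actual code: the class name BatchedInsert, the method insertAll and the batchSize parameter are hypothetical, and only the standard JDBC calls setString(), addBatch() and executeBatch() are assumed.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;

public class BatchedInsert {

    /**
     * Adds every row to the JDBC batch and flushes it each time batchSize
     * rows have accumulated, so the batch never grows without bound.
     * Hypothetical helper; the thread only says executeBatch() should be
     * called regularly.
     */
    static int insertAll(PreparedStatement statement, Iterator<String[]> rows, int batchSize)
            throws SQLException {
        int pending = 0; // rows added since the last executeBatch()
        int total = 0;   // rows seen overall
        while (rows.hasNext()) {
            String[] row = rows.next();
            for (int i = 0; i < row.length; i++) {
                statement.setString(i + 1, row[i]); // JDBC parameters are 1-based
            }
            statement.addBatch();
            pending++;
            total++;
            if (pending == batchSize) {
                statement.executeBatch(); // flush a full batch
                pending = 0;
            }
        }
        if (pending > 0) {
            statement.executeBatch(); // flush the final partial batch
        }
        return total;
    }
}
```

With 2,500 rows and a batchSize of 1000 this calls executeBatch() three times: twice for full batches and once for the remaining 500 rows.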

GoogleCodeExporter commented 9 years ago
Streaming CSV implemented. An empty line signals the end. That might be a 
problem when importing a single column where empty values are allowed. What's 
the solution for that? Introduce a marker keyword?

IMPORT CSV INTO ATABLE UNTIL THEEND;
...CSV DATA...
THEEND

But that can be added later too.

Currently executeBatch() is called after every 1000 records. This seems to 
have no negative impact on performance, but it still needs to be tested 
against a database reached over the network, so that network latency comes 
into play.

Original comment by rene.de....@gmail.com on 21 Oct 2010 at 8:59
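
The end-of-data handling discussed above could be sketched as follows. This is only an illustration under stated assumptions: the class StreamingCsvImport and the method readRecords are hypothetical, the terminator is passed in as a string ("" for the empty-line form, or a marker such as THEEND for the proposed UNTIL form), and the comma split is deliberately naive and does not handle quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.function.Consumer;

public class StreamingCsvImport {

    /**
     * Reads CSV records one line at a time until a line equal to the
     * terminator (or end of input) is reached. Only the current line is
     * held in memory. The caller keeps using the same BufferedReader for
     * whatever follows the terminator.
     */
    static int readRecords(BufferedReader reader, String terminator, Consumer<String[]> sink)
            throws IOException {
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.equals(terminator)) {
                break; // end-of-data signal; the terminator itself is consumed
            }
            // Naive split for illustration only: no quoting or escaping.
            sink.accept(line.split(",", -1));
            count++;
        }
        return count;
    }
}
```

Passing "" as the terminator stops at the first empty line. Passing a marker keyword instead lets empty lines through as records, which is what a single-column import with empty values would need.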

GoogleCodeExporter commented 9 years ago
Actually, importing a single column with empty values is already a problem for 
the other form of import too. I created issue #110 for that.

Original comment by rene.de....@gmail.com on 21 Oct 2010 at 9:14

GoogleCodeExporter commented 9 years ago
Also decided to follow RFC 4180 and not strip whitespace from the values by 
default. Can we make stripping optional?

IMPORT CSV IGNORE WHITESPACE
This will ignore whitespace, except the whitespace between double quotes.

IMPORT CSV STRIP WHITESPACE
This will strip all whitespace, even the whitespace between double quotes.

The IGNORE needs to be implemented first. Currently no need to implement STRIP.

Original comment by rene.de....@gmail.com on 23 Oct 2010 at 11:27
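
The difference between the RFC 4180 default and the proposed IGNORE WHITESPACE option could be illustrated per field like this. It is a rough sketch under assumptions: the class CsvWhitespace and the method fieldValue are hypothetical, the input is a single raw field that has already been split off, and "ignore" is interpreted as trimming whitespace around the field before any surrounding double quotes are removed.

```java
public class CsvWhitespace {

    /**
     * Returns the value of one raw CSV field.
     * ignoreWhitespace == false: whitespace is kept as part of the value
     * (the RFC 4180 behaviour described in the thread).
     * ignoreWhitespace == true: whitespace outside the double quotes is
     * dropped, whitespace between the quotes is preserved.
     */
    static String fieldValue(String raw, boolean ignoreWhitespace) {
        String s = ignoreWhitespace ? raw.trim() : raw;
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            // Remove the enclosing quotes and unescape doubled quotes.
            return s.substring(1, s.length() - 1).replace("\"\"", "\"");
        }
        return s;
    }
}
```

For example, the raw field `  abc ` stays `  abc ` by default and becomes `abc` with the option enabled, while ` " a b " ` with the option enabled yields ` a b ` with its inner spaces intact.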

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r534.

Original comment by rene.de....@gmail.com on 23 Oct 2010 at 8:45