isapir / Migrate2Postgres

Easily migrate from other DBMSs to PostgreSQL
GNU General Public License v3.0

OutOfMemory when copying a large table #5

Open mishael-w opened 6 years ago

mishael-w commented 6 years ago

hey,

I'm trying to use your tool to migrate my project from MSSQL to Postgres. I've been able to migrate all of my tables and data except one table, which has about 20 million records in it.

While attempting to migrate that specific table, I get this error:

java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.FutureTask.report(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at net.twentyonesolutions.m2pg.PgMigrator.lambda$doDml$1(PgMigrator.java:155)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(Unknown Source)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source)
at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.util.stream.ForEachOps$ForEachTask.compute(Unknown Source)
at java.util.concurrent.CountedCompleter.exec(Unknown Source)
at java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
at java.util.concurrent.ForkJoinTask.doInvoke(Unknown Source)
at java.util.concurrent.ForkJoinTask.invoke(Unknown Source)
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(Unknown Source)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(Unknown Source)
at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.util.stream.ReferencePipeline.forEach(Unknown Source)
at java.util.stream.ReferencePipeline$Head.forEach(Unknown Source)
at net.twentyonesolutions.m2pg.PgMigrator.doDml(PgMigrator.java:153)
at net.twentyonesolutions.m2pg.PgMigrator.main(PgMigrator.java:83)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at net.twentyonesolutions.m2pg.Schema.copyTable(Schema.java:254)
at net.twentyonesolutions.m2pg.PgMigrator.lambda$doDml$0(PgMigrator.java:138)
at net.twentyonesolutions.m2pg.PgMigrator$$Lambda$17/6581689.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Is there any solution?

isapir commented 6 years ago

Strange. That's not supposed to happen even for very large tables.

Did you see any "Insert Failed" error messages before it crashed?

mishael-w commented 6 years ago

I appreciate your tool and your response very much. Thanks a lot.

Yes, I have seen a lot of those "Insert Failed" messages.

Though when I looked in the log file, no error was shown...


mishael-w commented 6 years ago

Another thing I should have mentioned is that it happens somewhere after 40% of the table was done (as your program indicates in the output).

isapir commented 6 years ago

OK, so the way it works is that the log for each table is built in memory, because the copy runs in multiple threads. When the table finishes copying, the buffer is written to the log file.

The process can handle large tables, but yours ran out of memory because the buffered log entries filled up the heap. The reason you don't see the errors in the log file is that the process crashed before it got to the point of writing them.
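To illustrate the failure mode described above, here is a minimal sketch (not the actual Migrate2Postgres code; class and method names are made up for illustration) of per-table log buffering: each worker appends every error to an in-memory StringBuilder that is only flushed when the whole table completes, so millions of "Insert Failed" entries accumulate on the heap first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BufferedTableLog {

    // Builds the table's entire log in memory; nothing is written out
    // until the copy finishes. With millions of failed rows this buffer
    // alone can exhaust the heap before any log file is produced.
    static String copyTable(String table, int failedRows) {
        StringBuilder log = new StringBuilder();
        for (int row = 0; row < failedRows; row++) {
            log.append("Insert Failed: ").append(table)
               .append(" row ").append(row).append('\n');
        }
        return log.toString(); // flushed only after the table completes
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<String>> results = new ArrayList<>();
        // Tables are copied concurrently, so each thread keeps its own buffer.
        for (String t : new String[] {"orders", "events"}) {
            results.add(pool.submit(() -> copyTable(t, 3)));
        }
        for (Future<String> f : results) {
            System.out.print(f.get()); // written only once the table is done
        }
        pool.shutdown();
    }
}
```

This is why a crash mid-copy leaves the log file empty: the buffer dies with the process.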

I will think of a better way to handle this in the future, but for now you should figure out why you get the "Insert Failed" errors and fix that. You should then be able to copy your 20M-record table with no issue.

After you see the first "Insert Failed" message, hit CTRL+C so that you can read the message.

isapir commented 6 years ago

> Another thing I should have mentioned is that it happens somewhere after 40% (as your program indicates in the output) of the table was done.

8 million records should not be an issue, but millions (or even fewer) of errors buffered in memory for future logging would easily run you out of memory.

Find the source of the "Insert Failed" errors.

mishael-w commented 6 years ago

Found the problem: apparently there wasn't enough free disk space on the Postgres server. Thank you very much.
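Since the root cause here was a full disk on the target server, a quick pre-flight check can catch this before a long copy. This is a hedged sketch, not part of Migrate2Postgres; the data-directory path and the 10 GiB threshold are illustrative assumptions.

```java
import java.io.File;

public class DiskSpaceCheck {
    public static void main(String[] args) {
        // Path to the Postgres data directory is an assumption; pass your own.
        File dataDir = new File(args.length > 0 ? args[0] : "/var/lib/postgresql");
        long freeGib = dataDir.getUsableSpace() / (1024L * 1024 * 1024);
        System.out.println("Usable space: " + freeGib + " GiB");
        if (freeGib < 10) { // threshold is an arbitrary example
            System.err.println("Warning: low disk space; bulk inserts may fail");
        }
    }
}
```

Running this (locally on the database host, or against a mounted data volume) before kicking off a 20M-row copy would have surfaced the problem up front instead of as a stream of "Insert Failed" errors.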


isapir commented 6 years ago

OK, thanks for the feedback.

I have no idea how many people have found this project useful, or whether it's worth putting more time and effort into it, so it's good to hear from someone that it works.