ngageoint / hootenanny

Hootenanny conflates multiple maps into a single seamless map.
GNU General Public License v3.0

dumpfile writer is dying on the planet pbf file #1430

Closed bwitham closed 7 years ago

bwitham commented 7 years ago

https://github.com/DigitalGlobe/MapEdit/issues/693

Dying with a bad_alloc against a 30 GB PBF, and it seems to be doing so when the SQL file is flushed. Possibly the same changes that caused problems with the Rio building data last summer are still contributing to this. Will run Terry's original implementation at v0.2.24 to verify. Also need to verify that the file write is actually streamed...maybe it's not (we know the reading is).

bwitham commented 7 years ago

@Cellington1

Cellington1 commented 7 years ago

Notes:

bwitham commented 7 years ago

I believe the culprit is here:

https://github.com/ngageoint/hootenanny/blob/develop/hoot-core/src/main/cpp/hoot/core/io/OsmPostgresqlDumpfileWriter.cpp#L227

I would think that would blow out memory during the dump file write. I'm going to change it to write in smaller chunks.
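
For illustration only, here is a minimal sketch (hypothetical class and names, not the actual hoot code) of the kind of change meant here: buffer the SQL lines for a section and flush them to the temp file once a fixed line count is reached, so memory stays bounded regardless of output size.

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Illustrative sketch: flush buffered SQL lines for a dump file section to its
// temp file in fixed-size chunks instead of holding the whole section in memory.
class SectionWriter
{
public:
  SectionWriter(const std::string& tempFilePath, std::size_t maxBufferedLines = 100000)
    : _out(tempFilePath), _maxBufferedLines(maxBufferedLines) {}

  void writeLine(const std::string& sqlLine)
  {
    _buffer.push_back(sqlLine);
    if (_buffer.size() >= _maxBufferedLines)
      _flush();
  }

  void close()
  {
    _flush();
    _out.close();
  }

private:
  void _flush()
  {
    for (const std::string& line : _buffer)
      _out << line << '\n';
    _out.flush();
    _buffer.clear();  // memory use stays bounded by maxBufferedLines
  }

  std::ofstream _out;
  std::size_t _maxBufferedLines;
  std::vector<std::string> _buffer;
};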

bwitham commented 7 years ago

Have a potential fix for this. Rather than wait 4+ days on the planet ingest, I'm going to spin up a VM with reduced memory and try to force the error with the Rio buildings data, which only takes about 10 minutes to generate the SQL file. I should be able to see the OOM error before the changes and not see it after.

bwitham commented 7 years ago

I set up a test with the Rio buildings ingest: I created a VM with memory just small enough to trigger the error before these changes and then checked whether the ingest would succeed after the changes. It did just that, so hopefully this is a valid fix.

When monitoring this ingest, memory use is about 3.7 GB both before and after the changes. Presumably, before the changes there should be a spike in memory consumption right when the SQL file is flushed, since the file write was buffering everything in memory. Strangely, I haven't been able to see that spike occur.

What's additionally confusing is that when I restrict the VM memory to what the base hoot VM requires plus the Rio ingest amount plus a little extra, the jobs fail silently (no alloc errors), but the behavior looks like an OOM error to me. I had to roughly double the memory from the amount I expected the job to succeed with before I arrived at the result described above, where the job failed on a bad_alloc before the changes and did not fail after them.

bwitham commented 7 years ago

@Cellington1 I have a potential fix for this in the 1430 branch that makes the SQL file output no longer bound by available memory, so run the planet ingest against that branch when you have a chance. I won't merge the changes into develop until we've proven they work. I didn't run the planet ingest myself due to the time involved and wanting to move on to #1431 (I used the test described above instead). If this fix doesn't end up doing the job for the planet ingest, I'll revisit it. Thanks.

Cellington1 commented 7 years ago

@bwitham Awesome, I'll get that going.

bwitham commented 7 years ago

Cool. I'm moving on to #1431, which should let us write to a live database safely.

Cellington1 commented 7 years ago

@bwitham Got 1430 installed on an AWS instance. Kicked off the conversion. Looks to be running much faster. I'm thinking we should see the ~15 hour completion time. Awesome job.

bwitham commented 7 years ago

Good deal. Let me know if anything doesn't work out with that ingest. I'm almost done with the ID-safe version, which will work the same way (the SQL file generation part will be a little faster than before) except it will be safe from ID conflicts if other people are using the database while you are writing to it. That version will run a bit slower when there are other writers, but not by much. Next, I'm going to try the multi-threaded writer with pg_bulkload, which I'm hoping will yield a ~50% performance improvement over the latest version, though I'm not sure yet whether that gain will materialize.

Cellington1 commented 7 years ago

Awesome. Checked it out this morning. It slowed down a bit, but it is still going. I'll let you know how it goes.

bwitham commented 7 years ago

Are you still on the SQL file generation part or on the SQL exec part now?

In the new version I combined those two steps into one command.

Cellington1 commented 7 years ago

I am still on the SQL generation part - this is still a two-part process in 1430, correct?

bwitham commented 7 years ago

Right. You'll still have to exec the SQL file with psql after it's done.

Cellington1 commented 7 years ago

That's what I thought - thanks.

Cellington1 commented 7 years ago

Tests finished. It failed with the same error, but the SQL file was larger this time (~250 GB). It should be ~1 TB.

Error:

17:01:02.041 DEBUG ...smPostgresqlDumpfileWriter.cpp( 208) Flushing section sequence_updates to file /tmp/qt_temp.VE2090
17:01:02.048 DEBUG ...smPostgresqlDumpfileWriter.cpp( 250) Wrote contents of section sequence_updates
17:01:02.048 DEBUG ...smPostgresqlDumpfileWriter.cpp( 208) Flushing section users to file /tmp/qt_temp.qq2090
17:01:02.051 DEBUG ...smPostgresqlDumpfileWriter.cpp( 250) Wrote contents of section users
17:01:02.051 DEBUG ...smPostgresqlDumpfileWriter.cpp( 208) Flushing section changesets to file /tmp/qt_temp.Ti2090
17:01:02.444 DEBUG ...smPostgresqlDumpfileWriter.cpp( 250) Wrote contents of section changesets
17:01:02.444 DEBUG ...smPostgresqlDumpfileWriter.cpp( 208) Flushing section current_nodes to file /tmp/qt_temp.XM2090
Error running convert:
std::bad_alloc

@bwitham - Let me know if you want to take a look at the server.

bwitham commented 7 years ago

I think I see what the problem is. I'll make the changes to the #1431 branch. It may take a couple more tries, but I think we'll get this working soon.

Cellington1 commented 7 years ago

Yeah, it looked better this time. Do you want me to test this on an instance with, let's say, 60 or 120 GB RAM? Just to rule out this being a hardware issue?

bwitham commented 7 years ago

Sure, if that's possible in the meantime. I think with this next change I can get past that problem completely...I found one more spot where the writes aren't being buffered.

Also, if you're still working on the #1430 branch, set postgresql.dumpfile.writer.output.buffer.max.line.size=100000. On that branch it currently defaults to one million, which may be a little high (on the #1431 branch, the default is already 100,000). Thanks.

Cellington1 commented 7 years ago

Will do. Thanks.

Cellington1 commented 7 years ago

Alright, running 1430 again with 61 GB RAM and the buffer size set manually. I will post progress.

bwitham commented 7 years ago

It's possible that the current memory leak is bad enough that even that amount won't be enough...not sure. I should have what I believe is a fix posted to the #1431 branch in the next hour or so.

bwitham commented 7 years ago

Pushed a possible fix to #1431. Going to test with some files larger than the ones I used previously (though not as large as the planet file) before going back to the pg_bulkload work, to see if I can find any other problems.

Cellington1 commented 7 years ago

I'll setup a test server with #1431.

Cellington1 commented 7 years ago

Testing with this command

hoot convert \
  -D changeset.user.id=1 \
  -D osmapidb.bulk.writer.execute.sql=false \
  -D osmapidb.bulk.writer.sql.output.file.copy.location=/tmpsql/planet.sql \
  planet-latest.osm.pbf \
  osmapidb://hoot:hoottest@localhost:5432/osmapi_test

Cellington1 commented 7 years ago

Currently looking good. Parsing ways now.

11:40:16.717 INFO  ...core/io/OsmApiDbBulkWriter.cpp( 660) Parsed 37,000,000 ways from input.

bwitham commented 7 years ago

If it gets far enough, post how many nodes, ways, relations, tags, etc. are in the file total (should print out a summary)....I'm curious.

bwitham commented 7 years ago

Cody, are you meaning to write your data to the 'osmapi_test' database, and then move that over to your final database destination? If so, you're ok.

If not, there are two potential problems you're going to have in this case. 1) Hoot used osmapi_test to determine the starting IDs. Since that database was presumably empty, they're all going to start at 1, and if your target db already has data in it, those IDs are going to be wrong when written. 2) Hoot updates the ID sequences in the target db URL just after generating the SQL file, so if you execute the SQL file against a different database, the ID sequences will never get updated there. You'd have to manually set each sequence to the last ID written for each table.

Cellington1 commented 7 years ago

I am trying to generate the SQL files, then apply them to an empty API database, though I'm thinking we may be able to write directly to the database (on another server), depending on what Damian thinks.

Once the SQL is ingested into the DB, we will run a script to set the sequence IDs, then add indexes and primary and foreign keys (this is the workflow that Terry used).
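
As a point of reference, here is a minimal sketch of what the output of such a sequence-reset script might look like: one setval statement per ID sequence, set to the last ID written for each table. The sequence names assume the standard OSM API database schema and the ID values are placeholders; this is an illustration, not hoot's actual output.

#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch: print one setval statement per ID sequence so an offline
// load can fast-forward the sequences past the last IDs written. Sequence names
// assume the standard OSM API database schema; the ID values are placeholders.
int main()
{
  const std::vector<std::pair<std::string, long long> > lastIds = {
    { "current_nodes_id_seq",     1000LL },  // last node ID written
    { "current_ways_id_seq",       200LL },  // last way ID written
    { "current_relations_id_seq",   30LL },  // last relation ID written
    { "changesets_id_seq",           4LL }   // last changeset ID written
  };

  for (const auto& seq : lastIds)
    std::cout << "SELECT pg_catalog.setval('" << seq.first << "', "
              << seq.second << ");" << std::endl;

  return 0;
}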

Cellington1 commented 7 years ago

Also, status of the translation:

Created stxxl
12:45:48.400 INFO ...core/io/OsmApiDbBulkWriter.cpp( 660) Parsed 170,000,000 ways from input.

bwitham commented 7 years ago

Ok, since you aren't writing to a live database your workflow should work fine.

But if you ever do end up writing to a live database, you have to let hoot handle the ID sequence update part internally. That's the only way we can guarantee ID safety in an online database. To do that, just supply a valid db URL to the command.

I'll go ahead and make the ID sequence update part optional, so hoot doesn't try to do it when someone wants to use the "offline" type of workflow that you're currently using.

Cellington1 commented 7 years ago

Sounds good, thanks.

bwitham commented 7 years ago

170 million ....I wonder how many it has?

The way part is usually going to be the slowest, because ways have the most tags (nodes usually have few or none) and also because of the number of way nodes that have to be parsed for each way.

Cellington1 commented 7 years ago

Yeah, generally ways process 10 times slower than nodes.

So we have a bit more than this now, maybe +1%, but this is the state from the start of 2017:

nodes: 3,188,847,642, ways: 327,508,441, relations: 3,936,425

bwitham commented 7 years ago

Wow. Well, we're at least making progress against it.

brianhatchl commented 7 years ago

The live db update would never be used for an initial planet ingest; the use case is for adding, say, an additional country extract. Like @bwitham said, you wouldn't disable any foreign keys, indexes, etc., or manually update sequences in the live db workflow.

bwitham commented 7 years ago

Right.

Also, I'm sure you already know most of them, Cody, but there are a bunch of other tricks, in addition to what Brian listed, that you can do to speed up your offline write, as I found out reading through the Postgres documentation yesterday. Some of them would have no impact for us, since we're using COPY instead of INSERT, but others could make a noticeable difference.

Cellington1 commented 7 years ago

Correct. For this particular test, we are strictly looking at writing a full planet either to SQL files or to an empty database (which would be offline).

bwitham commented 7 years ago

Yeah, I should have just kept quiet. I forgot that's what you were doing here...none of the online stuff applies for this.

bwitham commented 7 years ago

@Cellington1 Does the ID update script that you're going to run to update the record IDs in the element SQL files for the offline write expect the SQL setval statements to be in that file as well? Or are you going to exec those setval statements separately, outside of that SQL file? I want to make sure they get written to the SQL file for the offline workflow if that script expects them there.

Cellington1 commented 7 years ago

We will execute the setval statements with the scripts. Emailed you the specifics.

Cellington1 commented 7 years ago

@bwitham Getting close to finishing ways:

10:53:21.656 INFO ...core/io/OsmApiDbBulkWriter.cpp( 660) Parsed 314,000,000 ways from input.

Cellington1 commented 7 years ago

The PBF-to-SQL conversion completed, but it was unable to copy the 921 GB file. I may have to increase the storage space; I currently have 1.7 TB available. I think we will need to bump the volume up to 2.5 TB.

20:32:35.954 INFO ...core/io/OsmApiDbBulkWriter.cpp( 357) Parsed 21,384,202,959 total SQL file lines.
20:32:36.031 INFO ...core/io/OsmApiDbBulkWriter.cpp( 240) Copying 921GB SQL output file to /tmpsql/planet.sql...
22:30:27.637 WARN ...core/io/OsmApiDbBulkWriter.cpp( 243) Unable to copy SQL output file to /tmpsql/planet.sql
22:30:27.637 INFO ...core/io/OsmApiDbBulkWriter.cpp( 224) Skipping SQL execution against database due to configuration...
22:30:27.637 INFO ...core/io/OsmApiDbBulkWriter.cpp( 225) Final SQL file write stats:
22:30:27.640 INFO ...core/io/OsmApiDbBulkWriter.cpp( 162) Nodes: 3,780,985,254
22:30:27.640 INFO ...core/io/OsmApiDbBulkWriter.cpp( 162) Node tags: 466,181,572
22:30:27.640 INFO ...core/io/OsmApiDbBulkWriter.cpp( 162) Ways: 398,714,532
22:30:27.640 INFO ...core/io/OsmApiDbBulkWriter.cpp( 162) Way nodes: 4,494,955,372
22:30:27.640 INFO ...core/io/OsmApiDbBulkWriter.cpp( 162) Way tags: 1,468,784,859
22:30:27.640 INFO ...core/io/OsmApiDbBulkWriter.cpp( 162) Relations: 4,839,264
22:30:27.640 INFO ...core/io/OsmApiDbBulkWriter.cpp( 162) Relation members: 106,762,826
22:30:27.640 INFO ...core/io/OsmApiDbBulkWriter.cpp( 162) Relation tags: 24,221,325
22:30:27.640 INFO ...core/io/OsmApiDbBulkWriter.cpp( 162) Unresolved relations: 0
22:30:27.640 INFO ...core/io/OsmApiDbBulkWriter.cpp( 162) Changesets: 83,690
22:30:27.640 INFO ...core/io/OsmApiDbBulkWriter.cpp( 162) Changeset change size (each): 50,000
22:30:27.640 INFO ...core/io/OsmApiDbBulkWriter.cpp( 162) Executable SQL records: 21,490,973,698

bwitham commented 7 years ago

921 GB...wow. Not sure why the file didn't copy over...too big, possibly? The SQL output file should still be in the /tmp folder somewhere (look for a 921 GB file). In a newer version I'm using a file move instead of a copy, so you don't have to wait for it to copy over. I'll push that version to the 1431 branch later today. At least we're past the memory issues now.
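
For what it's worth, here is a rough sketch of the move-instead-of-copy idea in plain standard C++ (not the actual hoot change): a rename is nearly instant when source and destination sit on the same filesystem, with a streamed copy as the fallback when they do not.

#include <cstdio>    // std::rename, std::remove
#include <fstream>
#include <string>

// Illustrative sketch: try a cheap rename first (metadata-only when both paths
// are on the same filesystem), then fall back to a streamed copy plus delete.
bool moveOrCopy(const std::string& from, const std::string& to)
{
  if (std::rename(from.c_str(), to.c_str()) == 0)
    return true;  // no file data rewritten, so no long wait on a huge file

  std::ifstream src(from, std::ios::binary);
  std::ofstream dst(to, std::ios::binary);
  if (!src || !dst)
    return false;

  dst << src.rdbuf();  // streamed copy; still slow for very large files
  return static_cast<bool>(dst) && std::remove(from.c_str()) == 0;
}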

Also, the next time you run it, try raising the file line output buffer size. That may increase the file write speed.

Cellington1 commented 7 years ago

Yeah, crazy. So I looked for the temp files, and they were gone. The total size of the data on the server was only 40 GB, so it must have deleted the temp SQL files automatically.

bwitham commented 7 years ago

Yeah, that could have happened...sorry. I probably forgot to set them not to auto-delete in that version; I've since fixed that. We can run through some smaller datasets after I update 1431 again, to weed out any remaining issues before trying the planet again.

Cellington1 commented 7 years ago

No worries, it makes sense to delete them. Sounds good. I am beefing up the server and setting up 3 volumes, and then I will try the planet file again.

bwitham commented 7 years ago

Testing on this overall process continues, but the original memory problem has been fixed. We can open separate issues for any problems that remain.