kite-sdk / kite

Kite SDK
http://kitesdk.org/docs/current/
Apache License 2.0

Speed up schema migration when loading all schemas from a directory #349

Closed prazanna closed 9 years ago

prazanna commented 9 years ago

The following changes are made:

  1. Kicking off HBaseAdmin.createTableAsync for a batch of tables and then waiting on HBaseAdmin.isTableAvailable for all of them is much quicker than calling the synchronous HBaseAdmin.createTable sequentially for each table. The SchemaTool waits up to 10 minutes for all the tables to become available, which seems like a very reasonable buffer.
  2. HBaseAdmin.disableTable and HBaseAdmin.enableTable are very costly (~25% of the time spent on the schema migration). Instead of creating tables as and when we learn about an entity schema, construct the HTableDescriptor from all the entity schemas for a specific table. This way we don't have to add column families at a later point, which would require the disable and enable.
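
The two changes can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `TableAdmin`, `EntitySchema`, `FakeAdmin`, `familiesByTable` and `createAll` are hypothetical names, and `TableAdmin` only stands in for the HBaseAdmin calls named above (the real calls work with an HTableDescriptor rather than a plain set of family names). The in-memory fake exists so the sketch runs without an HBase cluster.

```java
import java.util.*;
import java.util.concurrent.TimeUnit;

public class AsyncTableCreationSketch {

    /** Hypothetical stand-in for the HBaseAdmin calls mentioned in the PR. */
    interface TableAdmin {
        void createTableAsync(String table, Set<String> columnFamilies);
        boolean isTableAvailable(String table);
    }

    /** Hypothetical entity schema: a table name plus the column families it needs. */
    static final class EntitySchema {
        final String table;
        final Set<String> families;
        EntitySchema(String table, String... families) {
            this.table = table;
            this.families = new TreeSet<>(Arrays.asList(families));
        }
    }

    /** Change 2: collect the column families of every schema per table up front. */
    static Map<String, Set<String>> familiesByTable(List<EntitySchema> schemas) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (EntitySchema s : schemas) {
            result.computeIfAbsent(s.table, k -> new TreeSet<>()).addAll(s.families);
        }
        return result;
    }

    /**
     * Change 1: start every creation without blocking, then poll until all
     * tables are available or the timeout expires.
     * Returns the tables that were still unavailable at the end (empty on success).
     */
    static Set<String> createAll(TableAdmin admin, Map<String, Set<String>> tables,
                                 long timeoutMillis, long pollMillis)
            throws InterruptedException {
        for (Map.Entry<String, Set<String>> e : tables.entrySet()) {
            admin.createTableAsync(e.getKey(), e.getValue());
        }
        Set<String> pending = new TreeSet<>(tables.keySet());
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            pending.removeIf(admin::isTableAvailable);
            if (pending.isEmpty() || System.nanoTime() - deadline >= 0) {
                return pending;
            }
            Thread.sleep(pollMillis);
        }
    }

    /** In-memory fake: a table reports available only after two unsuccessful polls. */
    static final class FakeAdmin implements TableAdmin {
        final Map<String, Set<String>> created = new TreeMap<>();
        private final Map<String, Integer> pollsLeft = new HashMap<>();

        public void createTableAsync(String table, Set<String> columnFamilies) {
            created.put(table, columnFamilies);
            pollsLeft.put(table, 2);
        }

        public boolean isTableAvailable(String table) {
            Integer left = pollsLeft.get(table);
            if (left == null) return false;
            if (left <= 0) return true;
            pollsLeft.put(table, left - 1);
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        List<EntitySchema> schemas = Arrays.asList(
                new EntitySchema("users", "meta"),
                new EntitySchema("users", "prefs"),
                new EntitySchema("events", "data"));
        Map<String, Set<String>> tables = familiesByTable(schemas);
        // 10 minutes mirrors the maximum wait described in the PR.
        Set<String> pending = createAll(new FakeAdmin(), tables,
                TimeUnit.MINUTES.toMillis(10), 10);
        System.out.println("tables: " + tables + ", still pending: " + pending);
    }
}
```

Because each table is created once with its full set of column families, no later schema needs to add a family to an existing table, so the disable and enable round trip described in item 2 is not needed.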

rdblue commented 9 years ago

Looks like this breaks the CDH4 and Hadoop 1 profiles. That's not necessarily a blocker because we need to discuss when to drop support for them. If that's the route you'd like to go, could you bring it up on the list? Otherwise, we should try to make this work for all profiles.

prazanna commented 9 years ago

I trust I have fixed the hadoop-1 and cdh4 profiles now. But it looks like Travis has hit some kind of transient error building the hadoop-1 profile.

prazanna commented 9 years ago

testValidMiniCluster(org.kitesdk.minicluster.TestMiniCluster) Time elapsed: 3.805 sec <<< ERROR! java.net.BindException: Port in use: localhost:50070.

Can someone with write access re-kick the Travis build for this PR? Thanks.

rdblue commented 9 years ago

Kicked, thanks @prazanna!

rdblue commented 9 years ago

Just merged this as https://github.com/kite-sdk/kite/commit/acd0459095fb095971183394067f3208cfebac60

I forgot, but in the future could you make sure the commit message starts with a reference to a JIRA issue? Thanks!