uci-cbcl / genomix

Parallel genome assembly using Hyracks

Fix some bugs in order to run in the big data #116

Closed: anbangx closed this 10 years ago

anbangx commented 10 years ago

@JavierJia @jakebiesinger @Elmira88 @Nan-Zhang

When we tried to run on the big data set, we hit some bugs. This PR fixes them. Have a look!

jakebiesinger commented 10 years ago

It seems the only differences are setting the max memory for the stats job, as I was saying, and making the read ID set actually dump to disk (I think this patch is already in one of the branches?).

Or was the problem in generating the histogram for the sorted counter min length?

On Dec 19, 2013 8:19 PM, "anbangx" <notifications@github.com> wrote:

You can merge this Pull Request by running

git pull https://github.com/uci-cbcl/genomix anbangx/fix-some-bugs-to-run-in-big-data

Or view, comment on, or merge it at:

https://github.com/uci-cbcl/genomix/pull/116

Commit Summary

  • Fix some bugs in order to run in the big data

File Changes

  • M genomix/genomix-data/src/main/java/edu/uci/ics/genomix/data/config/GenomixJobConf.java (3): https://github.com/uci-cbcl/genomix/pull/116/files#diff-0
  • M genomix/genomix-data/src/main/java/edu/uci/ics/genomix/data/types/ExternalableTreeSet.java (4): https://github.com/uci-cbcl/genomix/pull/116/files#diff-1
  • M genomix/genomix-driver/src/main/java/edu/uci/ics/genomix/driver/GenomixDriver.java (4): https://github.com/uci-cbcl/genomix/pull/116/files#diff-2
  • M genomix/genomix-hadoop/src/main/java/edu/uci/ics/genomix/hadoop/utils/ConvertToFasta.java (14): https://github.com/uci-cbcl/genomix/pull/116/files#diff-3
  • M genomix/genomix-hadoop/src/main/java/edu/uci/ics/genomix/hadoop/utils/GraphStatistics.java (27): https://github.com/uci-cbcl/genomix/pull/116/files#diff-4
  • M genomix/genomix-pregelix/src/main/java/edu/uci/ics/genomix/pregelix/base/BinaryVertexInputFormat.java (2): https://github.com/uci-cbcl/genomix/pull/116/files#diff-5
  • M genomix/genomix-pregelix/src/main/java/edu/uci/ics/genomix/pregelix/base/DeBruijnGraphCleanVertex.java (6): https://github.com/uci-cbcl/genomix/pull/116/files#diff-6


JavierJia commented 10 years ago

@jakebiesinger, the problem was that the count limit has to be set and the manager initialized before each job. Otherwise, the default countLimit of ReadHeadSet is Integer.MAX_VALUE, so every readFields call loads from the frame instead of from the HDFS files.

We call setGlobalStaticConstants when the job is initializing.
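The count-limit behaviour described here can be illustrated with a small sketch. This is a hypothetical, simplified class: SpillingTreeSet and setCountLimit are illustrative names, not the API of genomix's ExternalableTreeSet or ReadHeadSet, and it spills to a local temporary file rather than to HDFS. It shows why a static countLimit that defaults to Integer.MAX_VALUE never triggers a spill unless something resets it before the job runs.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.TreeSet;

/**
 * Hypothetical sketch (not genomix code): a sorted set of longs kept in memory
 * until it holds more than countLimit entries, after which it lives in a temp file.
 */
public class SpillingTreeSet {
    // Static, so every instance in the JVM shares it. With this default the set
    // never spills, which is the failure mode when the limit is not set per job.
    private static int countLimit = Integer.MAX_VALUE;

    // Hypothetical stand-in for a per-job initializer; call it before each job.
    public static void setCountLimit(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        countLimit = limit;
    }

    private TreeSet<Long> inMemory = new TreeSet<>();
    private File spillFile; // null until the contents have been written to disk

    public void add(long value) throws IOException {
        TreeSet<Long> items = load();
        items.add(value);
        store(items);
    }

    public boolean contains(long value) throws IOException {
        return load().contains(value);
    }

    public boolean isSpilled() {
        return spillFile != null;
    }

    // Returns the current contents, reading them back from disk once spilled.
    private TreeSet<Long> load() throws IOException {
        if (spillFile == null) {
            return inMemory;
        }
        TreeSet<Long> items = new TreeSet<>();
        try (DataInputStream in = new DataInputStream(new FileInputStream(spillFile))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                items.add(in.readLong());
            }
        }
        return items;
    }

    // Keeps items in memory while at or under the limit; otherwise writes them out.
    private void store(TreeSet<Long> items) throws IOException {
        if (spillFile == null && items.size() <= countLimit) {
            inMemory = items;
            return;
        }
        if (spillFile == null) {
            spillFile = File.createTempFile("spilling-tree-set", ".bin");
            spillFile.deleteOnExit();
        }
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(spillFile))) {
            out.writeInt(items.size());
            for (long v : items) {
                out.writeLong(v);
            }
        }
        inMemory = new TreeSet<>(); // drop the in-memory copy after spilling
    }
}
```

For example, after `SpillingTreeSet.setCountLimit(2)`, adding three values moves the set to disk, while without that call the same three values stay in memory indefinitely. Keeping the limit static makes it cheap to read from every instance, but it also means the value must be set explicitly at each job's initialization rather than relying on the default.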

JavierJia commented 10 years ago

Hi @anbangx, I just saw this branch was merged, but it looks like those comments weren't addressed?

anbangx commented 10 years ago

@JavierJia I already changed them based on your comments. We wanted to run as soon as possible, so I didn't send it for review again. Except this one: +// static private int countLimit = Integer.MAX_