Closed anbangx closed 10 years ago
It seems the only differences are to set the max memory for the stats job like I was saying and to set the read id set to actually dump to disk (I think this patch is already in one of the branches?)
Or was the problem in generating the histogram for the sorted counter min length?
On Dec 19, 2013 8:19 PM, "anbangx" notifications@github.com wrote:
@JavierJia @jakebiesinger @Elmira88 @Nan-Zhang
When we tried to run on the big data set, we hit some bugs. This PR fixes them. Have a look!
You can merge this Pull Request by running
git pull https://github.com/uci-cbcl/genomix anbangx/fix-some-bugs-to-run-in-big-data
Or view, comment on, or merge it at:
https://github.com/uci-cbcl/genomix/pull/116
Commit Summary
- Fix some bugs in order to run in the big data
File Changes
- M genomix/genomix-data/src/main/java/edu/uci/ics/genomix/data/config/GenomixJobConf.java https://github.com/uci-cbcl/genomix/pull/116/files#diff-0 (3)
- M genomix/genomix-data/src/main/java/edu/uci/ics/genomix/data/types/ExternalableTreeSet.java https://github.com/uci-cbcl/genomix/pull/116/files#diff-1 (4)
- M genomix/genomix-driver/src/main/java/edu/uci/ics/genomix/driver/GenomixDriver.java https://github.com/uci-cbcl/genomix/pull/116/files#diff-2 (4)
- M genomix/genomix-hadoop/src/main/java/edu/uci/ics/genomix/hadoop/utils/ConvertToFasta.java https://github.com/uci-cbcl/genomix/pull/116/files#diff-3 (14)
- M genomix/genomix-hadoop/src/main/java/edu/uci/ics/genomix/hadoop/utils/GraphStatistics.java https://github.com/uci-cbcl/genomix/pull/116/files#diff-4 (27)
- M genomix/genomix-pregelix/src/main/java/edu/uci/ics/genomix/pregelix/base/BinaryVertexInputFormat.java https://github.com/uci-cbcl/genomix/pull/116/files#diff-5 (2)
- M genomix/genomix-pregelix/src/main/java/edu/uci/ics/genomix/pregelix/base/DeBruijnGraphCleanVertex.java https://github.com/uci-cbcl/genomix/pull/116/files#diff-6 (6)
@jakebiesinger, the problem was that we need to set the count limit and initialize the manager before each job. Otherwise the default countLimit of ReadHeadSet is Integer.MAX_VALUE, and then every readFields method will load from the frame instead of the HDFS files. We now call setGlobalStaticConstants when the job is initializing.
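To illustrate the count-limit issue described above, here is a minimal sketch in Java. It is not the project's code: the class SpillableSetSketch, the method setCountLimit, and the in-memory list standing in for an HDFS file are all hypothetical. It only shows why a static limit left at Integer.MAX_VALUE means the set never dumps to disk, and why the limit has to be set before each job starts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a spillable read-id set; NOT the real
// ExternalableTreeSet or ReadHeadSet from the genomix code base.
final class SpillableSetSketch {
    // With this default, size() can never exceed the limit, so nothing spills.
    private static int countLimit = Integer.MAX_VALUE;

    private final TreeSet<Long> inMemory = new TreeSet<>();
    // Assumption: a plain list stands in for the file the real code writes to HDFS.
    private final List<Long> spilled = new ArrayList<>();

    // Hypothetical analogue of setting the global static constants at job init.
    static void setCountLimit(int limit) {
        countLimit = limit;
    }

    void add(long readId) {
        inMemory.add(readId);
        if (inMemory.size() > countLimit) {
            // Over the limit: move everything out of memory.
            spilled.addAll(inMemory);
            inMemory.clear();
        }
    }

    int inMemoryCount() {
        return inMemory.size();
    }

    int spilledCount() {
        return spilled.size();
    }

    public static void main(String[] args) {
        // Job initialization: set the limit before any set is filled.
        setCountLimit(3);
        SpillableSetSketch ids = new SpillableSetSketch();
        for (long i = 0; i < 4; i++) {
            ids.add(i);
        }
        System.out.println("in memory: " + ids.inMemoryCount()
                + ", spilled: " + ids.spilledCount());
    }
}
```

If the limit is never set, the same four inserts all stay in memory, which matches the behaviour described in the comment above.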
Hi @anbangx, I just saw this branch was merged, but it seems those comments weren't addressed?
@JavierJia I already changed them based on your comments. We wanted to run as soon as possible, so I didn't send it for review again. Except this: +// static private int countLimit = Integer.MAX_