Open Nan-Zhang opened 11 years ago
TreeMap.iterator shouldn't be called at all(?) in graph-build...
@anbangx @jakebiesinger please have a look at these numbers.
Do pages like this make more sense on the wiki? I want to make inline comments on these numbers. I could edit the entries directly and make the comments inline but that seems silly.
AggregateKmer init? That's a huge chunk of build time.
Hyracks graphbuilding, java.util.TreeMap$EntrySet.iterator() is mostly called in serialization. Easy enough to cache the length and use a dirty bit to see if it needs to be recalculated.
edu.uci.ics.genomix.type.EdgeMap.unionUpdate(EdgeMap): use addAll or something? Seems a waste to have to iterate here when the types are the same.
PathMergeMessage.readFields and VertexValueWritable.readFields go away. Also, I'm not sure if we really need to use setAsCopy when merging in other nodes' EdgeMaps (easing Node.mergeEdges). Maybe we can get away with more references, especially in our incoming messages (is the value returned by msgIterator.next() really something we have to make a new copy of? Can we get away with copying out only what we need from it if it is a non-reusable reference?). @anbangx can you investigate?
Configuration.get. Can we move those callers so they're only called in the first iteration, @anbangx?
I will investigate the topics you mentioned, but moving the callers so they are only called in the first iteration does not seem to work.
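For reference, the "cache the length and use a dirty bit" suggestion for the TreeMap-backed types could look roughly like the sketch below. This is only an illustration: CachedLengthMap, its byte layout, and the recomputeCount field are hypothetical and are not the actual genomix EdgeMap or its serialization format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a TreeMap-backed writable; NOT the genomix EdgeMap.
class CachedLengthMap {
    private final TreeMap<String, Long> entries = new TreeMap<>();
    private int cachedLength;      // last computed serialized length, in bytes
    private boolean dirty = true;  // true whenever cachedLength may be stale
    int recomputeCount = 0;        // counts full iterations; for illustration only

    void put(String key, long value) {
        entries.put(key, value);
        dirty = true; // any mutation invalidates the cached length
    }

    void remove(String key) {
        // Only mark dirty if something was actually removed.
        if (entries.remove(key) != null) {
            dirty = true;
        }
    }

    // Assumed layout: 4-byte entry count, then per entry a 4-byte key length,
    // the UTF-8 key bytes, and an 8-byte value.
    int getLengthInBytes() {
        if (dirty) {
            int length = Integer.BYTES;
            // This loop is the only place that iterates the entry set.
            for (Map.Entry<String, Long> e : entries.entrySet()) {
                length += Integer.BYTES
                        + e.getKey().getBytes(StandardCharsets.UTF_8).length
                        + Long.BYTES;
            }
            cachedLength = length;
            dirty = false;
            recomputeCount++;
        }
        return cachedLength;
    }
}
```

With this shape, repeated length queries between mutations reuse the cached value, so the entry-set iterator is only created after the map has actually changed.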
@jakebiesinger, maybe we can try this? http://stackoverflow.com/questions/6941688/how-to-integrate-a-github-wiki-into-the-main-project
Actually, there's a wiki already available that you can use (it's really another git repo). Ours is turned off right now but is easy to add. No need to make it a submodule unless you want to edit it locally rather than using the github interface.
@anbangx to help.