Open kubagalecki opened 1 year ago
@kubagalecki @brian-kelley is working on an "easy" interface for on-node graph assembly. That might be a good choice for what you're trying to do. It also might not be what you want, if you're doing all that work yourself anyway.
I would also add that the FECrsGraph contains data for any off-rank entries in off-rank rows you want to insert as well as the on-rank ones -- that's special sauce for the "FE" part. If you have all of that and you're sure you got it right, then a setAllIndices function would be what you want. And if you have all the ownedPlusShared stuff done in one big data structure, you can use the base class setAllIndices function. Which should work (though I suspect we don't test it so it might not).
Update: We don't test it and it doesn't work. We can look into it.
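For context on what a `setAllIndices`-style call consumes, here is a minimal sketch of packing per-row column lists into flat row pointers and column indices. It uses only the standard library; `RowLists`, `CsrGraph`, and `packCsr` are hypothetical names for illustration, not Tpetra types, and the real Tpetra call takes its own array types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: one list of local column indices per local row
// (plain standard-library types, not Tpetra ones).
using RowLists = std::vector<std::vector<int>>;

// Flat CSR storage: row r owns col_inds[row_ptrs[r] .. row_ptrs[r + 1]).
struct CsrGraph {
    std::vector<std::size_t> row_ptrs;
    std::vector<int>         col_inds;
};

CsrGraph packCsr(const RowLists& rows) {
    CsrGraph g;
    g.row_ptrs.resize(rows.size() + 1, 0);
    // A running sum of the row lengths gives the row offsets.
    for (std::size_t r = 0; r < rows.size(); ++r)
        g.row_ptrs[r + 1] = g.row_ptrs[r] + rows[r].size();
    // Concatenate the rows in order; no per-entry checks are performed.
    g.col_inds.reserve(g.row_ptrs.back());
    for (const auto& row : rows)
        g.col_inds.insert(g.col_inds.end(), row.begin(), row.end());
    assert(g.col_inds.size() == g.row_ptrs.back());
    return g;
}
```

In this layout the whole graph is handed over as two arrays in one step, which is why it avoids the per-row insertion cost.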
@csiefer2 thanks for getting back to me so quickly. The reason I need a `FECrsGraph` is that I want to construct a `FECrsMatrix`, which I happily use for off-node communication. However, I can more efficiently compute the graph (i.e. the owned and owned+shared row maps, the column map, and all column entries) in my own code. I just need `setAllIndices` to efficiently pass that information to the `FECrsGraph`. I'd greatly appreciate if you could look into getting it to work. Should I submit a separate bug report?
In the meantime, calling `insertLocalIndices` in parallel could really help me out. I'll ask the question in full for better google-ability:

Question: is `insertLocalIndices` thread-safe, assuming all insertions are done in different rows, and sufficient memory has been allocated? Additionally we can assume the column entries passed are sorted and unique, but I don't think that's relevant for thread-safety.
Again, thank you for your help, I really appreciate it!
> Question: is `insertLocalIndices` thread-safe, assuming all insertions are done in different rows, and sufficient memory has been allocated? Additionally we can assume the column entries passed are sorted and unique, but I don't think that's relevant for thread-safety.
Under the "no clashes within a row" assumption, probably. We don't test it though. You will need to enable thread-safe RCPs if you're going to try to use them within an OpenMP parallel region.
@brian-kelley's fast graph assembly stuff might also be an option. And that actually gets tested.
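To illustrate the "no clashes within a row" assumption in isolation, here is a minimal standard-library sketch in which several threads fill disjoint rows of preallocated storage. It is not Tpetra code and says nothing about shared state inside `insertLocalIndices` (such as reference-counted pointers); `parallelFill` and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Fill preallocated CSR column storage from per-row lists using several
// threads. Thread t handles rows t, t + num_threads, ..., so no two threads
// ever write to the same row's slice of col_inds.
std::vector<int> parallelFill(const std::vector<std::size_t>& row_ptrs,
                              const std::vector<std::vector<int>>& rows,
                              unsigned num_threads) {
    assert(num_threads > 0 && row_ptrs.size() == rows.size() + 1);
    std::vector<int> col_inds(row_ptrs.back());  // allocated once, up front
    auto worker = [&](unsigned tid) {
        for (std::size_t r = tid; r < rows.size(); r += num_threads) {
            assert(rows[r].size() == row_ptrs[r + 1] - row_ptrs[r]);
            std::copy(rows[r].begin(), rows[r].end(),
                      col_inds.begin() + static_cast<std::ptrdiff_t>(row_ptrs[r]));
        }
    };
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) pool.emplace_back(worker, t);
    for (auto& th : pool) th.join();  // all writes are complete after the joins
    return col_inds;
}
```

The sketch is race-free only because the storage is never resized and each element is written by exactly one thread. Whether the same holds for a library call depends on what else that call touches internally.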
Filling an a priori known `FECrsGraph`
Tagging @csiefer2 as the package owner
Hi, I have a performance question regarding the fill of a precomputed `FECrsGraph`. I have precomputed all the data for the fill of a sparse graph, including the column map and all the (locally indexed) column entries for each row. I would now like to pass that information on to `FECrsGraph`. If I were creating a `CrsGraph`, I'd just use the constructor taking the row pointers and column indices: https://github.com/trilinos/Trilinos/blob/d1f042da58a3e1de8bb198a5bfc3821608db625b/packages/tpetra/core/src/Tpetra_CrsGraph_decl.hpp#L486-L490

Instead, since such a constructor is not available, I have to fill the graph row by row by calling `insertLocalIndices`, along the lines of:

[code snippet not preserved]

This results in a horrific performance hit, since `insertLocalIndices` checks for duplicates etc. By horrific I mean it takes around 3 orders of magnitude more time to insert the entries into the graph than it takes to compute it (i.e. precompute `my_graph` from the snippet above). Down the line, this means I'm taking more time initializing my algebraic problem than solving it. To be clear, I'm not knocking `insertLocalIndices`, it's just not the right tool for the job.

Ideally, I'd like to call `FECrsGraph::setAllIndices`. Unfortunately, when I do, I get the following error:

[error message not preserved]

My specific questions are as follows:
1. [How can I set up a] `FECrsGraph` so that I can call `setAllIndices` on it?
2. [Is it safe to parallelize the] `for` loop from the snippet above? This would alleviate the performance issue in a big way. I've tested it and it seems to work, but obviously multi-threading is not an area where "hey this seems to work" is a sufficient proof of correctness.
3. [Could you add a constructor to] `FECrsGraph` similar to the one cited above for `CrsGraph`? If it is possible from a technical perspective, but your team does not have the bandwidth to do it, would you please consider giving me some pointers so that I could contribute such a constructor myself?
4. [Could you add to] `FECrsGraph` an "expert level" member function `insertLocalRowSortedUniqueEntries` (or one with a less awful name), which would avoid checking for duplicates etc.? This would also resolve my performance problem, as it would essentially turn the above code snippet into a glorified memcpy. Again, I'd consider contributing if you don't have the bandwidth. For easy reference, the implementation details of insertion as they currently stand can be found below. As you can see it does all sorts of things to handle duplicates, including constructing an STL hashmap as a lookup table: https://github.com/trilinos/Trilinos/blob/d1f042da58a3e1de8bb198a5bfc3821608db625b/packages/tpetra/core/src/Tpetra_Details_crsUtils.hpp#L410-L506
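The difference between a duplicate-checking insert and the proposed "sorted and unique" insert can be sketched with plain standard-library containers. This is only an illustration of the idea, not the Tpetra implementation linked above; `insertChecked` and `insertSortedUnique` are hypothetical names, and the sketch assumes the target row is empty when the unchecked variant is called.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Checked insert: append only the indices not already present in the row.
// The lookup set is rebuilt on every call, which is the kind of per-call
// overhead a duplicate-checking insert has to pay.
void insertChecked(std::vector<int>& row, const std::vector<int>& new_inds) {
    std::unordered_set<int> seen(row.begin(), row.end());
    for (int c : new_inds)
        if (seen.insert(c).second) row.push_back(c);
}

// "Expert" insert: the caller promises that new_inds is sorted and
// duplicate-free and that the row is still empty, so the insert reduces
// to a plain copy. The asserts only document the contract in debug builds.
void insertSortedUnique(std::vector<int>& row, const std::vector<int>& new_inds) {
    assert(row.empty());
    assert(std::is_sorted(new_inds.begin(), new_inds.end()));
    assert(std::adjacent_find(new_inds.begin(), new_inds.end()) == new_inds.end());
    row.assign(new_inds.begin(), new_inds.end());
}
```

With assertions compiled out, the second function does nothing beyond the copy, which is the "glorified memcpy" behaviour described in the fourth question.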
Thanks in advance for your time, I appreciate the work y'all do!