Closed paciorek closed 4 months ago
@paciorek I think this makes sense. I think there was some awareness that makeModelDerivsInfo
might need efficiency improvements. I think you are suggesting:
nonWrtCalcNodeNames <- model$expandNodeNames(nonWrtCalcNodes)
nonWrtStochCalcNodeNames <- nonWrtCalcNodeNames[ model$isStoch(nonWrtCalcNodeNames) ]
nonWrtStochCalcNodeNames <- model$expandNodeNames(nonWrtStochCalcNodeNames, returnScalarComponents=TRUE)
That sounds reasonable.
Potentially deeper approaches would be to work with the declaration IDs or to get the graphIDs and use them in a more direct low-level way rather than working entirely through the model methods.
Yes that is what I was suggesting. I will implement that for now and check that it solves the problem in the carnivores case.
Similar issue occurs for isData
in makeModelDerivsInfo_impl
(actually, only the efficiency problem - memory explosion doesn't seem to occur for reasons I haven't investigated).
It ended up being a bit fiddly because in going back to nodes from components, it's possible one could reintroduce elements of wrtNodes
into the constantNodes
or updateNodes
, so I now have a step that excludes them again later on.
@perrydv I'll have you review the PR when I make it.
And further along in compilation processing, similar issue in calling isDeterm
on node components in makeOutputNodes
. Fixing that in similar fashion.
@paciorek Let me know if you want me to look at new code for this or to try fixing it directly.
I have fixes on a branch but haven't made a PR yet.
Fixed by #1435
In the process of working with a vectorized version of the carnivores MSOM code, I discovered the following.
I think this would be a general issue for HMC for any model with multivariate nodes with many elements. In the carnivore case, there are large deterministic mv nodes, such as
p[1:1451,1:30,1]
.In
makeModelDerivsInfo_impl
, we do this:Note that we call
isStoch
on node elements (scalar components) rather nodes.isStoch
callsexpandNodeNames
, which converts the node elements back to nodes, with many duplicated nodes, since each node element is replaced by its encompassing node. ThenisStoch
callsgetNodeType
, which callsmodelDef$nodeName2GraphIDs
. When that is called on a given node, we callparseEvalNumericManyList
, which returns a long vector of the graph IDs for every element of the given node. Of course that is being called many times on the same node because of the duplication mentioned above.To be concrete, suppose we have a node
p[1:1000,1:50]
that is innonWrtCalcNodes
. After we do the initialexpandNodeNames
, we have 50000 elements,p[1,1]
,p[1,2]
, etc. Then inisStoch
, we end up with 50000 duplicated values ofp[1:1000,1:5]
. Then when nodeName2GraphIDs and parseEvalNumericManyList are called, we have 50000 times (in anlapply
) in which we get the 50000 identical graphIDs for the variablep
, which are then passed throughunique
to get the single graphID forp[1:1000,1:50]
. I haven't fully determined the precise source of the memory explosion (>100 GB before I stopped the R process in the carnivores case) - presumably it's happening because of thelapply
, though I wouldn't necessarily expect all the graphIDs being manipulated would be in memory at the same time.I think the solution here is to run
model$isStoch
on the nodes rather than their scalar components and then doexpandNodeNames
later as needed inmakeModelDerivsInfo_impl
.Would like @perrydv 's perspective.