nimble-dev / nimble

The base NIMBLE package for R
http://R-nimble.org
BSD 3-Clause "New" or "Revised" License
156 stars 23 forks source link

severe inefficiency and memory explosion for certain models in `makeModelDerivsInfo` because `isStoch` applied to node scalar components #1431

Closed paciorek closed 4 months ago

paciorek commented 6 months ago

In the process of working with a vectorized version of the carnivores MSOM code, I discovered the following.

I think this would be a general issue for HMC for any model with multivariate nodes with many elements. In the carnivore case, there are large deterministic mv nodes, such as p[1:1451,1:30,1].

In makeModelDerivsInfo_impl, we do this:

  nonWrtCalcNodeNames <- model$expandNodeNames(nonWrtCalcNodes, returnScalarComponents = TRUE)
  nonWrtStochCalcNodeNames <- nonWrtCalcNodeNames[ model$isStoch(nonWrtCalcNodeNames) ]  

Note that we call isStoch on node elements (scalar components) rather nodes. isStoch calls expandNodeNames, which converts the node elements back to nodes, with many duplicated nodes, since each node element is replaced by its encompassing node. Then isStoch calls getNodeType, which calls modelDef$nodeName2GraphIDs. When that is called on a given node, we call parseEvalNumericManyList, which returns a long vector of the graph IDs for every element of the given node. Of course that is being called many times on the same node because of the duplication mentioned above.

To be concrete, suppose we have a node p[1:1000,1:50] that is in nonWrtCalcNodes. After we do the initial expandNodeNames, we have 50000 elements, p[1,1], p[1,2], etc. Then in isStoch, we end up with 50000 duplicated values of p[1:1000,1:5]. Then when nodeName2GraphIDs and parseEvalNumericManyList are called, we have 50000 times (in an lapply) in which we get the 50000 identical graphIDs for the variable p, which are then passed through unique to get the single graphID for p[1:1000,1:50]. I haven't fully determined the precise source of the memory explosion (>100 GB before I stopped the R process in the carnivores case) - presumably it's happening because of the lapply, though I wouldn't necessarily expect all the graphIDs being manipulated would be in memory at the same time.

I think the solution here is to run model$isStoch on the nodes rather than their scalar components and then do expandNodeNames later as needed in makeModelDerivsInfo_impl.

Would like @perrydv 's perspective.

perrydv commented 6 months ago

@paciorek I think this makes sense. I think there was some awareness that makeModelDerivsInfo might need efficiency improvements. I think you are suggesting:

nonWrtCalcNodeNames <- model$expandNodeNames(nonWrtCalcNodes)
nonWrtStochCalcNodeNames <- nonWrtCalcNodeNames[ model$isStoch(nonWrtCalcNodeNames) ]  
nonWrtStochCalcNodeNames <- model$expandNodeNames(nonWrtStochCalcNodeNames, returnScalarComponents=TRUE)

That sounds reasonable.

Potentially deeper approaches would be to work with the declaration IDs or to get the graphIDs and use them in a more direct low-level way rather than working entirely through the model methods.

paciorek commented 6 months ago

Yes that is what I was suggesting. I will implement that for now and check that it solves the problem in the carnivores case.

paciorek commented 6 months ago

Similar issue occurs for isData in makeModelDerivsInfo_impl (actually, only the efficiency problem - memory explosion doesn't seem to occur for reasons I haven't investigated).

It ended up being a bit fiddly because in going back to nodes from components, it's possible one could reintroduce elements of wrtNodes into the constantNodes or updateNodes, so I now have a step that excludes them again later on.

@perrydv I'll have you review the PR when I make it.

paciorek commented 6 months ago

And further along in compilation processing, similar issue in calling isDeterm on node components in makeOutputNodes. Fixing that in similar fashion.

perrydv commented 5 months ago

@paciorek Let me know if you want me to look at new code for this or to try fixing it directly.

paciorek commented 5 months ago

I have fixes on a branch but haven't made a PR yet.

paciorek commented 4 months ago

Fixed by #1435