Random cell placement converges very slowly for large grids

lukaskiwitz commented 2 years ago

https://github.com/lukaskiwitz/thesis/blob/bcb89be2fc6cc1ea0f22646a70bbf9e35e079dbd/thesis/main/MyEntityLocator.py#L151-L208

@brunner-itb Although this works for larger grids, random_cluster converges very slowly and takes longer than the meshing even for moderate sizes (250 µm^3 and ccd 15 µm; it works better for large ccd). As it runs once per scan sample, that's not going to work very well in parameter scans. In addition, this is probably not deterministic, so if we reuse the mesh between simulations but create new random placements, we might run into trouble.

brunner-itb commented 2 years ago

This converges very slowly for very dense grids. Here you attempt to put 4096 cells in 250 µm^3, where most of the time about 4000 of those need to be corrected. But I already have an idea how to improve performance; working on it.

lukaskiwitz commented 2 years ago

Excellent, I summoned you to this place! Yeah, I figured as much. It does work much better with lower density, but there is also kind of a conceptual issue with how we implemented this.

The cell placement may be different every time, but we will probably want to cache the mesh (otherwise boundary marking and compilation will really drag us down). So the cell positions on the agent and in the mesh won't match. For uniformly distributed cell types that doesn't matter as long as the number of cells matches, because the only connection between the agent and the PDE model is the boundary piece id. But for clustering this might be a problem, because we assign cell types based on cell positions.

This isn't a pressing issue though; the simple fix is to use small volumes and the remesh_scan_sample=True option, but for the "large" simulation at the end, this might spell trouble.

brunner-itb commented 2 years ago

Just pushed a new version with KDTree, significantly improved (timed it vs the distance_matrix).
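For readers following along, here is a minimal sketch of why a k-d tree helps, assuming the expensive step is finding pairs of cells that sit closer together than the allowed cell-cell distance. This is not the code that was pushed to MyEntityLocator.py. The helper names, the point count, the box size, and `min_dist` are all illustrative assumptions. It uses `scipy.spatial.cKDTree.query_pairs` and, for comparison, `scipy.spatial.distance_matrix`.

```python
import numpy as np
from scipy.spatial import cKDTree, distance_matrix


def close_pairs_kdtree(points, min_dist):
    """Index pairs (i, j), i < j, whose distance is at most min_dist (k-d tree)."""
    tree = cKDTree(points)
    return tree.query_pairs(r=min_dist)


def close_pairs_dense(points, min_dist):
    """Same result via the full N x N distance matrix (O(N^2) memory)."""
    d = distance_matrix(points, points)
    # Keep only the strict upper triangle so each pair is reported once.
    i, j = np.where(np.triu(d <= min_dist, k=1))
    return set(zip(i.tolist(), j.tolist()))


# Illustrative values only: 500 random points in a 250-unit box, 15-unit spacing.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 250.0, size=(500, 3))
min_dist = 15.0

pairs_tree = close_pairs_kdtree(points, min_dist)
pairs_dense = close_pairs_dense(points, min_dist)
```

Both functions report the same set of too-close pairs. The tree-based version avoids building the full pairwise matrix, which is what becomes costly when thousands of cells have to be rechecked on every correction pass.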

About the conceptual problem: what is the agent you are talking about, probably our cell types, right? Are the boundary piece ids not saved alongside the mesh? I would really like to avoid remesh_scan_sample=True, as you mentioned. Not sure what I can do about it right now.

lukaskiwitz commented 2 years ago

The parameters for the boundary conditions are mapped from the cell entities (agents) to the correct boundary piece through a value in solver.boundary_markers (which is saved alongside the mesh, of course). Getting the boundary piece index for the cells is currently just an enumeration of the cell list, and because the grid placement was deterministic, the mapping between cell id and boundary piece index was repeatable. As a result, the x,y,z values in cell.p would always match the physical location in the mesh. When the assignment of cell types is independent of x,y,z we don't notice the difference, but for clustering that probably won't work so well.
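The mismatch described above can be illustrated with a toy sketch. It assumes, as stated, that the boundary piece index is simply the position of the cell in the cell list. The `Cell` class, `boundary_index_by_enumeration`, and all coordinates are hypothetical stand-ins, not the project's actual classes.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    p: tuple                    # (x, y, z) as stored on the agent
    type_name: str = "default"


def boundary_index_by_enumeration(cells):
    """Hypothetical stand-in: boundary piece index = position in the cell list."""
    return {i: cell for i, cell in enumerate(cells)}


# Positions baked into the cached mesh: boundary piece i sits at mesh_positions[i].
mesh_positions = [(0.0, 0.0, 0.0), (15.0, 0.0, 0.0), (30.0, 0.0, 0.0)]

# A fresh random placement for the next run: same cell count, different order.
new_cells = [
    Cell(p=(30.0, 0.0, 0.0)),
    Cell(p=(0.0, 0.0, 0.0)),
    Cell(p=(15.0, 0.0, 0.0)),
]

# Position-dependent type assignment, e.g. a cluster near the origin.
for cell in new_cells:
    if cell.p[0] < 10.0:
        cell.type_name = "clustered"

mapping = boundary_index_by_enumeration(new_cells)

# Boundary pieces whose agent position disagrees with the cached mesh.
mismatched = [i for i, cell in mapping.items() if cell.p != mesh_positions[i]]

# Physical x coordinate in the mesh where the "clustered" type actually lands.
clustered_mesh_x = [
    mesh_positions[i][0] for i, c in mapping.items() if c.type_name == "clustered"
]
```

In this toy case the cell that believes it sits at the origin is attached to the boundary piece at x = 15, so a cluster defined by cell.p ends up at a different physical location in the mesh.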

Our options, I think, are:

1. fix the seed in the random placement (least effort, but we could not generate replicates over random (physical) placement, though still over cell type assignments)
2. store the cell positions with the mesh and load them together (cleanest solution, but greatest implementation effort)

If we do number 1, we really wouldn't need to rerun the placement all the time, but if we cache that information anyway, we might as well do number 2.
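Both options can be sketched in a few lines, under loud assumptions: `place_cells` is a hypothetical placeholder that draws uniform positions and skips the overlap correction entirely, and the JSON file name is made up. Neither reflects how the mesh cache is actually laid out in the project.

```python
import json
import random
import tempfile
from pathlib import Path


def place_cells(n_cells, box_size, seed):
    """Hypothetical placement: uniform positions from a privately seeded RNG."""
    rng = random.Random(seed)  # local generator, leaves the global RNG untouched
    return [[rng.uniform(0.0, box_size) for _ in range(3)] for _ in range(n_cells)]


def save_positions(path, positions):
    """Store positions next to the cached mesh (file layout is an assumption)."""
    Path(path).write_text(json.dumps(positions))


def load_positions(path):
    """Load positions that were stored together with the cached mesh."""
    return json.loads(Path(path).read_text())


# Fixed seed: two independent calls reproduce the same placement.
first = place_cells(5, 250.0, seed=42)
second = place_cells(5, 250.0, seed=42)

# Stored positions: write them once, reload them whenever the mesh is reused.
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "mesh_positions.json"
    save_positions(cache_file, first)
    restored = load_positions(cache_file)
```

With a fixed seed the placement is repeatable but cannot vary between replicates. With stored positions the placement can be random once and is then reused unchanged whenever the cached mesh is loaded.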
