royallgroup / TCC

The Topological Cluster Classification algorithm
https://royallgroup.github.io/TCC/
GNU General Public License v3.0
12 stars 5 forks source link

Why the cell-list is not selected by default? #137

Closed yangyushi closed 1 year ago

yangyushi commented 3 years ago

Constructing the cell-list significantly reduced the bond-calculation time, and it seems to be harmless. However such option is not selected by default. Is there any reason for this choice?

merrygoat commented 3 years ago

I don't remember exactly where I got to with the cell list. The whole TCC was originally hard coded to cubic boxes (fair enough since that was what it was used for initially) and so it had some cubic assumptions in it. I then added non-cubic boxes.

I can't remember if I fixed the cell list for non-cubic boxes. It might not crash but should be pretty obviously wrong if it is not working since it will overlap particles and give crazy numbers.

alexmalins commented 3 years ago

@merrygoat has made some good points.

Also for cell lists to work you need a system that is large enough. The side of a cubic box needs to be at least 4 times larger the maximum bond cut off length for their to be any benefit to using a cell list. If there's only 3 cell list sub-cubes along each box side (27 sub-cubes in total), the algorithm will still be calculating rij distances for all particles anyway so there is no benefit. If the system size isn't even 3 times larger than the max bond cut off, the cell list will fail as there needs to be at least 3 boxes along each dimension for it to work.

I guess therefore that having the cell-list turned off by default is a safe option, as the TCC should always run that way. However @yangyushi you are right that it is important to know to turn it on when you can, as it will speed up things a lot.

merrygoat commented 3 years ago

@alexmalins Yes - I remember that now. It's funny how these things come back when you are prompted! I did add a check for small boxes here it seems: https://github.com/royallgroup/TCC/blob/master/src/cell_list.c#L101

Going back through the commits I think this commit comment is why I turned it off by default: https://github.com/royallgroup/TCC/commit/86fc78140b9c65f0476ab1d69dac06a26c0e8cbd I don't think it changed the numbers much but it would be enough to fail the tests. In that commit it seems I added some sort of quicksort to the bond list - perhaps in an attempt to make it deterministic. Not sure where I got to with that though.

alexmalins commented 3 years ago

It's worrying that you found the cluster detections depended on the order of the bonds, I wouldn't have expected that. Maybe in cases when using the simple bond length procedure with a long cut-off. Particles can start having many bonds (>>20?) and long bonds mean the detection meaning breaks down/all bets are void. But I wouldn't have expected it to happen with Voronoi.

Instead of sorting the lists of bonds, maybe a simple check would to be to run one configuration through the TCC multiple times, randomizing the order of the particles in the xyz file. In fact, this would be a good sanity check of whether the TCC is behaving sensibly that could be run each time someone looks at a new system. It would be pretty easy to implement in the Python front end.

yangyushi commented 3 years ago

Dear @alexmalins and @merrygoat thanks for the answers and now it seems that the discussion is going to some other direction...

For people in the future who discovered this post: for my system containing 20k hard spheres with a non-cubic box, TCC with the cell list produces correct cluster populations (comparing with the no-cell-list version). The speed is about 5 times faster, which is very helpful for me.