lantunes / cellpylib

A library for working with Cellular Automata, for Python.
https://cellpylib.org
Apache License 2.0
228 stars 31 forks source link

[JOSS] Questions #11

Closed szhorvat closed 2 years ago

szhorvat commented 3 years ago

This is not a bug report, just a means to ask questions @lantunes, for the purpose of the JOSS review. I hope it is okay to do it here. If not, can you please point me to a better location (perhaps there's a support forum)?

szhorvat commented 3 years ago

Q1: I am wondering if there is a simple way to use random sequential update order. I.e., select a random cell, update it, then another cell, update it, and so on. I know that a specific deterministic update order can be specified, but I am looking for a random one. Such an update would be used e.g. for an Ising model or the sandpile model.

Q2: Is it possible to have the update of a cell also affect its neighbours? Normally, the "updated" cell will change. There are models where it neighbours might change as well, for example the sandpile mode.

Generally, are you targeting stochastic cellular automata or only deterministic ones?

szhorvat commented 3 years ago

Q3: What is the show_margins argument of plot2d_animate for? If using it in a Jupyter notebook, show_margins=False causes the figure to be output as a vector image whose width is the same as the browser window. If I use ipython in a terminal, it seems to result in an empty figure. It is not clear to me what the expected effect is.

szhorvat commented 3 years ago

Q4: I see no effects from the scale and dpi arguments of plot2d_animate in either a Jupyter notebook (tried %matplotlib inline and %matplotlib notebook) or ipython terminal (backend: MacOSX). Can you please clarify?

lantunes commented 3 years ago

This is not a bug report, just a means to ask questions @lantunes, for the purpose of the JOSS review. I hope it is okay to do it here. If not, can you please point me to a better location (perhaps there's a support forum)?

@szhorvat Thank you very much for the questions! Unfortunately, there isn't a dedicated forum for this library at the moment. If usage increases, I will create a Google group, but for now, creating an issue here is the best way.

szhorvat commented 3 years ago

Have you considered GitHub Discussions? I have not used it so far, just wanted to bring it to your attention in case you didn't know about it.

I think it's fine to use the issue tracker for this while the project is small, just make it clear in the README that this is where support questions should go, so I can tick off this box in the review.

lantunes commented 3 years ago

@szhorvat Thanks for the heads up about GitHub Discussions. I hadn't head of it, and I will consider it going forward. I have added a Support section to the README file. The change is on the joss_review branch. The relevant commit is cab02b730d026a4ee7765a287bddc25aedf1a4de.

lantunes commented 3 years ago

With respect to Q1, I have added an example of how to update cells in a random sequential order. Have a look at the joss_review branch, at the demos/async_random_demo.py file. I have also updated the documentation, and fixed a bug in AsynchronousRule related to 2D support, that this change revealed. The relevant commit is d34285a0aef1e4fac7cf59591f13458b7f4133f6.

The basic idea is to wrap a rule with the AsynchronousRule, and set the randomize_each_cycle flag to True. Recall that when a rule is made asynchronous, then only a single cell will be updated (i.e. have the rule applied) in a timestep (all other cells will remain unchanged). The cell is chosen according to an update order. If randomize_each_cycle is True, then that update order is shuffled randomly at the end of each timestep. The result is the same as selecting a cell randomly at each timestep, and updating it.

lantunes commented 3 years ago

With respect to Q2, my understanding is that a Sandpile is a regular 2D cellular automaton. That is, the update rule is applied synchronously to all cells. Thus, the neighbours of a cell are affected by an update to the cell, but I don't believe there is any kind of asynchronous update scheme in the Sandpile model. I have added a built-in Sandpile rule on the joss_review branch. (The relevant commit is 045a209ddd86376ab23eaad30dc568bdca210d46.) Hopefully it will aid in this discussion. In any case, I suspect that the kind of model you may be describing could be implemented with an AsynchronousRule.

The library should be able to support either stochastic or deterministic automata.

lantunes commented 3 years ago

With respect to Q3, the show_margin argument controls whether or not a margin is displayed in the resulting plot. This may be desirable when creating animations that are meant to be saved and published. The Wireworld demos are currently the only places where this argument is used. When show_margin is set to False, then the plot takes up the entirety of the window. The scale argument, which is only used when show_margins is False, determines the resulting size of the image.

lantunes commented 3 years ago

With respect to Q4, the scale argument is only used when the show_margin argument is False. It controls the resulting scale (i.e. relative size) of the image when there are no margins. The dpi argument represents the dots per inch of the animation when it is saved. There will be no visible effect of the dpi argument if the animation is not saved (i.e. when save is False).

I have added more information to the "plot2d" function docstrings to clarify. The change is on the joss_review branch. The relevant commit is f91e83fd34ef716a5f5808031535a97c9a4f1133.

szhorvat commented 2 years ago

With respect to Q2, my understanding is that a Sandpile is a regular 2D cellular automaton. That is, the update rule is applied synchronously to all cells. Thus, the neighbours of a cell are affected by an update to the cell, but I don't believe there is any kind of asynchronous update scheme in the Sandpile model. I have added a built-in Sandpile rule on the joss_review branch. (The relevant commit is 045a209.) Hopefully it will aid in this discussion. In any case, I suspect that the kind of model you may be describing could be implemented with an AsynchronousRule.

Yes, you are right, we can think of it as synchronous, though perhaps this is not the most efficient way to simulate it.

I am wondering what the best way would be to do the following:

  1. Add a grain of sand at the middle (alternative: at a random location)
  2. Simulate the system until it reaches a fixed point (nothing changes anymore). Is there a way to detect a fixed point and automatically stop the simulation?
  3. Repeat from step 1.

What would you recommend to do this with a reasonable performance?

lantunes commented 2 years ago

@szhorvat I have made some changes to the library to support what you describe with sandpiles. The relevant commit is 70aa94e4089c7beb0c977177daf83eb52b7de995.

In summary, I have made the timesteps parameter of the evolve and evolve2d functions be either an int or a callable. If one knows the number of timesteps in advance, an int can be supplied. If one doesn't know the number of timesteps in advance, then a callable can be supplied. This callable accepts the full history of the CA evolution up to the point when it is called, along with the number of the current timestep. The callable must return False if evolution is to halt, and True otherwise.

This change allows one to detect when a fixed point has been reached. Also, adding a grain of sand to a particular cell at a particular timestep should be trivial. I have added convenience functions to the Sandpile class to perform these operations. See the Sandpile.until_fixed_point() and Sandpile.add_grain() functions. I have also updated the documentation for the Sandpile tutorial to illustrate these capabilities.

szhorvat commented 2 years ago

See the Sandpile.until_fixed_point()

I am wondering why this was made part of the Sandpile class, since it's a generally useful function, not tied in any way to sandpiles.

Also, to be clear, I am not expecting you to implement everything I asked about as part of the library. That would of course be unreasonable. I am trying to judge how usable the library is for arbitrary CA-related tasks, some of which you may not have anticipated. The sandpile model and the Ising model were two such CAs.

szhorvat commented 2 years ago

Also, adding a grain of sand to a particular cell at a particular timestep should be trivial.

My concern is: Is there a performance difference between calling evolve2d with 1 timestep n times vs calling evolve2d with n timesteps once? If not, then adding grains is indeed trivial.

Generally, the current sandpile implementation seems to be rather slow. I am wondering how one would go about generating an image like this one (but of course smaller): https://en.wikipedia.org/wiki/Abelian_sandpile_model#/media/File:Sandpile_on_infinite_grid,_3e7_grains.png

Here's my try:

n = 50
sandpile = cpl.Sandpile(n, n)
ca = cpl.init_simple2d(n, n, val=5)

for i in range(100):
    ca=np.array([ca[-1]])
    ca[0,n//2,n//2]=5
    ca=cpl.evolve2d(ca, apply_rule=sandpile, timesteps=cpl.Sandpile.until_fixed_point(), neighbourhood='von Neumann')

cpl.plot2d(ca)
image

This is unfortunately incredibly slow, taking multiple minutes, despite generating a tiny result.

Would this code be how you envisioned people to use the library, or do you have recommendations for changes?

szhorvat commented 2 years ago

This brings me to Q5:

What is the recommended way to construct a continuous series of CA states if calling evolve / evolve2d multiple times? Suppose I create an initial configuration ca, which will be an 1-by-n array (for a 1D CA). I let it evolve for k time steps, getting a k+1-by-n result. Now I modify the last state manually, and I would like to continue the evolution. If I call evolve again, it will take ca[0] as the starting state and discard everything else.

What is the recommended way to continue the evolution from the last recorded state, and retain all previously computed states (e.g. for an animation in 2D)?

lantunes commented 2 years ago

@szhorvat In terms of the performance, given the scenario you describe, I expect it would be slower to evolve the CA for 1 timestep n times, since there is initialization and allocation of an array each time, but I'm not sure how much slower. It all depends on what you're trying to do. The Sandpile class provides support for adding a grain at particular locations and timesteps over the course of a single evolution, but that assumes you know on what timesteps you'd like to do the addition. If it's not obvious what those timesteps are, then yes, you would have to evolve the CA until it meets some condition, and then restart the evolution after adding the grain. I hope I am following correctly, and this answers your question.

It's true that the library in general is not optimized for speed. For your particular example, I tried running it on my machine, and it required 159 seconds. I tried using a Moore neighbourhood instead of a von Neumann neighbourhood, and it completed in 32 seconds. The reason is that I use a Numpy masked array when the neighbourhood is von Neumann, and this makes things considerably slower. However, 32 seconds is still not ideal. The problem is that there is a trade-off to be made between library generality (i.e. offering flexibility in terms of what kinds of models may be implemented) and performance (the amount of time required for the system to evolve). A CA may be given a very fast implementation if we know the rule in advance, and we can optimize the operations for the rule in question. Even so, there are ways that one may make CellPyLib faster, such as by incorporating parallelization. (Though a complicating factor is that a rule can be any arbitrary callable.) But this is out of scope for the current release, and it will be the focus of development for future releases.

Regarding Q5, I have provided an example of what I believe you describe:

import cellpylib as cpl
import numpy as np

n = 50
sandpile = cpl.Sandpile(n, n)
history = [cpl.init_simple2d(n, n, val=5)[-1]]

for i in range(100):
    initial = np.array([history[-1]])
    initial[0, n//2, n//2] = 5
    ca = cpl.evolve2d(initial, apply_rule=sandpile, timesteps=cpl.Sandpile.until_fixed_point(), neighbourhood='Moore')
    history.extend(ca)

cpl.plot2d_animate(np.array(history))

Basically, I keep a separate list, history, that I will extend after each call to evolve2d. Is this an accurate interpretation of the process you describe?

lantunes commented 2 years ago

I am wondering why this was made part of the Sandpile class, since it's a generally useful function, not tied in any way to sandpiles.

Good point! I have moved the until_fixed_point function out of the Sandpile class. The relevant commit is a3c23e04405f8ab38184c2d57fc630c0417c7658.

szhorvat commented 2 years ago

Re Q5 above:

I wonder why not just let evolve2d continue from the last timestep of its input array instead of from the first, and return both the prior timesteps provided as input, as well as the newly computed timesteps? What do you think?

lantunes commented 2 years ago

@szhorvat This would perhaps be a better design for the operation of the evolve and evolve2d functions. Unfortunately, I've just produced a release of the software. However, this change can certainly be made for the next release. I will create a PR with this change (assuming everything works as envisioned), and I will add you as a reviewer.

lantunes commented 2 years ago

@szhorvat I have created #25 to implement this change.

lantunes commented 2 years ago

@szhorvat I have created pull request #26 to address #25 and implement support for initial conditions with a prior history. Please have a look if you like, and thank you for the suggestion!