mhoffman / kmos

kMC on steroids: A vigorous attempt to make lattice kinetic Monte Carlo modelling easier
http://mhoffman.github.com/kmos/
GNU General Public License v3.0

Problem using model dimension of 1 in lat_int #52

Open mieand opened 8 years ago

mieand commented 8 years ago

Hey Max,

I would like to run a model with a dimension of just 1, but I see some unexpected behaviour in lat_int compared to local_smart.

In the standard CO oxidation on RuO2 example the following script:

from kmos.run import KMC_Model
model = KMC_Model(print_rates=False, banner=False, size=[1,2])
model.do_steps(10)

runs fine in local_smart, but in lat_int I get an error related to trying to remove a species that is not there. It looks like, after running some steps, the processes that are listed in proclist get out of sync with the occupations on the lattice.

I have tested a couple of models and two different versions of the development branch (one from June and one from February). All models show the same behaviour, but it varies how many steps I can run before the error occurs (a few models actually look like they are running correctly).

In principle, I think the above example ought to work, since the cell is big enough that there are no second order processes that would involve the same site in different cells. Do you have any idea what could be going on with lat_int?

Best, Mie

mhoffman commented 8 years ago

Hi Mie

runs fine in local_smart, but I get an error related to trying to remove a species that is not there in lat_int. It looks like, after running some steps, the processes that are listed in proclist gets out of sync with the occupations on the lattice.

I really don't want to sound harsh. I am sure you have a point and I would like to help if possible. I am eternally grateful for a good bug report. However, that is some super vague language in terms of what is actually going on. I am frankly not sure how I am supposed to figure this one out in practice. It would be much more useful if it contained the actual error message, backtrace, and ideally a minimal script that reproduces the observed and believed faulty behavior. Here is an essay on good bug reports for some context [1]

Thx.

[1] http://www.chiark.greenend.org.uk/~sgtatham/bugs.html

mhoffman commented 8 years ago

Hi

I noticed that in your snippet, you use the size=[1,2] argument. I think the problem is that many processes such as diffusion from cus->cus are not properly defined. I would have never expected that to work. Does the problem go away when you go to a slightly larger lattice size such as 2x2 or say 3x3?

Best, Max.

mieand commented 8 years ago

Hi Max,

For size=[1x2] it fails after about 10 steps, whereas for size=[2x2] it fails after about 400,000 steps.

Error message for size=[2x2]:

kmos/base/replace_species Tried to remove species from sites which is not there!
Attempted replacement: 1 -> 2
Found species: 2 on site 6 at step 403523
For a more human-readable error message, please run

in a python console

from kmos.run import KMC_Model
model = KMC_Model(banner=False, print_rates=False)
model.post_mortem(err_code=(1, 2, 2, 6, 403523))
model.view()

(sometimes I just get a segmentation fault)

For size=[3x3] I don't get any errors up to at least 1e8 steps.

Could you just comment on whether there are any reasons it shouldn't work for cells down to 1x2? This cell contains two sites in both the x and y direction (cus,br along x and cus,cus or br,br along y). Therefore you would never get into a situation where a diffusion process would attempt to diffuse to and from the same site (I don't understand what you mean by diffusion from cus->cus being "not properly defined").

This, and the fact that it works fine in local_smart, leads me to suspect a bug in lat_int. Do you agree? Anything I am overlooking? A next logical step would therefore be a more systematic debugging of lat_int. I can give it a try next week, but if you already have some intuition from your knowledge of the differences between these two branches, it would be most helpful.

Best, Mie

mhoffman commented 8 years ago

Hi Mie

Oh, you are right, [1x2] does contain a neighboring cell in every direction. I can't think off the top of my head which line in the code generator makes it crash. However, for CO oxidation on RuO2(110), for example, [1x2] and [2x2] are more like boundary cases, since e.g. CO diffusion up and CO diffusion down are effectively (before & after) the same elementary process, but we keep track of it in two places, so the order of operations may all of a sudden become important where it wasn't before. That might 'confuse' the backend somehow. I think I disregarded such small lattices since typically they induce significant fluctuations. Do you really think we need such small lattices?

Best, Max.

mieand commented 8 years ago

Hey Max,

The reason I would like to use such small dimensions is that my model catalyst is a stepped metal surface, which is sufficiently one-dimensional that it seems to be converged already at a dimension of 1 in the direction perpendicular to the step, i.e. modeling effectively only one step edge in the system.

In my acceleration algorithm the cost is proportional to the number of sites in the system. In the direction along the steps I have observed some unexpected finite-size effects, which means that I need a very large number of sites along this direction. It therefore becomes rather expensive, i.e. simulations need to run for days or even weeks, thus the speedup available from reducing the dimension perpendicular to the step would be greatly appreciated.

At the moment I don't have any lateral interactions in the system. The reason that I need to use the lat_int backend is that I am using a small trick/hack to analyse and fix some cases where the acceleration algorithm fails (still working on how to improve the algorithm in a more general way for these cases). For this hack I need the functionality that allows using "OR" in the species field for conditions, which is not possible in local_smart (discussed in issue "local_smart does not allow for conditions with an OR in the species field #38"). I would actually very much prefer to use local_smart, since it is a lot faster, so it would be great if you decide to implement this at some point. Like Juan, I gave up trying to implement it myself due to the complexity of this part of the code and fear of breaking something.

Coming back to the debugging of lat_int for small model dimensions: I think I have identified the problem. How to solve it is another issue. It looks indeed like you are right that the up/down diffusion is causing problems, because for a dimension of 2 diffusion up and diffusion down are effectively diffusion to the same site. The problem arises in the code for the local updating of which processes should be deleted and added, in the function "write_proclist_lat_int_run_proc" in "io.py". Say that the process just executed is O diffusion up from a cus site (O_diff_cus_up). The code then looks into the cus site where O used to sit and finds that O_diff_cus_up should be deleted. Next, it looks into the sites above and below (which effectively are the same site). Here, it finds that O_diff_cus_up should be added for both of these sites. It thus effectively deletes the process once and adds it twice. Naturally, this is not good. In "base.f90" the value of "nr_of_sites(proc)" for O_diff_cus_up grows by one every time this process gets executed, which means that in the next step it is even more likely to get executed again, until the point where nr_of_sites(proc) becomes larger than the total number of sites in the system. When that happens "determine_procsite(ran_proc, ran_site, proc, site)" might start trying to access indices of "avail_sites" that are larger than its fixed dimension, and then you run into weird returned values and segmentation faults.
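The double counting described above can be reproduced in a toy model. This is only a minimal sketch, not the code that kmos generates: it assumes a single periodic column of sites, one hypothetical "hop up" process, and a plain Python list standing in for the list of available sites. All names (`hop_up_enabled`, `execute_hop_up`, `count_after`) are made up for illustration.

```python
EMPTY, O = 0, 1


def hop_up_enabled(occ, row):
    """A hop up from `row` needs O on `row` and an empty site above it."""
    size = len(occ)
    return occ[row] == O and occ[(row + 1) % size] == EMPTY


def execute_hop_up(occ, avail, row):
    """Move O one row up, then update the bookkeeping list `avail` naively.

    `avail` is a list (not a set) so that duplicate entries stay visible.
    """
    size = len(occ)
    up = (row + 1) % size
    down = (row - 1) % size
    occ[row], occ[up] = EMPTY, O
    # The departure site can no longer start the hop: delete it once.
    avail.remove(row)
    # Inspect the site above and the site below and add the hop where it
    # is now possible. No check is made whether `up` and `down` coincide.
    for site in (up, down):
        if hop_up_enabled(occ, site):
            avail.append(site)


def count_after(size, steps):
    """Number of bookkeeping entries after `steps` hops of a single O."""
    occ = [O] + [EMPTY] * (size - 1)
    avail = [0]
    for _ in range(steps):
        # Always execute the most recently added entry.
        execute_hop_up(occ, avail, avail[-1])
    return len(avail)
```

With four rows the list keeps exactly one entry, whereas with two rows `up` and `down` are the same site, so every hop deletes one entry and adds two. The entry count then grows by one per step even though there is only one O on the lattice, and some of the entries no longer match the occupations.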

I am not entirely sure how to fix this without breaking the code for larger dimensions. Can you think of any way of filtering the coordinates that are looked into in the code generator? If not, then I guess one would need to do the filtering at run time, when the model dimension is known, i.e. add some checks that would capture whether the different coordinates being investigated are effectively the same as a consequence of the model dimension. Also, one would have to think about the efficiency of this, since these checks would have to be done in every kMC step.
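A run-time filter of the kind suggested here could look roughly like the following. It is a sketch under assumptions, not kmos code: sites are taken to be (x, y) cell indices on a periodic lattice, and `unique_wrapped_sites` is a hypothetical helper name.

```python
def unique_wrapped_sites(origin, offsets, size):
    """Return the sites reached from `origin` via `offsets`, each only once.

    Offsets that land on the same site after periodic wrapping are merged,
    so the update code would inspect that site a single time.
    """
    seen = set()
    sites = []
    for dx, dy in offsets:
        site = ((origin[0] + dx) % size[0], (origin[1] + dy) % size[1])
        if site not in seen:
            seen.add(site)
            sites.append(site)
    return sites
```

For a 1x2 model, `unique_wrapped_sites((0, 0), [(0, 1), (0, -1)], (1, 2))` collapses the "up" and "down" neighbours into the single site `(0, 1)`, while a 3x3 model keeps both. Regarding the efficiency concern: since the model size is fixed once the model is initialised, the merged offset lists could in principle be computed once per process at start-up rather than in every step, although whether that fits the generated Fortran code is an open question.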

Please let me know if you have any ideas how to do this. At the very least lat_int should probably emit a warning or error when initiated with a low model dimension. The problem is that how low you can go depends on the processes and the range of their lateral interactions. I tested it for a model containing nearest neighbor lateral interactions; here the minimum working dimension seemed to be around 5-6.
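One way such a warning could be derived is to test, at initialisation, whether any two distinct offsets inspected by the update code land on the same site for the requested size. The sketch below assumes that the full list of inspected offsets (process sites plus the range of the lateral interactions) is available, which is an assumption about what the code generator could export; `offsets_alias` and `check_lattice_size` are hypothetical names.

```python
import warnings


def offsets_alias(offsets, size):
    """True if two distinct offsets map to the same site on a periodic lattice."""
    distinct = set(offsets)
    wrapped = {tuple(o % n for o, n in zip(off, size)) for off in distinct}
    return len(wrapped) < len(distinct)


def check_lattice_size(offsets, size):
    """Warn and return False if `size` is too small for the given offsets."""
    if offsets_alias(offsets, size):
        warnings.warn(
            "model size %s is too small: some inspected sites coincide" % (size,)
        )
        return False
    return True
```

For the up/down diffusion case, the offsets `[(0, 0), (0, 1), (0, -1)]` alias for a size of `(1, 2)` but not for `(1, 3)`. The larger the set of inspected offsets, the larger the minimum safe size, which is in line with the observation that nearest neighbor lateral interactions need a larger dimension.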

Best, Mie