neuronsimulator / nrn

NEURON Simulator
http://nrn.readthedocs.io
Other
403 stars 118 forks source link

Another problem with datahandle #2458

Open nrnhines opened 1 year ago

nrnhines commented 1 year ago

This may be related to #2457 but I think it is different. Works for 8.2.2 (and earlier) but not master To reproduce:

git clone git@github.com:nrnhines/dcmdt.git
cd dcmdt
nrnivmodl
python3 -i dcdt.py

fails with

$ python3 -i dcdt.py
NEURON: CVode[0]::active: generic_data_handle{cont=capacitance cm row=0/2 type=double*}::literal_value<void*> cannot be called on a handle [that was] in modern mode
 in dcdt.ses near line 69
 }
  ^
        CVode[0].active(1)
      NumericalMethodPanel[0].iscvode()
    NumericalMethodPanel[0].restore(301, 1)
  xopen("dcdt.ses")
and others
NEURON: hoc_Load_file dcdt.ses
 in dcdt.ses near line 0
 ^
        load_file("dcdt.ses")
Traceback (most recent call last):
  File "/home/hines/models/dcmdt/dcdt.py", line 30, in <module>
    h.load_file("dcdt.ses")
RuntimeError: hocobj_call error: hoc_execerror: hoc_Load_file dcdt.ses
nrnhines commented 1 year ago

The problem (no solution yet) turns out to be explicitly handling a pointer (call it POINTER c} in a VERBATIM block via

  VERBATIM
  if (_p_c) {
  }
  ENDVERBATIM

I.e. a typical legacy block to determine if the pointer is valid. The mod file translator uses

#define _p_c _ppvar[0].literal_value<void*>()

When that is evaluated and _p_c points to a voltage, then the error is

RuntimeError: hocobj_call error: generic_data_handle{Node::field::Voltage row=1/3 type=double*}::literal_value<void*> cannot be called on a handle [that was] in modern mode

A more or less minimal example of this is

$ cat temp.mod
NEURON {
  SUFFIX temp
  THREADSAFE
  POINTER c
}

ASSIGNED {
  c
}

INITIAL {
  VERBATIM
  if (!_p_c) {
    printf("POINTER c not set\n");
  }
  ENDVERBATIM
}

and

$ cat temp.py
from neuron import h

s = h.Section()
s.cm = 1
s.insert("temp")
s(0.5).temp._ref_c = s(0.5)._ref_v

h.finitialize(-65)

execute with

nrnivmodl
python3 temp.py

If c is set to point to h._ref_t (i.e. s(0.5).temp._ref_c = h._ref_t, the error message changes to

RuntimeError: hocobj_call error: generic_data_handle{raw=0x7fa33377f3c0 type=double*} does not hold a literal value of type void*

If c is never set from python, then one does get the output

POINTER C not set

A user written pointer test is still useful, since

INITIAL {
  printf("POINTER c points to %g\n", c)
}

gives a segfault if the pointer is not set.

1uc commented 1 year ago

While reading the generated code I spot he following:

#define c   *_ppvar[0].get<double*>()
#define _p_c _ppvar[0].literal_value<void*>()

This looks surprising, because _ppvar[0].get<double*>() returns a double*, which is then dereferenced. Hence, (I'm again guessing a little) semantically in MOD c "point" to the value of something else; but c is the value (much like a C++ reference); and _p_c is the address of the thing that c "points to".

My first guess is that this is a code generation bug and it should have generated:

#define c   *_ppvar[0].get<double*>()
#define _p_c  _ppvar[0].get<double*>()

Further down there's code that suggests that we're using the SoA containers for mechanisms, i.e. I would guess that we'd just have a column with double * entries (one for each node where this mechanism is active). If it's an entry in a soa container, then this generic_data_handle doesn't contain a literal value, rather it behaves like a type erased (modern) data_handle, or to translate to the closest C equivalent _ppvar[0] is a void * and _ppvar[0].get<double*>() is just (double*) _ppvar[0]. The lines that make me believe it's an entry in a soa container are:

   _nrn_mechanism_register_data_fields(_mechtype,
                                       _nrn_mechanism_field<double>{"c1"} /* 0 */,
                                       ...
                                       _nrn_mechanism_field<double>{"_g"} /* 8 */,
                                       _nrn_mechanism_field<double*>{"c", "pointer"} /* 0 */);

I've locally edited the file x86_64/dcdt.cpp after it was generated; and reran nrnivmodl go generate a new special (I checked that x86_64/dcdt.cpp wasn't overwritten). When I ran

python -i dcdt.py

it didn't create a warning and didn't crash either. @nrnhines do you think my guesses are reasonable? If yes, I'll try to find out how to fix the code generation.

ramcdougal commented 1 year ago

I think you're correct about the semantics of _p because through NEURON 8.2.2, you'd get e.g.

#define on_init *_ppvar[2]._pval
#define _p_on_init _ppvar[2]._pval

where they're literally the same except one is dereferenced. I don't know why you'd ever have a void* stored in a data handle, I thought part of the purpose of the changes was to get rid of the use of void*.

In my code when I ran into this the other day, I simply changed my VERBATIM block's mention of _p_on_init to &on_init, which I honestly like better anyways and works with both 8.22 and 9.0a.

(I'd argue for getting rid of the _p variables entirely except that would probably break legacy VERBATIM blocks... although maybe they're already broken.)

1uc commented 1 year ago

a void* stored in a data handle

There's a subtlety here. The first is that data_handle and generic_data_handle are different objects. What we have here is a generic_data_handle. The important difference is that one of them is a template class: https://github.com/neuronsimulator/nrn/blob/21f16544c850132340d485d4cd7a581f5539d53a/src/neuron/container/data_handle.hpp#L60-L61

and the other isn't: https://github.com/neuronsimulator/nrn/blob/21f16544c850132340d485d4cd7a581f5539d53a/src/neuron/container/generic_data_handle.hpp#L36-L38

This is what I mean when I say that generic_data_handle is "type-erased". It doesn't know (at least not at compile time) the type of what it's a "handle" of. This is why we need to help it remember by typing:

generic_data_handle c_gdh = ...;
double c = *c_gdh.get<double*>();

You can peak inside and see the void * that enables the type-erasure here: https://github.com/neuronsimulator/nrn/blob/21f16544c850132340d485d4cd7a581f5539d53a/src/neuron/container/generic_data_handle.hpp#L294

Yes, that's a pointer to a soa<T> container, and not a row within that soa container. It it were just the address of a specific row in the soa container, then we'd be very close to the C analog of (double*) (m_container).

Edit: The above is only true if generic_data_handle doesn't store a "literal value". Which is the additional complexity of generic_data_handle: it acts much like a variant. However, I don't think we're dealing with a "literal value" here (and the program is failing because it claims we're not dealing with one). What we have is just the equivalent of a type-erased (modern) data_handle, because we're referring to a row in some soa container.

nrnhines commented 1 year ago

do you think my guesses are reasonable?

Yes.

But I think the SoA implementation of POINTER (like the previous implementations from inception of the POINTER keyword onward) isn't providing permutation invariance, reallocation invariance, and invalidation error messaging (except for itself! Not what it is pointing at.) that we now expect from data handles (at least for data handles to RANGE variables). Sounds like what is wanted is "POINTER is a data handle to a data handle". NMODL syntax, e.g.,

POINTER pre
...
    i = g*(vpre - v)

more or less was orginially designed to point to a range variable. In this case the user restricts them to point to a voltage at another location. Though it has an obvious use also for HOC scalar, or ultimately, any double, e.g. A Vector element.

Sadly, in VERBATIM blocks, POINTER was at hand to serve for pointer to Vector or Random123, or, indeed, any HOC Object or C struct. And that is why void was used in the old c translated mod files (where the user would cast at liberty in the VERBATIM code). And COREPOINTER got introduced to transfer struct data (again with user written VERBATIM block code) to and from CoreNEURON.

Going back to "POINTER is a data handle to a data handle". That is in tension with the desire for optimum performance. Perhaps after an explicit low performance implementation, it could (if worthwhile) be improved to have less indirection while frozen and get updated if any data handle data is moved, invalidated, permuted.

Actually, give my current level of understanding of data handle, it is not clear to me that POINTER cannot be considered as "POINTER is a data handle" and many instances of the mechanism mean it is just an extra std:vector associated with the mechanism.

ramcdougal commented 1 year ago

Sadly, in VERBATIM blocks, POINTER was at hand to serve for pointer to Vector or Random123, or, indeed, any HOC Object* or C struct

Or in my example, a Python callback... i.e. it need not be data per se, we can misuse it to pass around functions and just do the casting in a VERBATIM block. In the future, there should probably be a dedicated POINTER type for this, but that's beyond the scope of this issue.

1uc commented 1 year ago

I'm a lot more confused now, because it seems like POINTER hasn't been migrated to the soa container for mechanisms. Three reasons for this statement:

nrnhines commented 1 year ago

I looked at the translated dcdt.mod that began this issue and see that the last arg of

_nrn_mechanism_register_data_fields(_mechtype,

most of which are for RANGE variables, is

_nrn_mechanism_field<double*>{"c", "pointer"} /* 0 */);

So I surmise that all of the special types that used to be in the dparam as well as the standard RANGE doubles that used to be in param, are now organized in one place. But from the <legacy?> threadargs definitions, the non-doubles are still separated into the Datum* _ppval. As an aside, I notice that the hh.mod registration of ion variables follows the similar pattern. E.g.

_nrn_mechanism_field<double*>{"_ion_ek", "k_ion"} /* 3 */,
nrnhines commented 1 year ago

It seems to me now that the typical usage of POINTER, I.e, a double to a RANGE variable, is presently fully functional except for an error message if invalid. ```_ppvar[0].get<double>()is certainly a huge advantage over the pre SoA POINTER in being invariant under moving and reallocating what it points to. This issue could be closed with merely a version ofgetthat raises an error if it is invalid. E.g._ppvar[0].get_if_valid<double*>()```.

Then the explicit check I added in the mod file would not be needed.

Later on, for higher performance, the check could be avoided by a check for validity during initialize.

I believe that one of the ways CoreNEURON has higher performance is that all model data is in a single array and thus a pointer to any range variable outside of the mechanism instance is merely an integer. In particular this includes Node* variables (such as voltage) and ionic variables that are used by the mechanism instance. For the future, it seems this could be effectively introduced into NEURON but I first have to look into how this is currently handled for voltage.

1uc commented 1 year ago

(Messages are crossing; but unfortunately I think the situation might be slightly more complicated. Hence I'm still posting despite the issue being "resolved".)

I think I understand why the generated code uses literal_value. Unfortunately, it's important to understand the hybrid nature of generic_data_handle. The simple version is that it can either store a raw pointer; or a data_handle. If it stores a raw pointer, the raw pointer is simply stored in the generic_data_handle. If it represents a type-erased data_handle then it stores enough information to create a genuine data_handle<T> on the fly.

When creating a generic_data_handle from a pointer, it'll perform a search to see if this address at that time refers to any element in any (known) soa container. If that's the case, then the generic_data_handle will create a modern data_handle and store all required information to recreate that data handle when needed. This includes storing the "stable identifier" of the element that the pointer points to.

Example, if a mechanism has a c pointer and in HOC we set c to point to the voltage of the first segment, then when performing:

_ppval[0] = &v_first_segment;

one will find that the address &v_first_segment lives inside a soa container. It'll obtain a (non-owning) stable identifier of v_first_segment, and store it along with some additional information in _ppval[0]. If at a later point we ask for c, then it'll create the data handle and virtue of the stable identifier, find the correct value of c (even if the soa has been shuffled or reallocated).

So far it behaves exactly like we initially guessed. Interestingly, under these conditions the POINTER is stable under permutations. I've added a little test that "proves" as much, see #2466.

However, there's a twist. Namely code that assigns to _p_c in a verbatim block. We'll use the address of printf just because I hope this might be a reasonably proxy for what @ramcdougal does with python callbacks:

VERBATIM
   _p_c = &printf; 
ENDVERBATIM

VERBATIM
  _p_c("foo%d", 42);
ENDVERBATIM

If we let _p_c == _ppval[0].get<double*>() then

_p_c  = &printf;

expands to

_ppval[0].get<double*>() = &printf;

Since get returns by value, i.e. it's signature is double * generic_data_handle::get<double*>(); we'd be assigning &printf to an rvalue. This fails to compile.

This is where literal_value comes in. From the documentation:

To match the flexibility of the old Datum, this had to be augmented to allow small literal values (int, double,void*, …) to be stored inside it and accessed via either get<T>() (if only the value is required) or literal_value<T>() if, even worse, the address of the wrapped value needs to be taken.

One imporant difference is that literal_value returns a non-const reference to the value that's stored in the generic_data_handel as a 'literal value'. However, that address is meaningless if the generic_data_handle represents a modern data_handle. Essentially, if _ppval[0] refers to a raw pointer and we let _p_c = _ppval[0].literal_value<void*>() then

_ppval[0].literal_value<void*>() = &printf;

correctly stores the address of printf inside _ppval[0]. However, then encounter the issue first reported here.

olupton commented 1 year ago

I have not fully parsed if there are any outstanding questions here, but I thought I would chime in with a few thoughts/comments/observations:

POINTER variables are used in two distinct ways. With reference to the generated code:

#define c   *_ppvar[0].get<double*>()
#define _p_c _ppvar[0].literal_value<void*>()

As has I think been noted, if (as in the first case) the POINTER variable is referring to something (like a voltage) in the model in a permutation-stable way then c is the pointed-to voltage value, and &c is a non-null pointer (double*), but that pointer is generally calculated on the fly -- it is not stored anywhere in the model data. So, for this case then _p_c and &c could be made to yield the same pointer value, but &_p_c would be invalid -- thus breaking the other use-case.

It's unfortunate that POINTER has this overloaded meaning. In #2027 I concluded that, while the two cases are both common, mixing usage of c and _p_c for the same POINTER variable c in the same MOD file did not happen. I believe this is true in the models covered by the test suite and nrn-modeldb-ci, but it seems from the above that counterexamples exist. In my view, the second use-case (via _p_c) should be handled differently, essentially as a RANGE variable of non-double type -- loosely void*, but ideally this could be done in a more type-safe manner.

...
  RANGE my_ptr [MyStruct*]
...
  my_ptr = new MyStruct;
...
  *my_ptr; // MyStruct&

In any case, literal_value is a transition measure that should be consigned to the history bin as soon as possible. I wrote about this in https://nrn.readthedocs.io/en/latest/dev/data-structures.html#eliminate-the-need-for-literal-value.

At the risk of adding to the debt, I don't think there would be any technical barrier to adding _p_valid_c, yielding a bool saying whether c points to a valid part of the SoA data, or if it wraps a non-null raw pointer (thus covering both of the cases above).

nrnhines commented 1 year ago

mixing usage of c and _p_c for the same POINTER variable c in the same MOD file did not happen.

That kind of mixing would definitely be a bug. My use of if (_p_c) { was just a naive attempt to generate a recoverable error if I fail to set the data handle from the interpreter instead of getting a segfault.

It's unfortunate that POINTER has this overloaded meaning.

That is certainly true. The nmodl translator correctly disambiguates, I believe, just by noting if c is used in an expression or assignment statement. If so it must be a datahandle to a double. The only way it can be used to point to an Object is within VERBATIM blocks where the _p_c syntax is used to set or get a raw pointer.

It seems that #define c *_ppvar[0].get<double*>() requires that either get is changed to raise an error if the data handle is invalid, or that the translator is changed to use a method name that does raise an error (and pay the cost of the validity check).

Nevertheless, I do like the idea of new keywords/syntax that avoids the overloading.

olupton commented 1 year ago

The nmodl translator correctly disambiguates, I believe, just by noting if c is used in an expression or assignment statement. If so it must be a datahandle to a double.

OK, I had assumed that this sort of introspection was beyond nocmodl's capabilities, and that it would need the modern NMODL translator.

I would be inclined to modify the code generation to take advantage of this knowledge, and to generate, effectively, a void*-typed RANGE variable if _p_X is being used.

It seems that #define c _ppvar[0].get<double>() requires that either get is changed to raise an error if the data handle is invalid, or that the translator is changed to use a method name that does raise an error (and pay the cost of the validity check).

In the short term, I suppose that this is the simplest way forward.

(apologies, I hit the wrong button and posted prematurely)