Open nrnhines opened 1 year ago
The problem (no solution yet) turns out to be explicitly handling a pointer (call it POINTER c} in a VERBATIM block via
VERBATIM
if (_p_c) {
}
ENDVERBATIM
I.e. a typical legacy block to determine if the pointer is valid. The mod file translator uses
#define _p_c _ppvar[0].literal_value<void*>()
When that is evaluated and _p_c points to a voltage, then the error is
RuntimeError: hocobj_call error: generic_data_handle{Node::field::Voltage row=1/3 type=double*}::literal_value<void*> cannot be called on a handle [that was] in modern mode
A more or less minimal example of this is
$ cat temp.mod
NEURON {
SUFFIX temp
THREADSAFE
POINTER c
}
ASSIGNED {
c
}
INITIAL {
VERBATIM
if (!_p_c) {
printf("POINTER c not set\n");
}
ENDVERBATIM
}
and
$ cat temp.py
from neuron import h
s = h.Section()
s.cm = 1
s.insert("temp")
s(0.5).temp._ref_c = s(0.5)._ref_v
h.finitialize(-65)
execute with
nrnivmodl
python3 temp.py
If c is set to point to h._ref_t
(i.e. s(0.5).temp._ref_c = h._ref_t
, the error message changes to
RuntimeError: hocobj_call error: generic_data_handle{raw=0x7fa33377f3c0 type=double*} does not hold a literal value of type void*
If c is never set from python, then one does get the output
POINTER C not set
A user written pointer test is still useful, since
INITIAL {
printf("POINTER c points to %g\n", c)
}
gives a segfault if the pointer is not set.
While reading the generated code I spot he following:
#define c *_ppvar[0].get<double*>()
#define _p_c _ppvar[0].literal_value<void*>()
This looks surprising, because _ppvar[0].get<double*>()
returns a double*
, which is then dereferenced. Hence, (I'm again guessing a little) semantically in MOD c
"point" to the value of something else; but c
is the value (much like a C++ reference); and _p_c
is the address of the thing that c
"points to".
My first guess is that this is a code generation bug and it should have generated:
#define c *_ppvar[0].get<double*>()
#define _p_c _ppvar[0].get<double*>()
Further down there's code that suggests that we're using the SoA containers for mechanisms, i.e. I would guess that we'd just have a column with double *
entries (one for each node where this mechanism is active). If it's an entry in a soa
container, then this generic_data_handle
doesn't contain a literal value, rather it behaves like a type erased (modern) data_handle
, or to translate to the closest C equivalent _ppvar[0]
is a void *
and _ppvar[0].get<double*>()
is just (double*) _ppvar[0]
. The lines that make me believe it's an entry in a soa
container are:
_nrn_mechanism_register_data_fields(_mechtype,
_nrn_mechanism_field<double>{"c1"} /* 0 */,
...
_nrn_mechanism_field<double>{"_g"} /* 8 */,
_nrn_mechanism_field<double*>{"c", "pointer"} /* 0 */);
I've locally edited the file x86_64/dcdt.cpp
after it was generated; and reran nrnivmodl
go generate a new special
(I checked that x86_64/dcdt.cpp
wasn't overwritten). When I ran
python -i dcdt.py
it didn't create a warning and didn't crash either. @nrnhines do you think my guesses are reasonable? If yes, I'll try to find out how to fix the code generation.
I think you're correct about the semantics of _p
because through NEURON 8.2.2, you'd get e.g.
#define on_init *_ppvar[2]._pval
#define _p_on_init _ppvar[2]._pval
where they're literally the same except one is dereferenced. I don't know why you'd ever have a void*
stored in a data handle, I thought part of the purpose of the changes was to get rid of the use of void*
.
In my code when I ran into this the other day, I simply changed my VERBATIM
block's mention of _p_on_init
to &on_init
, which I honestly like better anyways and works with both 8.22 and 9.0a.
(I'd argue for getting rid of the _p
variables entirely except that would probably break legacy VERBATIM blocks... although maybe they're already broken.)
a
void*
stored in a data handle
There's a subtlety here. The first is that data_handle
and generic_data_handle
are different objects. What we have here is a generic_data_handle
. The important difference is that one of them is a template class:
https://github.com/neuronsimulator/nrn/blob/21f16544c850132340d485d4cd7a581f5539d53a/src/neuron/container/data_handle.hpp#L60-L61
and the other isn't: https://github.com/neuronsimulator/nrn/blob/21f16544c850132340d485d4cd7a581f5539d53a/src/neuron/container/generic_data_handle.hpp#L36-L38
This is what I mean when I say that generic_data_handle
is "type-erased". It doesn't know (at least not at compile time) the type of what it's a "handle" of. This is why we need to help it remember by typing:
generic_data_handle c_gdh = ...;
double c = *c_gdh.get<double*>();
You can peak inside and see the void *
that enables the type-erasure here:
https://github.com/neuronsimulator/nrn/blob/21f16544c850132340d485d4cd7a581f5539d53a/src/neuron/container/generic_data_handle.hpp#L294
Yes, that's a pointer to a soa<T>
container, and not a row within that soa
container. It it were just the address of a specific row in the soa
container, then we'd be very close to the C analog of (double*) (m_container)
.
Edit: The above is only true if generic_data_handle
doesn't store a "literal value". Which is the additional complexity of generic_data_handle
: it acts much like a variant. However, I don't think we're dealing with a "literal value" here (and the program is failing because it claims we're not dealing with one). What we have is just the equivalent of a type-erased (modern) data_handle
, because we're referring to a row in some soa
container.
do you think my guesses are reasonable?
Yes.
But I think the SoA implementation of POINTER (like the previous implementations from inception of the POINTER keyword onward) isn't providing permutation invariance, reallocation invariance, and invalidation error messaging (except for itself! Not what it is pointing at.) that we now expect from data handles (at least for data handles to RANGE variables). Sounds like what is wanted is "POINTER is a data handle to a data handle". NMODL syntax, e.g.,
POINTER pre
...
i = g*(vpre - v)
more or less was orginially designed to point to a range variable. In this case the user restricts them to point to a voltage at another location. Though it has an obvious use also for HOC scalar, or ultimately, any double, e.g. A Vector element.
Sadly, in VERBATIM blocks, POINTER was at hand to serve for pointer to Vector or Random123, or, indeed, any HOC Object or C struct. And that is why void was used in the old c translated mod files (where the user would cast at liberty in the VERBATIM code). And COREPOINTER got introduced to transfer struct data (again with user written VERBATIM block code) to and from CoreNEURON.
Going back to "POINTER is a data handle to a data handle". That is in tension with the desire for optimum performance. Perhaps after an explicit low performance implementation, it could (if worthwhile) be improved to have less indirection while frozen and get updated if any data handle data is moved, invalidated, permuted.
Actually, give my current level of understanding of data handle, it is not clear to me that POINTER cannot be considered as
"POINTER is a data handle" and many instances of the mechanism mean it is just an extra std:vector
Sadly, in VERBATIM blocks, POINTER was at hand to serve for pointer to Vector or Random123, or, indeed, any HOC Object* or C struct
Or in my example, a Python callback... i.e. it need not be data per se, we can misuse it to pass around functions and just do the casting in a VERBATIM block. In the future, there should probably be a dedicated POINTER type for this, but that's beyond the scope of this issue.
I'm a lot more confused now, because it seems like POINTER
hasn't been migrated to the soa
container for mechanisms. Three reasons for this statement:
Other mechanism data, such as that stored in pdata for POINTER variables, has not been modified.
neuron::container::Mechanism::storage
can only store FloatingPoint
: https://github.com/neuronsimulator/nrn/blob/efc28fef161020812f27a76f2f778fc228108f74/src/neuron/container/mechanism_data.hpp#L14 register_data_fields
we find: https://github.com/neuronsimulator/nrn/blob/efc28fef161020812f27a76f2f778fc228108f74/src/nrnoc/init.cpp#L782-L786
which run on only the param_info
not the dparam_info
. I can't find a add_mechanism
for dparam_info
, i.e., the POINTER
variables of the mechanism.I looked at the translated dcdt.mod that began this issue and see that the last arg of
_nrn_mechanism_register_data_fields(_mechtype,
most of which are for RANGE variables, is
_nrn_mechanism_field<double*>{"c", "pointer"} /* 0 */);
So I surmise that all of the special types that used to be in the dparam as well as the standard RANGE doubles that used to be in param, are now organized in one place. But from the <legacy?> threadargs definitions, the non-doubles are still separated into the Datum* _ppval
.
As an aside, I notice that the hh.mod registration of ion variables follows the similar pattern. E.g.
_nrn_mechanism_field<double*>{"_ion_ek", "k_ion"} /* 3 */,
It seems to me now that the typical usage of POINTER, I.e, a double to a RANGE variable, is presently fully functional except for an error message if invalid. ```_ppvar[0].get<double>()is certainly a huge advantage over the pre SoA POINTER in being invariant under moving and reallocating what it points to. This issue could be closed with merely a version of
getthat raises an error if it is invalid. E.g.
_ppvar[0].get_if_valid<double*>()```.
Then the explicit check I added in the mod file would not be needed.
Later on, for higher performance, the check could be avoided by a check for validity during initialize.
I believe that one of the ways CoreNEURON has higher performance is that all model data is in a single array and thus a pointer to any range variable outside of the mechanism instance is merely an integer. In particular this includes Node* variables (such as voltage) and ionic variables that are used by the mechanism instance. For the future, it seems this could be effectively introduced into NEURON but I first have to look into how this is currently handled for voltage.
(Messages are crossing; but unfortunately I think the situation might be slightly more complicated. Hence I'm still posting despite the issue being "resolved".)
I think I understand why the generated code uses literal_value
. Unfortunately, it's important to understand the hybrid nature of generic_data_handle
. The simple version is that it can either store a raw pointer; or a data_handle
. If it stores a raw pointer, the raw pointer is simply stored in the generic_data_handle
. If it represents a type-erased data_handle
then it stores enough information to create a genuine data_handle<T>
on the fly.
When creating a generic_data_handle
from a pointer, it'll perform a search to see if this address at that time refers to any element in any (known) soa
container. If that's the case, then the generic_data_handle
will create a modern data_handle
and store all required information to recreate that data handle when needed. This includes storing the "stable identifier" of the element that the pointer points to.
Example, if a mechanism has a c
pointer and in HOC we set c
to point to the voltage of the first segment, then when performing:
_ppval[0] = &v_first_segment;
one will find that the address &v_first_segment
lives inside a soa
container. It'll obtain a (non-owning) stable identifier of v_first_segment
, and store it along with some additional information in _ppval[0]
. If at a later point we ask for c
, then it'll create the data handle and virtue of the stable identifier, find the correct value of c
(even if the soa
has been shuffled or reallocated).
So far it behaves exactly like we initially guessed. Interestingly, under these conditions the POINTER
is stable under permutations. I've added a little test that "proves" as much, see #2466.
However, there's a twist. Namely code that assigns to _p_c
in a verbatim block. We'll use the address of printf
just because I hope this might be a reasonably proxy for what @ramcdougal does with python callbacks:
VERBATIM
_p_c = &printf;
ENDVERBATIM
VERBATIM
_p_c("foo%d", 42);
ENDVERBATIM
If we let _p_c == _ppval[0].get<double*>()
then
_p_c = &printf;
expands to
_ppval[0].get<double*>() = &printf;
Since get
returns by value, i.e. it's signature is double * generic_data_handle::get<double*>();
we'd be assigning &printf
to an rvalue. This fails to compile.
This is where literal_value
comes in. From the documentation:
To match the flexibility of the old Datum, this had to be augmented to allow small literal values (
int
,double
,void*
, …) to be stored inside it and accessed via eitherget<T>()
(if only the value is required) orliteral_value<T>()
if, even worse, the address of the wrapped value needs to be taken.
One imporant difference is that literal_value
returns a non-const reference to the value that's stored in the generic_data_handel
as a 'literal value'. However, that address is meaningless if the generic_data_handle
represents a modern data_handle
. Essentially, if _ppval[0]
refers to a raw pointer and we let _p_c = _ppval[0].literal_value<void*>()
then
_ppval[0].literal_value<void*>() = &printf;
correctly stores the address of printf
inside _ppval[0]
. However, then encounter the issue first reported here.
I have not fully parsed if there are any outstanding questions here, but I thought I would chime in with a few thoughts/comments/observations:
POINTER
variables are used in two distinct ways. With reference to the generated code:
#define c *_ppvar[0].get<double*>()
#define _p_c _ppvar[0].literal_value<void*>()
c
can be used to access the pointed-to value. Attempting to use &c
is formally undefined behaviour if the stored pointer is null, I believe (see https://stackoverflow.com/questions/51691273/is-null-well-defined-in-c), so using this to test validity is questionable. Taking + storing a non-null address that has been obtained in this way is very likely to be a bug, see for example https://github.com/neuronsimulator/nrn/pull/2027#issuecomment-1332306535 and https://nrn.readthedocs.io/en/latest/dev/data-structures.html#storing-addresses-of-range-variables). I would have to check if it is possible to store a raw pointer this way -- I suspect that it is, and that one can (from Python) bind a POINTER
variable to an element of a HOC/Python vector.VERBATIM
blocks. In this case _p_c = ...
is used. Unfortunately the pattern &_p_c
also appears, which is the origin of the use of literal_value
(which returns a reference) instead of get
. An example is https://github.com/neuronsimulator/nrn/blob/efc28fef161020812f27a76f2f778fc228108f74/share/examples/nrniv/netcon/vecevent.mod#L111, but I believe this is also widespread in ModelDB. This will need to be seriously considered when the storage behind POINTER
variables is transposed to SoA format, to ensure that &_p_c
values are not stored for a long enough that they can be invalidated by reallocation or permutation.As has I think been noted, if (as in the first case) the POINTER
variable is referring to something (like a voltage) in the model in a permutation-stable way then c
is the pointed-to voltage value, and &c
is a non-null pointer (double*
), but that pointer is generally calculated on the fly -- it is not stored anywhere in the model data. So, for this case then _p_c
and &c
could be made to yield the same pointer value, but &_p_c
would be invalid -- thus breaking the other use-case.
It's unfortunate that POINTER
has this overloaded meaning. In #2027 I concluded that, while the two cases are both common, mixing usage of c
and _p_c
for the same POINTER
variable c
in the same MOD file did not happen. I believe this is true in the models covered by the test suite and nrn-modeldb-ci
, but it seems from the above that counterexamples exist. In my view, the second use-case (via _p_c
) should be handled differently, essentially as a RANGE
variable of non-double
type -- loosely void*
, but ideally this could be done in a more type-safe manner.
...
RANGE my_ptr [MyStruct*]
...
my_ptr = new MyStruct;
...
*my_ptr; // MyStruct&
In any case, literal_value
is a transition measure that should be consigned to the history bin as soon as possible. I wrote about this in https://nrn.readthedocs.io/en/latest/dev/data-structures.html#eliminate-the-need-for-literal-value.
At the risk of adding to the debt, I don't think there would be any technical barrier to adding _p_valid_c
, yielding a bool
saying whether c
points to a valid part of the SoA data, or if it wraps a non-null raw pointer (thus covering both of the cases above).
mixing usage of c and _p_c for the same POINTER variable c in the same MOD file did not happen.
That kind of mixing would definitely be a bug. My use of if (_p_c) {
was just a naive attempt to generate a recoverable error if I fail to set the data handle from the interpreter instead of getting a segfault.
It's unfortunate that POINTER has this overloaded meaning.
That is certainly true. The nmodl translator correctly disambiguates, I believe, just by noting if c
is used in an expression or assignment statement. If so it must be a datahandle to a double. The only way it can be used to point to an Object is within VERBATIM blocks where the _p_c
syntax is used to set or get a raw pointer.
It seems that #define c *_ppvar[0].get<double*>()
requires that either get
is changed to raise an error if the data handle is invalid, or that the translator is changed to use a method name that does raise an error (and pay the cost of the validity check).
Nevertheless, I do like the idea of new keywords/syntax that avoids the overloading.
The nmodl translator correctly disambiguates, I believe, just by noting if c is used in an expression or assignment statement. If so it must be a datahandle to a double.
OK, I had assumed that this sort of introspection was beyond nocmodl
's capabilities, and that it would need the modern NMODL translator.
I would be inclined to modify the code generation to take advantage of this knowledge, and to generate, effectively, a void*
-typed RANGE variable if _p_X
is being used.
It seems that #define c _ppvar[0].get<double>() requires that either get is changed to raise an error if the data handle is invalid, or that the translator is changed to use a method name that does raise an error (and pay the cost of the validity check).
In the short term, I suppose that this is the simplest way forward.
(apologies, I hit the wrong button and posted prematurely)
This may be related to #2457 but I think it is different. Works for 8.2.2 (and earlier) but not master To reproduce:
fails with