jgerstmayr / EXUDYN

Multibody Dynamics Simulation: Rigid and flexible multibody systems
https://www.youtube.com/channel/UCsQD2bIPBXB_4J23WtqKkVw/playlists
Other
166 stars 23 forks source link

On the usage of mutables as default parameters #44

Closed ManuelZ closed 1 year ago

ManuelZ commented 1 year ago

I've noticed that in several functions, a list is being used as a default parameter. This may be problematic because in subsequent calls to such functions, the original list —the one created during the first function call— is what's available inside the function, and not a new list, as one would expect. This is not exclusively related to lists, but to to mutables, as a dictionary for example.

I don't know if this is currently causing any issue, but it could, as it's a common problem.

Examples of where is this happening:

https://github.com/jgerstmayr/EXUDYN/blob/bc412ed2147d5274a36cfd71661944753850bf87/main/pythonDev/exudyn/solver.py#L354

https://github.com/jgerstmayr/EXUDYN/blob/bc412ed2147d5274a36cfd71661944753850bf87/main/pythonDev/exudyn/kinematicTree.py#L505

More info: https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments

jgerstmayr commented 1 year ago

Thanks for the hint. I raised an issue for that. There are around 30 such list initialization, which need to be checked if the args are modified internally and checked whether it makes sense to change them to None. This would then mean that users do not intuitively see how to initialize these args.

ManuelZ commented 1 year ago

The arguments being modified internally isn't the only occasion when problems can arise. A generic example, this time with this happening inside a method:

class A:
    def __init__(self, x=[0, 0, 0]):
        self.x = x

a1 = A()
print(f"a1.x started with: {a1.x}")

a1.x.append(2)
print(f"a1.x was modified to : {a1.x}")

a2 = A()
print(f"a2.x started with: {a2.x}")

Prints:

a1.x started with: [0, 0, 0]
a1.x was modified to : [0, 0, 0, 2]
a2.x started with: [0, 0, 0, 2]

A concrete example of something like that happening in Exudyn would be if somebody tries to modify attributes by hand:

from exudyn.itemInterface import NodePoint

node1 = NodePoint()
print(node1.referenceCoordinates)

node1.referenceCoordinates[2] = 5
print(node1.referenceCoordinates)

node2 = NodePoint()
print(node2.referenceCoordinates)

Prints:

node1 started with reference coordinates:  [0.0, 0.0, 0.0]
node1 reference coordinates were modified to:  [0.0, 0.0, 5]
node2 started with reference coordinates:  [0.0, 0.0, 5]

A solution could look like:

class A:
    def __init__(self, x=None):
        self.x = x or [0, 0, 0]

a1 = A()
print(f"a1.x started with: {a1.x}")

a1.x.append(2)
print(f"a1.x was modified to : {a1.x}")

a2 = A()
print(f"a2.x started with: {a2.x}")

a3 = A([0, 0, 0])
print(f"a3.x started with: {a3.x}")

a1.x.append(9)
a2.x.append(9)
print(f"After modifying a1 and a2, a3.x is: {a3.x}")

Which prints:

a1.x started with: [0, 0, 0]
a1.x was modified to : [0, 0, 0, 2]
a2.x started with: [0, 0, 0]
a3.x started with: [0, 0, 0]
After modifying a1.x and a2.x, a3.x is: [0, 0, 0]
jgerstmayr commented 1 year ago

These cases just reflect the basic problems of Python: the undefined behavior of special classes regarding copy and referencing. We need the default parameters in the itemInterface, as it alleviates the transfer of data to the items with the autocompletion in VSCode or Spyder. Otherwise, the itemInterface would not be needed and the pure dictionaries of e.g. AddObject(...) could be used as well.

I see an alternative by copying the lists in the constructor. This is roughly 3x slower. As they are flat, a pure copy.copy(...) should be sufficient. This could be done automatically when creating itemInterface.py.

ManuelZ commented 1 year ago

The Python Language Reference should have all the needed information, although in a very dry state. The most clear explanations regarding copy and referencing that I've found are in places like this, particularly around the point "Fact: Python passes function arguments by assigning to them" and this Stackoverflow answer.

Another option would be to start using type annotations, either embedded in the code:

class NodeRigidBody2D:
    def __init__(self,
        name: str = '',
        referenceCoordinates: list = None,
        initialCoordinates: list = None,
        initialVelocities: list = None,
        visualization: dict = None):
    ...

or in .pyi files, as Numpy does.

Scikit-Learn doesn't yet support this, as it creates complications that in the short term may not be worth it, due to the lack of man-power (but that may be worth it in the long-term, specially when a type-checker leverages them). The IDE suggestion wouldn't be a [0, 0, 0] list, for example, but plainly, a list.

jgerstmayr commented 1 year ago

Let's put it simpler: how can you know, if values are copied or referenced in exudyn? Well, it is only defined by the CPP code, but you cannot see this from the Python function. Therefore, modifying any mutable arg inside a function is always a problem, no matter if this is a default arg or not. As mentioned, I created issues (1536,1540) to test copying list arguments to surpass this problem mainly in the itemInterface. In performance-critical cases, this may be problematic, but otherwise this will be done. NOTE: in your example, referenceCoordinates would become None and I am not attempting to implicitly convert None with pybind11 to a list in the cpp interface. Furthermore, it totally hides the default values. Default values are very helpful, also ChatGPT likes them a lot in order to better understand writing Exudyn code!

jgerstmayr commented 1 year ago

Starting with version 1.6.173 (soon on github), all adaptations for mutable default arguments have been finished. Some structures have been replaced by None; However, using the automatic interface setup, it is ensured that lists, vectors, matrices, etc. are always copied in itemInterface; performance is not affected here, even if we initialize 100.000 items; in Python functions, lists are never modified inside a function, except that it is necessary and explicitly mentioned (but very rare). In general, this should make pitfalls of default arguments nearly impossible.

A slightly extended test for the above mentioned problem (which is not the recommended way to be done in Exudyn!) reads:

import exudyn
from exudyn.itemInterface import NodePoint
print(exudyn.GetVersionString())

node1 = NodePoint()
print(node1.referenceCoordinates)
print(node1.visualization)

node1.referenceCoordinates[2] = 5
node1.visualization['drawSize'] = 13
node1.visualization['color'][1] = 42
print(node1.referenceCoordinates)
print(node1.visualization)

node2 = NodePoint()
print(node2.referenceCoordinates)
print(node2.visualization)

Exudyn version 1.6.0 gave erroneously:

1.6.0
[0.0, 0.0, 0.0]
{'show': True, 'drawSize': -1.0, 'color': [-1.0, -1.0, -1.0, -1.0]}
[0.0, 0.0, 5]
{'show': True, 'drawSize': 13, 'color': [-1.0, 42, -1.0, -1.0]}
[0.0, 0.0, 5]
{'show': True, 'drawSize': 13, 'color': [-1.0, 42, -1.0, -1.0]}

but since 1.6.173 this is corrected:

1.6.173.dev1
[0. 0. 0.]
{'show': True, 'drawSize': -1.0, 'color': [-1.0, -1.0, -1.0, -1.0]}
[0. 0. 5.]
{'show': True, 'drawSize': 13, 'color': [-1.0, 42, -1.0, -1.0]}
[0. 0. 0.]
{'show': True, 'drawSize': -1.0, 'color': [-1.0, -1.0, -1.0, -1.0]}

This problem should be resolved for now, but I am grateful about further feedback!