lcompilers / lpython

Python compiler
https://lpython.org/

Arrays in dataclasses apparently share storage in CPython but not in LPython #2104

Open rebcabin opened 1 year ago

rebcabin commented 1 year ago
from lpython import (i8, i32, i64, f32, f64,
                     u8, u32,
                     TypeVar, Const,
                     dataclass
                     )

from numpy import (empty, sqrt, float32, float64,
                   int8, int32, array, # ndarray
                   )

@dataclass
class LpBhvSmall:
    dim : i32 = 4
    a : i8[4] = empty(4, dtype=int8)

def g() -> None:
    l1 : LpBhvSmall = LpBhvSmall()
    l1.a[0] = i8(-96)
    l1.a[1] = i8(-17)
    l1.a[2] = i8(80)
    l1.a[3] = i8(107)

    assert l1.a[0] == i8(-96)
    assert l1.a[1] == i8(-17)
    assert l1.a[2] == i8(80)
    assert l1.a[3] == i8(107)

    ################# ATTENTION: OVERWRITES l1.a in CPYTHON
    ################# BUT NOT IN LPYTHON

    l2 : LpBhvSmall = LpBhvSmall()
    l2.a[0] = i8(-42)
    l2.a[1] = i8(-99)
    l2.a[2] = i8(3)
    l2.a[3] = i8(-110)

    assert l2.a[0] == i8(-42)
    assert l2.a[1] == i8(-99)
    assert l2.a[2] == i8(3)
    assert l2.a[3] == i8(-110)

    assert l1.a[0] == i8(-96)
    assert l1.a[1] == i8(-17)
    assert l1.a[2] == i8(80)
    assert l1.a[3] == i8(107)

if __name__ == "__main__":
    g()
$ PYTHONPATH='../../src/runtime/lpython' python issue2104.py
Traceback (most recent call last):
  File "/Users/brian/CLionProjects/lpython/lasr/LP-pycharm/issue2104.py", line 47, in <module>
    g()
  File "/Users/brian/CLionProjects/lpython/lasr/LP-pycharm/issue2104.py", line 40, in g
    assert l1.a[0] == i8(-96)
AssertionError
$ ~/CLionProjects/lpython/src/bin/lpython -I. issue2104.py
rebcabin commented 1 year ago

Because of #2102, I don't see a workaround, so this is a blocker, at least for keeping the code working in both CPython and LPython.

Thirumalai-Shaktivel commented 1 year ago

I'm confused! Why does it overwrite in CPython? Does it share the same memory location?

Smit-create commented 1 year ago

Is that a bug on the CPython side? I can reproduce this without using lpython.py:

from dataclasses import dataclass
import numpy as np

@dataclass
class LpBhvSmall:
    quantity_on_hand: int
    name: str = 'okay'
    x: np.ndarray = np.ones(2)

def g() -> None:
    l1 : LpBhvSmall = LpBhvSmall(1)
    print(l1.name, l1.quantity_on_hand, l1.x)
    l2:  LpBhvSmall = LpBhvSmall(2)
    print(l2.name, l2.quantity_on_hand, l2.x)
    print(l1.name, l1.quantity_on_hand, l1.x)
    l2.name = 'l2'
    l2.x[0] = 22
    l2.x[1] = 22
    print(l2.name, l2.quantity_on_hand, l2.x)
    print(l1.name, l1.quantity_on_hand, l1.x)

if __name__ == "__main__":
    g()
rebcabin commented 1 year ago

Yes, the arrays share memory in CPython. You can understand more by reading about field(default_factory=list) in the CPython dataclasses documentation. CPython does not allow field = [] because all instances of the dataclass would share the memory of that one list. It's a weird design, but that's how CPython does it!
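The mechanism can be shown with the standard library alone. A dataclass default is evaluated once, when the class body runs, and every instance that omits the field receives a reference to that single object; default_factory instead calls a function on each __init__. Buffer below is a hypothetical stand-in for the numpy array. A real ndarray can't be used as a plain default on every version: since Python 3.11, dataclasses rejects any unhashable default (numpy.ndarray included) with a ValueError and asks for default_factory, while older versions only rejected list, dict and set, so the np.ones(2) repro above presumably ran on Python 3.10 or earlier.

```python
from dataclasses import dataclass, field


class Buffer:
    """Hypothetical mutable buffer. It keeps object's identity-based __hash__,
    so dataclasses accepts it as a plain default on every Python version."""

    def __init__(self, n: int) -> None:
        self.data = [0] * n


@dataclass
class Shared:
    # Buffer(4) is evaluated once, at class definition time; every
    # instance that omits `a` gets a reference to that same object.
    a: Buffer = Buffer(4)


@dataclass
class Fresh:
    # The factory is called inside each __init__, so each instance
    # owns a separate buffer.
    a: Buffer = field(default_factory=lambda: Buffer(4))


s1, s2 = Shared(), Shared()
s2.a.data[0] = -42
print(s1.a is s2.a, s1.a.data[0])  # True -42: writing via s2 is visible via s1

f1, f2 = Fresh(), Fresh()
f2.a.data[0] = -42
print(f1.a is f2.a, f1.a.data[0])  # False 0: f1 is untouched
```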


rebcabin commented 1 year ago

Removing BLOCKER tag because I found a workaround: assign a fresh array to each instance right after constructing it.

from lpython import (i8, i32, i64, f32, f64,
                     u8, u32,
                     TypeVar, Const,
                     dataclass
                     )

from numpy import (empty, sqrt, float32, float64,
                   int8, int32, array, # ndarray
                   )

@dataclass
class LpBhvSmall:
    dim : i32 = 4
    a : i8[4] = empty(4, dtype=int8)

def g() -> None:
    l1 : LpBhvSmall = LpBhvSmall()  # Issue #2102: can't write LpBhvSmall(4, empty(4, dtype=int8))
    l1.a = empty(4, dtype=int8)     # give l1 its own array instead of the shared default
    l1.a[0] = i8(-96)
    l1.a[1] = i8(-17)
    l1.a[2] = i8(80)
    l1.a[3] = i8(107)

    assert l1.a[0] == i8(-96)
    assert l1.a[1] == i8(-17)
    assert l1.a[2] == i8(80)
    assert l1.a[3] == i8(107)

    ################# l2 also gets its own array below, so l1.a is no longer
    ################# overwritten in CPython (it never was in LPython)

    l2 : LpBhvSmall = LpBhvSmall()  # Issue #2102: can't write LpBhvSmall(4, empty(4, dtype=int8))
    l2.a = empty(4, dtype=int8)     # give l2 its own array instead of the shared default
    l2.a[0] = i8(-42)
    l2.a[1] = i8(-99)
    l2.a[2] = i8(3)
    l2.a[3] = i8(-110)

    assert l2.a[0] == i8(-42)
    assert l2.a[1] == i8(-99)
    assert l2.a[2] == i8(3)
    assert l2.a[3] == i8(-110)

    assert l1.a[0] == i8(-96)
    assert l1.a[1] == i8(-17)
    assert l1.a[2] == i8(80)
    assert l1.a[3] == i8(107)

if __name__ == "__main__":
    g()
rebcabin commented 1 year ago


This is BY DESIGN in CPython dataclasses :) It is not a bug in CPython, but a MISFEATURE in LPython. LPython should share storage like CPython does, unless field(default_factory=lambda: empty(4, dtype=int8)) or something like it is implemented in LPython.
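For reference, the per-instance storage that LPython already gives can be requested explicitly in CPython with the default_factory form suggested above. A sketch, assuming plain CPython dataclasses; it uses the standard library's array('b', ...) (signed 8-bit integers) as a stand-in for empty(4, dtype=int8) so it runs without numpy. Whether LPython accepts field(default_factory=...) is the open question in this thread, so this is not LPython code.

```python
from array import array
from dataclasses import dataclass, field


@dataclass
class LpBhvSmall:
    dim: int = 4
    # Stand-in for empty(4, dtype=int8): typecode 'b' stores signed 8-bit ints.
    # The factory runs on every __init__, so no two instances share storage.
    a: array = field(default_factory=lambda: array('b', [0] * 4))


def g() -> None:
    l1 = LpBhvSmall()
    for i, v in enumerate([-96, -17, 80, 107]):
        l1.a[i] = v

    l2 = LpBhvSmall()
    for i, v in enumerate([-42, -99, 3, -110]):
        l2.a[i] = v

    # Writing through l2 leaves l1 untouched, matching the
    # behavior the issue reports for LPython.
    assert list(l1.a) == [-96, -17, 80, 107]
    assert list(l2.a) == [-42, -99, 3, -110]


if __name__ == "__main__":
    g()
```

With numpy, the factory would be lambda: np.empty(4, dtype=np.int8) instead; the rest of the class is unchanged.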

certik commented 1 year ago

We'll have to understand what is going on here and how to design ASR's Struct. It looks like it is behaving like some kind of pointer to the same numpy array. Very confusing!

We can work around such misfeatures by restricting what you can do, as we have done for regular variables. Those also behave like pointers in CPython, while LPython treats them as non-pointers; because of the restrictions, the two approaches are equivalent on the allowed subset. We need to figure out something similar here.
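The "equivalent on the subset" idea can be illustrated with a toy model. This is not how LPython implements it: a hypothetical bind helper either aliases the object (CPython-like reference semantics) or deep-copies it (value semantics). A program that never mutates an object in place through a second name gives the same result under both; a program that does, like the shared dataclass default, is where the two diverge and would have to be excluded.

```python
import copy


def run(program, value_semantics):
    """Run `program` with a hypothetical `bind` helper that either aliases
    (reference semantics) or deep-copies (value semantics)."""
    def bind(obj):
        return copy.deepcopy(obj) if value_semantics else obj
    return program(bind)


def no_shared_mutation(bind):
    # Inside the subset: the second name is only rebound, never mutated in place.
    a = [1, 2, 3]
    b = bind(a)
    b = [x * 10 for x in b]
    return a, b


def shared_mutation(bind):
    # Outside the subset: in-place mutation through the second name.
    a = [1, 2, 3]
    b = bind(a)
    b[0] = 99
    return a, b


print(run(no_shared_mutation, False), run(no_shared_mutation, True))
print(run(shared_mutation, False), run(shared_mutation, True))
```

Only the second pair differs: with aliasing, a also becomes [99, 2, 3].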