Closed Gilnore closed 16 hours ago
Hi @Gilnore, sorry about the crash of your machine. This is weird to me; we haven't encountered any case where running (or crashing) Kuzu leads to changed monitor settings, restarts, or GPU crashes. We can investigate this a bit more to see what went wrong. Could you kindly provide the schema of the data frame? (If it's possible to dump the data frame to a file and ship it to us, that would be helpful.) We can try to populate a data frame of that size to reproduce the issue. It would also be great if you could provide the hardware configuration of your machine.
@benjaminwinger can help look into this issue when he gets time.
Thanks for the reply. I loaded an entire SQL table with pandas `read_sql_table`. The table was in Postgres. I can provide the SQLAlchemy code that generated the table. (I'll need to quickly switch devices first; got to reboot the PC. I'll attach the code in a separate comment.)
Here's the code that generated the SQL table:
```python
import sqlalchemy as sa
from sqlalchemy.orm import Mapped, composite, mapped_column

@architect.install('partition',
                   type='range',
                   subtype='date',
                   constraint='100000',
                   column='date_time',
                   db=postgres_url)
class history(postgres_base):
    __tablename__ = 'history'
    identity: Mapped[stock_id] = composite(
        mapped_column('date_time', sa.TIMESTAMP, index=True),
        mapped_column('symbol', sa.String, index=True))
    Open: Mapped[float] = mapped_column('Open')
    Close: Mapped[float] = mapped_column('Close')
    High: Mapped[float] = mapped_column('High')
    Low: Mapped[float] = mapped_column('Low')
    Volume: Mapped[float | None] = mapped_column('Volume', nullable=True)
    Dividends: Mapped[float | None] = mapped_column('Dividends', nullable=True)
    Splits: Mapped[float | None] = mapped_column('Splits', nullable=True)
    __table_args__ = (
        sa.PrimaryKeyConstraint('date_time', 'symbol', name='identity'),
        # {'postgresql_partition_by': 'RANGE (date_time)'}
    )
```
Here's the code that generates the graph:
```python
def create_node_tables(self, table, kuzu_conn):
    # Make nodes of various types by putting together some tables in SQL.
    columns = []
    table_label = table.__tablename__
    # We need to add primary key statements;
    # we can try to decipher the identity arg of the table.
    primaries = [key.name for key in table.__mapper__.primary_key]
    typed_primaries = []
    # Loop through columns in the table.
    for col in table.__table__.columns:
        col_name = col.name
        # Map SQLAlchemy types to Kuzu types (expand as needed).
        kuzu_type = self.map_sa_type(col.type)
        if col_name in primaries:
            typed_primaries.append(f"{col_name} {kuzu_type}")
        columns.append(f"{col_name} {kuzu_type}")
    # Join columns into the CREATE NODE statement.
    columns_str = ", \n".join(columns)
    # A composite primary key (len(primaries) > 1) would have to be packed
    # into a struct, but Kuzu primary keys must be string or numeric, so
    # we fall back to a SERIAL surrogate key instead.
    if len(typed_primaries) > 1:
        create_node_statement = f"""CREATE NODE TABLE IF NOT EXISTS
        {table_label}
        (id SERIAL,
        {columns_str},
        PRIMARY KEY (id)
        );"""
    else:
        create_node_statement = f"""CREATE NODE TABLE IF NOT EXISTS
        {table_label}
        ({columns_str},
        PRIMARY KEY ({primaries[0]})
        );"""
    kuzu_conn.execute(create_node_statement)

def create_edge_tables(self, source, target, kuzu_conn):
    # source and target need to be specified if table is not
    # the only table on edge is DOT
    a = source.__tablename__
    b = target.__tablename__
    create_edge_statement = f"""
    CREATE REL TABLE IF NOT EXISTS {a}_effect_on_{b} (FROM {a}
    TO {b},
    source_time TIMESTAMP,
    target_time TIMESTAMP,
    correlation_index DOUBLE,
    exchange_index DOUBLE);"""
    kuzu_conn.execute(create_edge_statement)

def read_to_node(self, table, kuzu_conn, chunksize=0, sample=False):
    # Pass in a SQL table object and a Kuzu connection.
    table_name = table.__tablename__
    if table_name == 'history':
        # Problem: the table is not detected this way.
        con = self.history_session.connection()
    else:
        con = self.session.connection()
    if chunksize > 0:
        df = pd.read_sql_table(table_name, con, chunksize=chunksize)
        # df should now be an iterator yielding `chunksize` rows at a time.
        columns = list(kuzu_conn._get_node_property_names(table_name).keys())
        if 'id' in columns:
            columns.remove('id')
        for data in df:
            # Reorder the DataFrame; Kuzu scans the in-scope variable `data`.
            data = data[columns]
            try:
                kuzu_conn.execute(f"COPY {table_name} FROM data")
            except Exception as e:
                print(f'\n\n\n error when dealing with {table_name}')
                raise e
            if sample:
                break
    else:
        df = pd.read_sql_table(table_name, con)
        columns = list(kuzu_conn._get_node_property_names(table_name).keys())
        if 'id' in columns:
            columns.remove('id')
        df = df[columns]
        try:
            kuzu_conn.execute(f"COPY {table_name} FROM df")
        except Exception as e:
            print(f'\n\n\n error when dealing with {table_name}')
            raise e
    print(f'{table_name} nodes allocated.')
```
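For reference, the primary-key branching in `create_node_tables` above can be sketched as a standalone helper (hypothetical names; the DDL shape follows the code above, with a SERIAL surrogate key standing in for composite keys since Kuzu primary keys must be a single string or numeric column):

```python
def build_create_node_stmt(table_name, columns, primaries):
    """Build a Kuzu CREATE NODE TABLE statement.

    columns: list of (name, kuzu_type) pairs
    primaries: list of primary-key column names
    """
    cols = ",\n    ".join(f"{n} {t}" for n, t in columns)
    if len(primaries) > 1:
        # Composite key: fall back to a SERIAL surrogate id.
        return (f"CREATE NODE TABLE IF NOT EXISTS {table_name}\n"
                f"    (id SERIAL,\n    {cols},\n    PRIMARY KEY (id));")
    return (f"CREATE NODE TABLE IF NOT EXISTS {table_name}\n"
            f"    ({cols},\n    PRIMARY KEY ({primaries[0]}));")

stmt = build_create_node_stmt(
    "history",
    [("date_time", "TIMESTAMP"), ("symbol", "STRING"), ("Open", "DOUBLE")],
    ["date_time", "symbol"],
)
print(stmt)
```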
Also, it seems `kuzu.Database('econ_graph')` encounters an unknown error and crashes even if I exclude the problematic table, but everything works perfectly fine if I reduce the number of nodes to 100 using the sampling option of the `read_to_node` method.
Most of this code is adapted from ChatGPT output, so there might be some bad notation and habits, like the use of f-strings and underscores; I don't know if that has anything to do with it.
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz 2.59 GHz with integrated UHD Graphics 630, 32.0 GB RAM (31.8 GB usable), Nvidia GTX 1660 Ti GPU.
My PC is set to Chinese; I don't know if the OS language has anything to do with it, but I can't change the language settings on Windows 11 Home.
I should also mention that with a chunk size of 50,000 the PC crashes after running for 48 hours, and the full COPY operation would probably take even longer. It is also somehow eating space on my C drive, even though Kuzu is installed entirely in a Miniconda virtual environment on the E drive. The PostgreSQL database is on the D drive. Docker is on C for running kuzu-explorer, but it wasn't running.
I went and ran a `SELECT COUNT(*)` on the problematic table; there are 132,123,860 rows.
I tried again after clearing out some space (40 GB) on the C drive, in case there's some kind of memory leak or something that causes all the space to fill up.
The problem persists: first Task Manager crashes, then 10 GB of C drive space is instantly occupied, then a mosaic-coloured square appears in the centre of the screen, and then the computer locks up and cannot be unlocked.
I restarted the computer at that point, trying to prevent another disconnection of the GPU.
That 10 GB remains occupied after the restart, but only 5 GB of temporary files were detected, and I had deleted temporary files prior to the test. I'm going to guess this is just some safety measure from Windows for now, since sniffing through the files again didn't turn up any changes.
Edit: I just realized that I gave the Kuzu connection my full CPU count; it might be that. I have since reduced it by half. Testing again.
The problem persists. That's what I managed to capture as it crashed; the screen did go black, but the GPU didn't disconnect this time.
The translation is Unknown software crash.
There's also one with an explorer.exe crash.
The DWM crash came after everything automatically shut down.
No apparent change to disk space this time.
Further reducing the CPU count to a quarter results only in a crash of the IDE and the program. I'll now try no multithreading.
Never mind, it seems reducing cores didn't help. What did help was not opening Task Manager. It seems that as long as Task Manager is open when the program runs, the crash is always bad enough to cause a black screen.
Edit: trying again with just Spyder (the IDE) open and no multithreading, it still crashed hard. It's safe to say the crash isn't related to something like that.
The following is the only thing not black on screen:
Mouse controls are frozen; Ctrl+Alt+Delete didn't do anything.
Edit: this showed up, still no mouse control:
Then this, which translates to: if nothing happens after Ctrl+Alt+Delete, press Esc to restart.
The DWM crash message appeared after the restart.
Hi @Gilnore, as an alternative to using pandas `read_sql`, which is entirely in memory, have you considered using Kùzu's Postgres extension to directly scan the contents of a Postgres table and copy it into Kùzu?
https://docs.kuzudb.com/extensions/attach/rdbms/
You can copy subsets of your Postgres table into Kùzu by using a `LOAD FROM` subquery, as shown in the docs:

```cypher
COPY Person FROM (LOAD FROM pg_table.person RETURN name);
```

Looking at your code above, all the operations that move data between your Postgres source and Kùzu are done from within Python, which explains the high memory usage that leads to the crash.
We haven't done extensive benchmarking on scanning performance from Postgres tables, but it should be better than relying on pandas for that large a dataset, not to mention that it should require far less glue code and fewer Python loops. Curious to know if this approach solves your crash and memory issues.
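End to end, the extension-based path comes down to a few statements (a sketch based on the attach docs linked above; the connection string, alias, and table names are placeholders to adapt to your setup):

```cypher
INSTALL postgres;
LOAD EXTENSION postgres;
ATTACH 'dbname=mydb user=me password=pw host=localhost port=5432' AS pg_table (dbtype postgres);
COPY Person FROM (LOAD FROM pg_table.person RETURN *);
DETACH pg_table;
```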
Oh, that's cool, I didn't know about that; let me try it real quick. (ChatGPT really let me down XD)
I keep running into "postgres_scanner.duckdb_extension not found. Extension "postgres" is an existing extension." when running the ATTACH statement for Postgres, and a Unicode error when running the INSTALL statement for SQLite.
The same error occurs for Postgres if I don't run the INSTALL statement for it, but with an extra line, like the following:

```
"postgres_scanner.duckdb_extension" not found. Extension "postgres" is an existing extension.
Install it first using "INSTALL postgres".
```

Here's the function:
```python
def read_to_node(self, table, kuzu_conn,
                 skip_unsupported_table=False,
                 install_ext=False,
                 load_ext=False):
    # Pass in a SQL table object and a Kuzu connection.
    table_name = table.__tablename__
    if table_name == 'history':
        # We'll load the plugin for PostgreSQL.
        install_stmt = "INSTALL postgres;"
        load_stmt = "LOAD EXTENSION postgres;"
        path = os.getenv('postgres_con')
        dbtp = 'postgres'
    else:
        # We'll load the plugin for SQLite.
        install_stmt = "INSTALL sqlite;"
        load_stmt = "LOAD EXTENSION sqlite;"
        path = os.getenv('sqlite_path')
        dbtp = 'sqlite'
    att_stmt = f"""
    ATTACH '{path}' AS Original
    (dbtype {dbtp}, skip_unsupported_table = {skip_unsupported_table});
    """
    copy_stmt = f"COPY {table_name} FROM Original.{table_name};"
    if install_ext:
        kuzu_conn.execute(install_stmt)
    if load_ext:
        kuzu_conn.execute(load_stmt)
    kuzu_conn.execute(att_stmt)
    kuzu_conn.execute(copy_stmt)
    kuzu_conn.execute('CALL clear_attached_db_cache() RETURN *')
    kuzu_conn.execute('DETACH Original')
    print(f'{table_name} nodes allocated.')
```
Hi @Gilnore, extensions are currently not working in v0.6.0. To use the Postgres extension in Kuzu, you can install our latest dev build:

```
pip install kuzu==0.6.1.dev58
```
I think this will be impossible for us to replicate on our side. Even if this were an issue on Kuzu's side, I cannot think of how we would be crashing other processes (and I doubt we are; the problem is probably somewhere else).
The only thing I can think of right now is to not use Kuzu (or any other DBMS) on a system where a large chunk of RAM is already occupied by other processes. If 21 GB is already used by pandas, Kuzu won't be able to get enough memory to operate (Kuzu's buffer manager by default tries to grab 80% of your RAM), so the setup is not ideal. You can try to set the buffer manager to a smaller size, but not to just a few GB if your data is tens of GB.
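A minimal sketch of capping the buffer manager, assuming the `buffer_pool_size` argument (in bytes) of `kuzu.Database` in the Python API; verify the parameter name against your installed version:

```python
# Cap Kuzu's buffer pool explicitly instead of letting it default
# to ~80% of RAM, leaving headroom for pandas on a 32 GB machine.
GIB = 1024 ** 3
buffer_pool_bytes = 8 * GIB

# Hypothetical usage (assumption: buffer_pool_size is accepted in bytes):
# import kuzu
# db = kuzu.Database('econ_graph', buffer_pool_size=buffer_pool_bytes)
# conn = kuzu.Connection(db)
print(buffer_pool_bytes)
```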
I see, thanks for everything. I'll continue with the extensions.
Kùzu version
V0.6.0
What operating system are you using?
Windows 11
What happened?
I was trying a COPY operation on a pandas data frame.
First my monitor size changed, enlarging everything, then the screen went black.
The data frame was very large, about 21 GB in size, filling up 96% of my RAM.
The issue doesn't go away even if I chunk down the data frame; that just delays it.
The PC automatically restarts. The screen aspect is permanently changed.
Edit: further investigation shows a DWM crash followed by the PC no longer detecting the GPU.
Are there known steps to reproduce?
No response