Open dqaudithillstone opened 4 years ago
@dqaudithillstone Part of the performance issue with this package is that it uses SqLite as a backend, which means that you're essentially not using one of the main benefits of pandas, which is in memory computation. As a solution to this problem, a few months back I created a package called dataframe_sql, which solves this problem by parsing the sql and translating it to native pandas operations. Hope this helps!
My dataframe records more than 120,000. When I use sqldf to query, if I set limit 20000, the program does not error. but If you remove limit 20000, the program fails. The error message is as follows.
the df records is more than 120,000.
source code: df=pd.read_csv('./dataAnalyse/ht.csv',encoding='utf_8_sig') #支持中文路径 q=""" select a.[合同编号] as htbh1,b.[合同编号] as htbh2,a.[合同名称] as htmc1,b.[合同名称] as htmc2 from df a left join df b on a.年度=b.年度 and a.公司名称=b.公司名称 and a.[合同编号]<>b.[合同编号] limit 20000; """ df0=pysqldf(q) print(df0.info())
The error message is as follows:
PS D:\python\myworks> & C:/python/Python37-32/python.exe d:/python/myworks/DataAnalyse/xsd.py Traceback (most recent call last): File "C:\python\Python37-32\lib\site-packages\sqlalchemy\engine\base.py", line 3206, in fetchall l = self.process_rows(self._fetchall_impl()) File "C:\python\Python37-32\lib\site-packages\sqlalchemy\engine\base.py", line 3173, in _fetchall_impl return self.cursor.fetchall() sqlite3.OperationalError
The above exception was the direct cause of the following exception:
Traceback (most recent call last): File "C:\python\Python37-32\lib\site-packages\pandasql\sqldf.py", line 61, in call result = read_sql(query, conn) File "C:\python\Python37-32\lib\site-packages\pandas\io\sql.py", line 438, in read_sql chunksize=chunksize, File "C:\python\Python37-32\lib\site-packages\pandas\io\sql.py", line 1231, in read_query data = result.fetchall() File "C:\python\Python37-32\lib\site-packages\sqlalchemy\engine\base.py", line 3212, in fetchall self.cursor, self.context) File "C:\python\Python37-32\lib\site-packages\sqlalchemy\engine\base.py", line 1843, in _handle_dbapi_exception from e sqlalchemy.exc.OperationalError: (OperationalError) None None
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "d:/python/myworks/DataAnalyse/xsd.py", line 20, in
df0=pysqldf(q)
File "d:/python/myworks/DataAnalyse/xsd.py", line 7, in pysqldf
return sqldf(q,globals())
File "C:\python\Python37-32\lib\site-packages\pandasql\sqldf.py", line 156, in sqldf
return PandaSQL(db_uri)(query, env)
File "C:\python\Python37-32\lib\site-packages\pandasql\sqldf.py", line 63, in call
raise PandaSQLException(ex)
pandasql.sqldf.PandaSQLException: (OperationalError) None None