zerohour-phishing-detection / zpd-server

Code and test data for anti-phishing tool: A decision-support tool for experimentation on zero-hour phishing detection
Creative Commons Attribution 4.0 International

SQLite multithreading #18

Open ronalddenouden opened 8 months ago

ronalddenouden commented 8 months ago

Reverse image search cannot run multithreaded because SQLite does not allow objects created in one thread to be used from another thread. We can check whether changing SQLite's compile settings makes this possible, or whether there are other ways. Also, an interface for storage calls might be better than using SQLite directly everywhere.

2024-03-05 15:08:59,163 [C:\Users\20212219\Documents\GitHub\CSRP-zdp\zpd-server\utils\reverse_image_search.py:172] ERROR: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 5500 and this is thread id 3236.
Traceback (most recent call last):
  File "C:\Users\20212219\Documents\GitHub\CSRP-zdp\zpd-server\utils\reverse_image_search.py", line 163, in _search_image_all
    await self._rev_image_search(poi, search_engine, sha_hash)
  File "C:\Users\20212219\Documents\GitHub\CSRP-zdp\zpd-server\utils\reverse_image_search.py", line 296, in _rev_image_search
    await asyncio.gather(*awaits)
  File "C:\Users\20212219\AppData\Local\Programs\Python\Python311\Lib\asyncio\tasks.py", line 349, in __wakeup
    future.result()
  File "C:\Users\20212219\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\20212219\Documents\GitHub\CSRP-zdp\zpd-server\utils\reverse_image_search.py", line 294, in <lambda>
    awaits.append(loop.run_in_executor(pool, lambda: self.region_image_search(search_engine, sha_hash, region_data, topx)))
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\20212219\Documents\GitHub\CSRP-zdp\zpd-server\utils\reverse_image_search.py", line 322, in region_image_search
    self.conn_storage.execute(
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 5500 and this is thread id 3236.
[]
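One possible "other way" that needs no change to SQLite's compile settings is to give every worker thread its own connection, so no SQLite object ever crosses a thread boundary. The following is only a minimal sketch under assumptions: `ThreadLocalStorage`, the `region_info` table and `demo()` are hypothetical names for illustration and are not taken from zpd-server. It also doubles as a small storage interface, since callers only see `execute()`.

```python
import os
import sqlite3
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadLocalStorage:
    """Hypothetical helper: lazily opens one SQLite connection per thread."""

    def __init__(self, db_path):
        self._db_path = db_path
        self._local = threading.local()

    def _conn(self):
        # Each thread sees its own `conn` attribute on the thread-local object.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(self._db_path)
            self._local.conn = conn
        return conn

    def execute(self, sql, params=()):
        conn = self._conn()
        with conn:  # commits on success, rolls back on an exception
            return conn.execute(sql, params).fetchall()


def demo():
    # A file-backed database is needed here: every ":memory:" connection
    # would otherwise get its own private, empty database.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    storage = ThreadLocalStorage(path)
    storage.execute("CREATE TABLE IF NOT EXISTS region_info (region TEXT)")

    def insert(i):
        storage.execute("INSERT INTO region_info VALUES (?)", (f"region-{i}",))

    # Inserts run on pool threads, each using its own connection.
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(insert, range(8)))

    return storage.execute("SELECT COUNT(*) FROM region_info")[0][0]
```

The trade-off is that writers still serialize on SQLite's file lock, so this removes the `ProgrammingError` but does not make writes run in parallel.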

TPGamesNL commented 8 months ago

The only remaining SQL usage is in sessions.py, where the session storage could be reworked to be stored in-memory instead. Or at least we should look into what is normal practice for a cache like this.
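A rough sketch of what in-memory session storage could look like, assuming sessions are keyed by an ID and may expire after a time-to-live. `InMemorySessionStore` and its methods are hypothetical and do not reflect the actual interface of sessions.py.

```python
import threading
import time


class InMemorySessionStore:
    """Hypothetical thread-safe session cache with a time-to-live."""

    def __init__(self, ttl_seconds=3600.0):
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._sessions = {}  # session id -> (expiry timestamp, data)

    def put(self, session_id, data):
        with self._lock:
            self._sessions[session_id] = (time.monotonic() + self._ttl, data)

    def get(self, session_id):
        with self._lock:
            entry = self._sessions.get(session_id)
            if entry is None:
                return None
            expires_at, data = entry
            if time.monotonic() >= expires_at:
                # Expired entries are dropped lazily, on access.
                del self._sessions[session_id]
                return None
            return data
```

Unlike the SQLite-backed version, nothing here survives a server restart, which may or may not be acceptable for this cache.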

Or we keep it on SQLite for now, and use https://stackoverflow.com/a/2894830 together with a mutex to prevent concurrency issues ourselves.
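A minimal sketch of the mutex approach, assuming the linked answer refers to the `check_same_thread=False` argument of `sqlite3.connect` (the answer's content is not quoted in this thread, so that is an assumption). `LockedConnection`, the `sessions` table and `demo_locked()` are hypothetical names.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedConnection:
    """Hypothetical wrapper: one shared connection, guarded by a mutex."""

    def __init__(self, db_path):
        # check_same_thread=False disables the sqlite3 module's thread check;
        # serializing access then becomes our own responsibility.
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        self._lock = threading.Lock()

    def execute(self, sql, params=()):
        with self._lock:      # only one thread touches the connection at a time
            with self._conn:  # commit on success, roll back on an exception
                return self._conn.execute(sql, params).fetchall()


def demo_locked():
    # ":memory:" works here because all threads share the same connection.
    storage = LockedConnection(":memory:")
    storage.execute("CREATE TABLE sessions (id TEXT)")

    def insert(i):
        storage.execute("INSERT INTO sessions VALUES (?)", (f"s{i}",))

    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(insert, range(8)))

    return storage.execute("SELECT COUNT(*) FROM sessions")[0][0]
```

Results are fetched inside the lock on purpose, so no cursor is left open for another thread to interfere with.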