google / zetasql

ZetaSQL - Analyzer Framework for SQL
Apache License 2.0

Issue in SimpleCatalog when accessing in a concurrent environment #138

Open masterlittle opened 1 year ago

masterlittle commented 1 year ago

We are using the ZetaSQL Java parser as an API on Cloud Run. Each container receives multiple requests, and the catalog is declared in the global scope as a static final variable. Each request carries a query; we scan the query for tables and add them to the catalog at runtime. The idea is that as more queries come in, the catalog grows as needed without having to be prefilled.

The issue is that I'm seeing random "Table not found" errors as requests come in. Somehow the parser is not seeing the tables in the catalog when it parses the query. This only happens in a concurrent environment. If I keep concurrency at 1, everything works perfectly.

My hypothesis is that SimpleCatalog uses a HashMap to store tables and functions. Since HashMap is not thread-safe, concurrent additions from requests arriving in quick succession may be lost or may not be visible to other threads.

Can someone help in finding why this might be happening?

matthewcbrown commented 1 year ago

SimpleCatalog is not thread-safe (HashMap is one problem, but there are others). It's really intended to be constructed once per query (or once per prepared query).
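
A minimal sketch of the once-per-query pattern, assuming table schemas can be kept as plain data in a shared, thread-safe registry and a fresh catalog is built for each request. `TableRegistry` and `PerQueryCatalog` are hypothetical names used for illustration; `PerQueryCatalog` is a stand-in for the real catalog object, not ZetaSQL's `SimpleCatalog`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Shared, thread-safe store of table schemas. Holds plain data only, never catalog objects. */
final class TableRegistry {
    // Table name -> column names. ConcurrentHashMap makes concurrent registration safe.
    private final Map<String, List<String>> schemas = new ConcurrentHashMap<>();

    void register(String table, List<String> columns) {
        // List.copyOf returns an immutable list, so readers never observe later mutations.
        schemas.putIfAbsent(table, List.copyOf(columns));
    }

    Optional<List<String>> lookup(String table) {
        return Optional.ofNullable(schemas.get(table));
    }
}

/** Hypothetical stand-in for a per-query catalog: built, used, and discarded by one thread. */
final class PerQueryCatalog {
    // A plain HashMap is fine here because the instance is never shared between threads.
    private final Map<String, List<String>> tables = new HashMap<>();

    /** Builds a fresh catalog containing only the tables the query references. */
    static PerQueryCatalog forTables(TableRegistry registry, List<String> referenced) {
        PerQueryCatalog catalog = new PerQueryCatalog();
        for (String name : referenced) {
            registry.lookup(name).ifPresent(cols -> catalog.tables.put(name, cols));
        }
        return catalog;
    }

    boolean hasTable(String name) {
        return tables.containsKey(name);
    }
}
```

In a real service, the body of `forTables` would presumably construct a new `SimpleCatalog` and add the referenced tables to it. Only the registry lives in the static field, so no catalog instance is ever touched by two requests.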

masterlittle commented 1 year ago

Got it. Any suggestions on how to write one for a thread-safe environment, or would that be too complex? A simple fix I could make is to use ConcurrentHashMap.
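
Since the reply above says HashMap is only one of several problems, swapping in ConcurrentHashMap alone may not be enough. One generic alternative, sketched here under the assumption that a single shared catalog must be kept, is to serialize every use of it behind one lock. `Guarded` is a hypothetical helper and not part of ZetaSQL.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/** Serializes every use of a non-thread-safe object behind a single lock. */
final class Guarded<T> {
    private final T target;
    private final ReentrantLock lock = new ReentrantLock();

    Guarded(T target) {
        this.target = target;
    }

    /** Runs the action with exclusive access to the wrapped object and returns its result. */
    <R> R withLock(Function<T, R> action) {
        lock.lock();
        try {
            return action.apply(target);
        } finally {
            // Always release, even if the action throws.
            lock.unlock();
        }
    }
}
```

For this to help, both adding tables and analyzing the query would have to happen inside the same `withLock` call, so that no other request changes the catalog in between. The cost is that catalog work runs one request at a time, which is effectively the concurrency-1 setup that already works.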