mingfang closed this issue 2 years ago.
Looks like that error is coming from the engine -- do you have the logs from it too (sgr engine log, or the Docker container logs if you're running without the sgr engine wrapper)?
I'm running Splitgraph inside Kubernetes using this image: splitgraph/engine:0.2.15-postgis
Here is the log:
splitgraph-0:PostgreSQL Database directory appears to contain a database; Skipping initialization
splitgraph-0:2021-08-11 21:05:56.164 GMT [1] LOG: starting PostgreSQL 12.7 (Debian 12.7-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
splitgraph-0:2021-08-11 21:05:56.165 GMT [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
splitgraph-0:2021-08-11 21:05:56.165 GMT [1] LOG: listening on IPv6 address "::", port 5432
splitgraph-0:2021-08-11 21:05:56.171 GMT [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
splitgraph-0:2021-08-11 21:05:56.200 GMT [26] LOG: database system was shut down at 2021-08-11 21:05:43 GMT
splitgraph-0:2021-08-11 21:05:56.216 GMT [1] LOG: database system is ready to accept connections
splitgraph-0:2021-08-11 21:07:08.814 GMT [56] ERROR: Error in python: KeyError
splitgraph-0:2021-08-11 21:07:08.814 GMT [56] DETAIL: Traceback (most recent call last):
splitgraph-0:
splitgraph-0: File "/splitgraph/splitgraph/core/fdw_checkout.py", line 58, in __init__
splitgraph-0: self._initialize_engines()
splitgraph-0:
splitgraph-0: File "/splitgraph/splitgraph/core/fdw_checkout.py", line 182, in _initialize_engines
splitgraph-0: use_fdw_params=True,
splitgraph-0:
splitgraph-0: File "/splitgraph/splitgraph/engine/__init__.py", line 665, in get_engine
splitgraph-0: conn_params = cast(Dict[str, Optional[str]], _prepare_engine_config(CONFIG, name))
splitgraph-0:
splitgraph-0: File "/splitgraph/splitgraph/engine/__init__.py", line 57, in _prepare_engine_config
splitgraph-0: config_dict if name == "LOCAL" else get_all_in_section(config_dict, "remotes")[name],
splitgraph-0:
splitgraph-0: KeyError: 'splitgraph'
splitgraph-0:
splitgraph-0:2021-08-11 21:07:08.814 GMT [56] STATEMENT: SET enable_sort=off; SET enable_hashagg=on;CREATE TABLE "splitgraph_meta"."sg_tmp_48a5834071346fb90838d6f08bbb9531" AS SELECT * FROM rdu
If you're running it inside of Kubernetes, did you make sure to bind mount / copy the .sgconfig file into the engine container as well (https://www.splitgraph.com/docs/configuration/introduction#in-engine-configuration)? The logging level error makes me think it's using the default empty value and hasn't found a config file.
I agree about the logging level. I fixed that, but I would recommend using a sensible default log level instead.
Hmm, bind-mounting .sgconfig doesn't make any sense. Keep in mind everything in this demo https://www.splitgraph.com/docs/getting-started/decentralized-demo works with my setup.
It's only a problem when I change the splitfile to this: FROM demo/weather IMPORT {SELECT * FROM rdu} AS source_data
My Splitgraph client, sgr, has this .sgconfig:
[defaults]
SG_ENGINE_PORT=6432
SG_ENGINE_PWD=splitgraph
SG_ENGINE_ADMIN_USER=sgr
SG_ENGINE_ADMIN_PWD=splitgraph
SG_UPDATE_LAST=1628646162
[remote: splitgraph]
SG_ENGINE_ADMIN_USER=splitgraph
SG_ENGINE_ADMIN_PWD=splitgraph
SG_ENGINE_POSTGRES_DB_NAME=splitgraph
SG_ENGINE_HOST=splitgraph.splitgraph
SG_ENGINE_PORT=5432
SG_ENGINE_USER=splitgraph
SG_ENGINE_PWD=splitgraph
SG_ENGINE_DB_NAME=splitgraph
I set my env with this
export SG_ENGINE=splitgraph
Looking at the engine error, it looks like it's trying to read its own config and looking for the splitgraph remote.
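To see why this fails, here is a minimal sketch of the lookup that the traceback points at in _prepare_engine_config. This is a hypothetical simplification for illustration, not the engine's actual code: a non-LOCAL engine name is looked up in the parsed config's remotes section, and an engine container with no .sgconfig mounted has no such entry, so a plain dict access raises KeyError.

```python
# Hypothetical, simplified model of the config lookup in the traceback.
# An engine with no .sgconfig mounted has an empty "remotes" section.
CONFIG = {"SG_ENGINE_HOST": "localhost", "remotes": {}}

def prepare_engine_config(config, name):
    # "LOCAL" uses the top-level config; any other name must exist
    # under "remotes", or the dict access raises KeyError.
    if name == "LOCAL":
        return config
    return config["remotes"][name]

try:
    prepare_engine_config(CONFIG, "splitgraph")
except KeyError as e:
    print(f"KeyError: {e}")  # prints: KeyError: 'splitgraph'
```

The printed line matches the error in the engine log above, which is consistent with the engine never having found a config file.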
Why would the engine need to do that?
And why would it only do that when I modify the demo splitfile?
I was able to get it to work using the LOCAL engine; basically, I cleared the SG_ENGINE env var.
Does this imply that (some) splitfiles can only work with the LOCAL engine and not remote engines?
I'm guessing the problem is that SG_ENGINE on the client side must match SG_ENGINE on the engine side. That certainly won't hold for large deployments. My use case is to have a central Splitgraph instance running inside Kubernetes, and each client (sgr and Python) will connect to it as a remote.
You should definitely be able to run Splitfiles against one engine (the "client" engine) using data on a different remote (in your case, splitgraph) engine. The issue is in configuring them so that both the sgr client and the "client" engine know where to download the objects from (so both sgr and your client engine need to know how to connect to the remote splitgraph engine); that's why we put the same .sgconfig into both sgr and the engine itself.
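For illustration, a minimal engine-side .sgconfig for this setup might look like the following. This is a sketch based on the client config posted in this thread, not a verified file; the location it's read from inside the engine container depends on how it's mounted (see the in-engine configuration docs linked above):

```ini
# Illustrative .sgconfig for the engine container: it mirrors the
# client's [remote: splitgraph] section so the engine itself knows
# how to connect to the remote and download table fragments from it.
[remote: splitgraph]
SG_ENGINE_HOST=splitgraph.splitgraph
SG_ENGINE_PORT=5432
SG_ENGINE_USER=splitgraph
SG_ENGINE_PWD=splitgraph
SG_ENGINE_DB_NAME=splitgraph
```

With a section like this present in the engine container, the `get_all_in_section(config_dict, "remotes")["splitgraph"]` lookup from the traceback would succeed instead of raising KeyError.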
I think in the first case, the Splitfile executor (running in the Python sgr process) just gets enough metadata from the remote engine to move the pointers, making a new image with a table that has the same contents. In the second case, it uses the local engine to create a staging table with the data, so the local engine tries to download the table fragments from the remote splitgraph engine and fails, since it doesn't know how to connect to it.
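Concretely, these are the two import lines from this thread. The first only moves metadata pointers, so it works with just the remote configured; the second has to run the query on the local ("client") engine, which must then fetch rdu's object fragments from the remote:

```
FROM demo/weather IMPORT rdu AS source_data
FROM demo/weather IMPORT {SELECT * FROM rdu} AS source_data
```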
I created a self-contained repo to demonstrate this problem here: https://github.com/mingfang/splitfile-remote.git
Looks like you only have one engine in that repository? The configuration in your use case is two engines: sgr can't work standalone; it needs a sidecar engine to go with it.

@mildbyte Thanks for the explanation. I was trying to get away with using just one engine (the remote engine), because I didn't want members of my data team to have to learn Docker. But if a local engine is a requirement, then that's the way I will go. Thanks again for your help.
I completed this demo https://www.splitgraph.com/docs/getting-started/decentralized-demo and am trying out different splitfiles.
My first attempt is something simple. I added this to an empty splitfile, and it works:
FROM demo/weather IMPORT rdu AS source_data
But when I replace that line with something I thought was equivalent, it doesn't work. I'm getting this error: