Open · C-Zuge opened this issue 4 months ago
I posted a partial explanation regarding the client side. https://github.com/apache/iceberg-python/issues/939#issuecomment-2234269294
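As a rough sketch of that client-side setup (the endpoint, container/account names, and adlfs.* property keys below are assumptions based on the warehouse URI in the logs and the pyiceberg configuration docs; exact key names depend on your pyiceberg version):

```python
from pyiceberg.catalog import load_catalog

# Hypothetical client-side setup: point pyiceberg at the REST server and also
# pass the Azure file-IO properties, since the client reads and writes data
# files itself. The adlfs.* key names depend on the pyiceberg version; check
# the configuration docs linked below.
catalog = load_catalog(
    "rest",
    **{
        "uri": "http://localhost:8000",  # the REST catalog server from this repo
        "warehouse": "abfs://landing@sandboxnonprodstorage.dfs.core.windows.net/",
        "adlfs.account-name": "sandboxnonprodstorage",
        "adlfs.connection-string": "<your-connection-string>",
    },
)
```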
For running the REST catalog server using this repo, you'd need to configure the server to be able to talk to your storage. For example, if you're trying to use Azure, here are some of the configs required: https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/configuration.md#azure-data-lake
Here's another example, running the REST catalog server with MinIO (S3-compatible API): https://github.com/kevinjqliu/iceberg-rest-catalog/blob/main/examples/sqlite-minio/docker-compose.yml#L11-L17
Also, the "vendors" folder is not clear to me: why do you clone pyiceberg into the container, and what is it used for?
While working on this repo, I discovered some bugs related to PyIceberg. It was easier to iterate with PyIceberg as a submodule so that I could commit fixes right away. Some of these issues have already been upstreamed (see https://github.com/apache/iceberg-python/issues/864).
To debug your issue above, look at the server log! An HTTP 500 error usually indicates that the server ran into an error.
Regarding the case below, I filled in (almost) all the fields from the link above, except adlfs.sas_token. For some reason (unknown to me, at least) the error mentions an "AWS Error NETWORK_CONNECTION", even though it should be using the Azure connection. I also couldn't find this type of configuration in the Dockerfile or anywhere else, only inside the "tests" and "models" folders. I added comments to the logs below to make clear which operation I performed at each step.
Code:
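A minimal sketch of what these calls might look like with pyiceberg (the endpoint, table name, and schema here are assumptions reconstructed from the annotated log below, not the original snippet):

```python
import pyarrow as pa
from pyiceberg.catalog import load_catalog

# Assumed client: a REST catalog pointed at the server in this repo, using the
# abfs warehouse shown in the log below.
catalog = load_catalog(
    "rest",
    uri="http://localhost:8000",
    warehouse="abfs://landing@sandboxnonprodstorage.dfs.core.windows.net/",
)

# The operations annotated in the log: create a namespace, list namespaces,
# list its tables, then create a table (the last call is the one returning 500).
catalog.create_namespace("iceberg_rest")
print(catalog.list_namespaces())
print(catalog.list_tables("iceberg_rest"))

# Hypothetical schema; "stations2000" is the table name seen in the traceback.
schema = pa.schema([("station_id", pa.int64()), ("name", pa.string())])
catalog.create_table("iceberg_rest.stations2000", schema=schema)
```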
Error from Docker container:
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: 172.17.0.1:54386 - "GET /v1/config? warehouse=abfs%3A%2F%2Flanding%40sandboxnonprodstorage.dfs.core.windows.net%2F HTTP/1.1" 200 OK <------- FIRST REQUEST (Just to create the namespace and list the tables)
INFO: 172.17.0.1:54396 - "POST /v1/namespaces HTTP/1.1" 200 OK <------- NAMESPACE CREATION
INFO: 172.17.0.1:54396 - "GET /v1/namespaces HTTP/1.1" 200 OK <------- NAMESPACE LIST
INFO: 172.17.0.1:54396 - "GET /v1/namespaces/iceberg_rest/tables HTTP/1.1" 200 OK <----- TABLE'S LIST (Null as expected)
INFO: 172.17.0.1:32978 - "GET /v1/config?warehouse=abfs%3A%2F%2Flanding%40sandboxnonprodstorage.dfs.core.windows.net%2F HTTP/1.1" 200 OK <------ SECOND REQUEST (List namespaces, tables and create_table itself)
INFO: 172.17.0.1:32988 - "GET /v1/namespaces HTTP/1.1" 200 OK
INFO: 172.17.0.1:32988 - "GET /v1/namespaces/iceberg_rest/tables HTTP/1.1" 200 OK
INFO: 172.17.0.1:32988 - "POST /v1/namespaces/iceberg_rest/tables HTTP/1.1" 500 Internal Server Error <----CREATE_TABLE FUNCTION
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/iceberg_rest/api/catalog_api.py", line 297, in create_table
return _create_table(catalog, identifier, create_table_request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/iceberg_rest/api/catalog_api.py", line 343, in _create_table
tbl = catalog.create_table(
^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/pyiceberg/catalog/sql.py", line 208, in create_table
self._write_metadata(metadata, io, metadata_location)
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/pyiceberg/catalog/__init__.py", line 843, in _write_metadata
ToOutputFile.table_metadata(metadata, io.new_output(metadata_path))
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/pyiceberg/serializers.py", line 130, in table_metadata
with output_file.create(overwrite=overwrite) as output_stream:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/pyiceberg/io/pyarrow.py", line 304, in create
if not overwrite and self.exists() is True:
^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/pyiceberg/io/pyarrow.py", line 248, in exists
self._file_info() # raises FileNotFoundError if it does not exist
^^^^^^^^^^^^^^^^^
File "/home/iceberg/iceberg_rest/.venv/lib/python3.11/site-packages/pyiceberg/io/pyarrow.py", line 230, in _file_info
file_info = self._filesystem.get_file_info(self._path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/_fs.pyx", line 584, in pyarrow._fs.FileSystem.get_file_info
File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
OSError: When getting information for key 'rest/iceberg_rest.db/stations2000/metadata/00000-89d73996-40a2-458f-bdb9-1d1eff86a65b.metadata.json' in bucket 'warehouse': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 7, Couldn't connect to server
the error mentions an "AWS Error NETWORK_CONNECTION", even though it should be using the Azure connection
The REST server is a wrapper around the underlying catalog. Looks like the catalog config is currently hardcoded to use AWS configs. https://github.com/kevinjqliu/iceberg-rest-catalog/blob/7c5548133ae266d4fac215b063911c35f08461d9/src/iceberg_rest/catalog.py#L14-L24
You would need to change this to take in Azure configs instead.
You can quickly verify this by passing the configs directly to this dict.
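Roughly like this (a sketch of the change, not the repo's actual code; the adlfs.* key names depend on the pyiceberg version and the values are placeholders):

```python
from pyiceberg.catalog.sql import SqlCatalog

# Sketch of swapping the hardcoded S3 settings for Azure (adlfs) settings in the
# server's catalog properties; the values are placeholders.
catalog = SqlCatalog(
    "default",
    **{
        "uri": "sqlite:////tmp/warehouse/pyiceberg_catalog.db",
        "warehouse": "abfs://landing@sandboxnonprodstorage.dfs.core.windows.net/",
        "adlfs.account-name": "sandboxnonprodstorage",
        "adlfs.connection-string": "<your-connection-string>",
    },
)
```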
It worked fine after inserting the connection string parameter inside this function. But I also saw that SqlCatalog is used rather than RestCatalog, and I was wondering why this choice? I also tried switching to RestCatalog but got some issues on the server side, shown below; how could I fix this to properly use the RestCatalog rather than the SqlCatalog? Also, I built the Postgres version but it's trying to use SQLite, why?
Changes:
Error on server side:
raise InvalidSchema(f"No connection adapters were found for {url!r}")
requests.exceptions.InvalidSchema: No connection adapters were found for 'sqlite:////tmp/warehouse/pyiceberg_catalog.db/v1/config'
But I also saw that SqlCatalog is used rather than RestCatalog, and I was wondering why this choice?
This repo implements the REST Catalog server: it accepts HTTP requests and then proxies to the underlying catalog. The server needs to get/set table metadata, and in this case the metadata is ultimately saved in the SqlCatalog. You can make a change to replace SqlCatalog with RestCatalog, which means the metadata will ultimately be saved in another RestCatalog service.
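To make the distinction concrete (a sketch only; the second service's address is hypothetical):

```python
from pyiceberg.catalog.rest import RestCatalog

# RestCatalog treats "uri" as the HTTP base URL of another REST catalog service
# and requests {uri}/v1/config on startup, which is why pointing it at a sqlite
# URI fails inside `requests` with "No connection adapters were found".
catalog = RestCatalog(
    "default",
    **{
        "uri": "http://some-other-rest-catalog:8181",  # hypothetical second service
        "warehouse": "abfs://landing@sandboxnonprodstorage.dfs.core.windows.net/",
    },
)
```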
I also tried switching to RestCatalog but got some issues on the server side, shown below; how could I fix this to properly use the RestCatalog rather than the SqlCatalog?
Don't change it to RestCatalog unless there's another REST catalog server you can point to.
Also, I built the Postgres version but it's trying to use SQLite, why?
Are you using Docker? The uri controls what data store is ultimately used.
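For instance (illustrative SQLAlchemy connection strings; the Postgres host and credentials are placeholders):

```python
# The SqlCatalog "uri" is a SQLAlchemy connection string, so it decides whether
# SQLite or Postgres backs the catalog. If the container still passes the
# sqlite URI, Postgres is never used. Illustrative values only:
sqlite_uri = "sqlite:////tmp/warehouse/pyiceberg_catalog.db"
postgres_uri = "postgresql://iceberg:password@postgres:5432/iceberg_catalog"
```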
While using pyiceberg I ran into some issues/questions that blocked me, mainly an internal server error 500 after executing a simple "create_table" call. Since I'm pretty new to Iceberg, I'm probably missing something. Could anyone help me? I created a namespace and listed it, but as soon as I try to create a table on my Azure storage account I get the same 500 error. My credentials are correct, but I'm using the connection string and pointing the "warehouse" parameter to my storage account, such as: "abfs://@.dfs.core.windows.net/".
I looked at the Dockerfile and didn't see anything I should change, and the only files using the AWS connection were inside the "models" folder (which I believe is used to generate new code from these models) and inside the "tests" folder. Also, the "vendors" folder is not clear to me: why do you clone pyiceberg into the container, and what is it used for?