stac-utils / stac-fastapi-sqlalchemy

PostgreSQL backend for stac-fastapi using SQLAlchemy
MIT License
9 stars 10 forks source link

TypeError: 'sci:citation' is an invalid keyword argument for Collection #26

Open csaybar opened 3 years ago

csaybar commented 3 years ago

Hi again!

When I try to extent the collection fields, for instance, using the scientific extension (example) sci:citation I got the next error.

collection.json

{
    "id": "joplin",
    "stac_extensions": [
        "https://stac-extensions.github.io/scientific/v1.0.0/schema.json"
    ],
    "sci:citation": "testing_01",
    "description": "This imagery was acquired by the NOAA Remote Sensing Division to support NOAA national security and emergency response requirements. In addition, it will be used for ongoing research efforts for testing and developing standards for airborne digital imagery. Individual images have been combined into a larger mosaic and tiled for distribution. The approximate ground sample distance (GSD) for each pixel is 35 cm (1.14 feet).",
    "stac_version": "1.0.0-beta.2",
    "license": "public-domain",
    "links": [],
    "extent": {
        "spatial": {
            "bbox": [
                [
                    -94.6911621,
                    37.0332547,
                    -94.402771,
                    37.1077651
                ]
            ]
        },
        "temporal": {
            "interval": [
                [
                    "2000-02-01T00:00:00Z",
                    "2000-02-12T00:00:00Z"
                ]
            ]
        }
    }
}

error

Attaching to stac-db, stac-api, stac-fastapi-master_migration_1
stac-db      | The files belonging to this database system will be owned by user "postgres".
stac-db      | This user must also own the server process.
stac-db      | 
stac-db      | The database cluster will be initialized with locale "en_US.utf8".
stac-db      | The default database encoding has accordingly been set to "UTF8".
stac-db      | The default text search configuration will be set to "english".
stac-db      | 
stac-db      | Data page checksums are disabled.
stac-db      | 
stac-db      | fixing permissions on existing directory /var/lib/postgresql/data ... ok
stac-db      | creating subdirectories ... ok
stac-db      | selecting dynamic shared memory implementation ... posix
stac-db      | selecting default max_connections ... 100
stac-db      | selecting default shared_buffers ... 128MB
stac-db      | selecting default time zone ... Etc/UTC
stac-db      | creating configuration files ... ok
stac-db      | running bootstrap script ... ok
stac-db      | performing post-bootstrap initialization ... ok
stac-db      | syncing data to disk ... ok
stac-db      | 
stac-db      | 
stac-db      | Success. You can now start the database server using:
stac-db      | 
stac-db      |     pg_ctl -D /var/lib/postgresql/data -l logfile start
stac-db      | 
stac-db      | initdb: warning: enabling "trust" authentication for local connections
stac-db      | You can change this by editing pg_hba.conf or using the option -A, or
stac-db      | --auth-local and --auth-host, the next time you run initdb.
stac-db      | waiting for server to start....2021-03-31 09:47:09.557 UTC [47] LOG:  starting PostgreSQL 12.5 (Debian 12.5-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
stac-db      | 2021-03-31 09:47:09.558 UTC [47] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
stac-db      | 2021-03-31 09:47:09.569 UTC [48] LOG:  database system was shut down at 2021-03-31 09:47:09 UTC
stac-db      | 2021-03-31 09:47:09.572 UTC [47] LOG:  database system is ready to accept connections
stac-db      |  done
stac-db      | server started
stac-db      | CREATE DATABASE
stac-db      | 
stac-db      | 
stac-db      | /usr/local/bin/docker-entrypoint.sh: sourcing /docker-entrypoint-initdb.d/10_postgis.sh
stac-db      | CREATE DATABASE
stac-db      | Loading PostGIS extensions into template_postgis
stac-api     | INFO:     Uvicorn running on http://0.0.0.0:8081 (Press CTRL+C to quit)
stac-api     | INFO:     Started reloader process [12] using watchgod
stac-db      | CREATE EXTENSION
stac-db      | CREATE EXTENSION
stac-db      | CREATE EXTENSION
stac-db      | CREATE EXTENSION
stac-db      | Loading PostGIS extensions into postgis
stac-db      | CREATE EXTENSION
stac-db      | CREATE EXTENSION
stac-db      | CREATE EXTENSION
stac-api     | INFO:     Started server process [14]
stac-api     | INFO:     Waiting for application startup.
stac-api     | INFO:     Application startup complete.
stac-db      | CREATE EXTENSION
stac-db      | 
stac-db      | waiting for server to shut down...2021-03-31 09:47:11.685 UTC [47] LOG:  received fast shutdown request
stac-db      | .2021-03-31 09:47:11.686 UTC [47] LOG:  aborting any active transactions
stac-db      | 2021-03-31 09:47:11.687 UTC [47] LOG:  background worker "logical replication launcher" (PID 54) exited with exit code 1
stac-db      | 2021-03-31 09:47:11.697 UTC [49] LOG:  shutting down
stac-db      | 2021-03-31 09:47:11.754 UTC [47] LOG:  database system is shut down
stac-db      |  done
stac-db      | server stopped
stac-db      | 
stac-db      | PostgreSQL init process complete; ready for start up.
stac-db      | 
stac-db      | 2021-03-31 09:47:11.797 UTC [1] LOG:  starting PostgreSQL 12.5 (Debian 12.5-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
stac-db      | 2021-03-31 09:47:11.797 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
stac-db      | 2021-03-31 09:47:11.797 UTC [1] LOG:  listening on IPv6 address "::", port 5432
stac-db      | 2021-03-31 09:47:11.798 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
stac-db      | 2021-03-31 09:47:11.806 UTC [93] LOG:  database system was shut down at 2021-03-31 09:47:11 UTC
stac-db      | 2021-03-31 09:47:11.809 UTC [1] LOG:  database system is ready to accept connections
migration_1  | INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
migration_1  | INFO  [alembic.runtime.migration] Will assume transactional DDL.
migration_1  | INFO  [alembic.runtime.migration] Running upgrade  -> 131aab4d9e49, create initial schema
migration_1  | INFO  [alembic.runtime.migration] Running upgrade 131aab4d9e49 -> 77c019af60bf, use timestamptz rather than timestamp
stac-api     | 'sci:citation' is an invalid keyword argument for Collection
stac-api     | NoneType: None
stac-api     | INFO:     172.18.0.4:47712 - "POST /collections HTTP/1.1" 500 Internal Server Error
stac-api     | ERROR:    Exception in ASGI application
stac-api     | Traceback (most recent call last):
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 396, in run_asgi
stac-api     |     result = await app(self.scope, self.receive, self.send)
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
stac-api     |     return await self.app(scope, receive, send)
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/fastapi/applications.py", line 199, in __call__
stac-api     |     await super().__call__(scope, receive, send)
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/starlette/applications.py", line 111, in __call__
stac-api     |     await self.middleware_stack(scope, receive, send)
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
stac-api     |     raise exc from None
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
stac-api     |     await self.app(scope, receive, _send)
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
stac-api     |     raise exc from None
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
stac-api     |     await self.app(scope, receive, sender)
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/starlette/routing.py", line 566, in __call__
stac-api     |     await route.handle(scope, receive, send)
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/starlette/routing.py", line 227, in handle
stac-api     |     await self.app(scope, receive, send)
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/starlette/routing.py", line 41, in app
stac-api     |     response = await func(request)
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/fastapi/routing.py", line 201, in app
stac-api     |     raw_response = await run_endpoint_function(
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/fastapi/routing.py", line 150, in run_endpoint_function
stac-api     |     return await run_in_threadpool(dependant.call, **values)
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/starlette/concurrency.py", line 34, in run_in_threadpool
stac-api     |     return await loop.run_in_executor(None, func, *args)
stac-api     |   File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
stac-api     |     result = self.fn(*self.args, **self.kwargs)
stac-api     |   File "./stac_fastapi/api/routes.py", line 34, in _endpoint
stac-api     |     resp = func(request_data, request=request)
stac-api     |   File "./stac_fastapi/sqlalchemy/transactions.py", line 41, in create_collection
stac-api     |     data = self.collection_table.from_schema(model)
stac-api     |   File "./stac_fastapi/sqlalchemy/models/database.py", line 69, in from_schema
stac-api     |     return cls(**cls.get_database_model(schema))
stac-api     |   File "<string>", line 4, in __init__
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/sqlalchemy/orm/state.py", line 433, in _initialize_instance
stac-api     |     manager.dispatch.init_failure(self, args, kwargs)
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
stac-api     |     compat.raise_(
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
stac-api     |     raise exception
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/sqlalchemy/orm/state.py", line 430, in _initialize_instance
stac-api     |     return manager.original_init(*mixed[1:], **kwargs)
stac-api     |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/sqlalchemy/ext/declarative/base.py", line 839, in _declarative_constructor
stac-api     |     raise TypeError(
stac-api     | TypeError: 'sci:citation' is an invalid keyword argument for Collection
migration_1  | Traceback (most recent call last):
migration_1  |   File "scripts/ingest_joplin.py", line 36, in <module>
migration_1  |     ingest_joplin_data()
migration_1  |   File "scripts/ingest_joplin.py", line 20, in ingest_joplin_data
migration_1  |     r.raise_for_status()
migration_1  |   File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.8/site-packages/requests/models.py", line 943, in raise_for_status
migration_1  |     raise HTTPError(http_error_msg, response=self)
migration_1  | requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://app:8081/collections

This is really curious because when I try to extent with some other fields like keyword no errors appears.

collection.json

{
    "id": "joplin",
    "stac_extensions": [
        "https://stac-extensions.github.io/scientific/v1.0.0/schema.json"
    ],
    "keywords": [
        "L1C",
        "Sentinel2"
    ],    
    "description": "This imagery was acquired by the NOAA Remote Sensing Division to support NOAA national security and emergency response requirements. In addition, it will be used for ongoing research efforts for testing and developing standards for airborne digital imagery. Individual images have been combined into a larger mosaic and tiled for distribution. The approximate ground sample distance (GSD) for each pixel is 35 cm (1.14 feet).",
    "stac_version": "1.0.0-beta.2",
    "license": "public-domain",
    "links": [],
    "extent": {
        "spatial": {
            "bbox": [
                [
                    -94.6911621,
                    37.0332547,
                    -94.402771,
                    37.1077651
                ]
            ]
        },
        "temporal": {
            "interval": [
                [
                    "2000-02-01T00:00:00Z",
                    "2000-02-12T00:00:00Z"
                ]
            ]
        }
    }
}

error

NO ERROR

I tried to modify the files '/stac_api/models/schemas.py' and /stac_fastapi/sqlalchemy/models/database.py but with no success any help will be appreciated!

PS: I tried the same in versions 1.1.0 and 1.0.0 and this problem persist.

kbgg commented 3 years ago

We're getting the same error when adding our Radiant MLHub collections which all extend the scientific extension

lossyrob commented 3 years ago

I've run into a similar problem - this is due to a current limitation of stac-fastapi on handling custom models. The limitation is described here: https://github.com/stac-utils/stac-fastapi/issues/58. The issue you're seeing is the top-level "sci:citiation" property doesn't map to any stac-pydantic field. Even if you were able to use a subclassed stac-pydantic model that did add that field, the SQLAlchemy backend wouldn't support it because there's no column for it in the database model. This is a tricky issue that had me stuck on a fork of arturo-stac-api, in which I added a database change to account for this in this commit.

The upcoming pgstac backend will not have the problem on the database side, as it will use JSONB for all the Collection (and also Item) properties and won't need additional columns for custom data fields. However, the stac-pydantic model subclassing issue will still remain.

I think a solution to this would be to add a setting to ApiSettings that lets the users of stac-fastapi to provide subclasses of stac-pydantic models that they want to use in place of the original models. This would let custom models to be set when declaring the API, and allow custom fields to be put into place. There you could pass in a custom Collection subclass with the that looks similar to the custom Collection used in the commit linked to above.

The sore spot with that solution is that, if you encounter another extension, the custom model will have to be updated to account for that. I don't believe there's a catch-all in stac-pydantic to deal with arbitrary extensions (or to just allow properties that it doesn't recognize). This might be a reason to consider relaxing the validation constrictions on the pydantic models, or even to move to PySTAC as the internal STAC model (though we'd lose some of the FastAPI <-> Pydantic nicities, which tbh I've found more confusing/in the way than beneficial, at least besides the auto OpenAPI schema definitions).

geospatial-jeff commented 3 years ago

I think a solution to this would be to add a setting to ApiSettings that lets the users of stac-fastapi to provide subclasses of stac-pydantic models that they want to use in place of the original models.

Agreed, ApiSettings is the "bucket of state" for all the things that don't quite fit elsewhere, and I think its fine to continue this pattern until it becomes unsustainable and needs to be rethought. The one hiccup I see here is it might make testing harder, as we need an instance of ApiSettings to test the backend clients which seems to be a violation of separation of concerns, but this might be unavoidable.

I don't believe there's a catch-all in stac-pydantic to deal with arbitrary extensions (or to just allow properties that it doesn't recognize)

You can set extra = 'allow' in the model config (https://pydantic-docs.helpmanual.io/usage/model_config/) to allow arbitrary properties which aren't defined in the model itself.

This might be a reason to consider relaxing the validation constrictions on the pydantic models, or even to move to PySTAC as the internal STAC model

We've done this for the GET /search and POST /search endpoints because the fields extension can exclude fields that are required by the underlying pydantic models. I thought it worked out nicely, might be worth trying this with the other endpoints to allow for more flexibility with the model definitions.

moradology commented 3 years ago

After some discussion with @geospatial-jeff and @lossyrob, I've spiked out a small prototype of what would be needed with the current SqlAlchemy backend to get custom models working. There are definitely some issues that need to be ironed out but which probably require inputs from people more involved in library design than me; I've highlighted the two most pressing concerns in the body of the draft (#134)