litestar-org / litestar

Production-ready, Light, Flexible and Extensible ASGI API framework | Effortlessly Build Performant APIs
https://litestar.dev/
MIT License

Unexpected behavior from `module_to_os_path` #3554

Closed ghferrari closed 1 month ago

ghferrari commented 3 months ago

Description

The file litestar/utils/module_loader.py contains a function definition for module_to_os_path.

Assuming I've understood the code comments, the purpose of this code is to return the path to a directory: either the base directory of the project or, when supplied with the name of a module, the base directory of that module.

Unfortunately, if you define your Litestar object in a file named app.py and there is no other module named app, this function returns the path to the file itself. That is already a bug, since the intention is to return the path to a directory.
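The expected semantics can be sketched with the standard library alone (module_dir is a hypothetical helper, not Litestar's implementation): a package's import-spec origin is its __init__.py, while a single-file module's origin is the .py file itself, so returning the origin without taking .parent yields a file path in the single-file case.

```python
import importlib.util
from pathlib import Path

def module_dir(name: str) -> Path:
    """Hypothetical helper: resolve a module name to the directory holding it."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ValueError(f"cannot locate module {name!r}")
    origin = Path(spec.origin).resolve()
    # A package's origin is its __init__.py; a single-file module's origin is
    # the .py file itself.  Taking .parent always yields a real directory.
    return origin.parent

print(module_dir("textwrap").is_dir())  # True: the directory containing textwrap.py
```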

I noticed this problem while setting up my own starting configuration following the litestar-fullstack repository. That repository defines a BASE_DIR property at https://github.com/litestar-org/litestar-fullstack/blob/8e6edb90a401778741062a8383ff6e6f354b44dd/src/app/config/base.py#L23 using module_to_os_path, and then uses it to define subdirectories for alembic. In my case (a file named app.py, no module named app), this produces invalid directory paths which cannot be created. For example, when running the MCVE below, I see the error:

  File "/home/ghf/projects/dappel-litestar/.venv/lib/python3.11/site-packages/alembic/command.py", line 99, in init
    script._generate_template(
  File "/home/ghf/projects/dappel-litestar/.venv/lib/python3.11/site-packages/alembic/script/base.py", line 593, in _generate_template
    util.template_to_file(src, dest, self.output_encoding, **kw)
  File "/home/ghf/projects/dappel-litestar/.venv/lib/python3.11/site-packages/alembic/util/pyfiles.py", line 41, in template_to_file
    with open(dest, "wb") as f:
         ^^^^^^^^^^^^^^^^
NotADirectoryError: [Errno 20] Not a directory: '/home/ghf/projects/dappel-litestar/src/app.py/db/migrations/alembic.ini'
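For reference, this NotADirectoryError is exactly what the OS returns when a path component that must be a directory is in fact a regular file. A minimal stand-alone reproduction, independent of alembic:

```python
import os
import tempfile

# Create a stand-in for src/app.py: an ordinary file.
fd, file_path = tempfile.mkstemp(suffix=".py")
os.close(fd)

try:
    # Treating the file as a directory component fails the same way alembic does.
    open(os.path.join(file_path, "alembic.ini"), "wb")
except NotADirectoryError as exc:
    print(exc.errno)  # 20 (ENOTDIR) on Linux
finally:
    os.unlink(file_path)
```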

URL to code causing the issue

https://github.com/litestar-org/litestar/blob/84f51c8afc3203cd4914922b2ec3c1e92d5d40ba/litestar/utils/module_loader.py#L21

MCVE

from __future__ import annotations

import os
from dataclasses import dataclass, field
from functools import lru_cache
from pathlib import Path
from typing import Any, Final

from advanced_alchemy.extensions.litestar import (
    AlembicAsyncConfig,
    AsyncSessionConfig,
    SQLAlchemyAsyncConfig,
    SQLAlchemyPlugin,
    async_autocommit_before_send_handler,
)
from litestar.serialization import decode_json, encode_json
from litestar.utils.module_loader import module_to_os_path
from sqlalchemy import event
from sqlalchemy.ext.asyncio import AsyncEngine, create_async_engine
from sqlalchemy.pool import NullPool

DEFAULT_MODULE_NAME = "app"
BASE_DIR: Final[Path] = module_to_os_path(DEFAULT_MODULE_NAME)

TRUE_VALUES = {"True", "true", "1", "yes", "Y", "T"}

@dataclass
class DatabaseSettings:
    ECHO: bool = field(
        default_factory=lambda: os.getenv("DATABASE_ECHO", "False") in TRUE_VALUES,
    )
    """Enable SQLAlchemy engine logs."""
    ECHO_POOL: bool = field(
        default_factory=lambda: os.getenv("DATABASE_ECHO_POOL", "False") in TRUE_VALUES,
    )
    """Enable SQLAlchemy connection pool logs."""
    POOL_DISABLED: bool = field(
        default_factory=lambda: os.getenv("DATABASE_POOL_DISABLED", "False") in TRUE_VALUES,
    )
    """Disable SQLAlchemy pool configuration."""
    POOL_MAX_OVERFLOW: int = field(default_factory=lambda: int(os.getenv("DATABASE_MAX_POOL_OVERFLOW", "10")))
    """Max overflow for SQLAlchemy connection pool"""
    POOL_SIZE: int = field(default_factory=lambda: int(os.getenv("DATABASE_POOL_SIZE", "5")))
    """Pool size for SQLAlchemy connection pool"""
    POOL_TIMEOUT: int = field(default_factory=lambda: int(os.getenv("DATABASE_POOL_TIMEOUT", "30")))
    """Time in seconds for timing connections out of the connection pool."""
    POOL_RECYCLE: int = field(default_factory=lambda: int(os.getenv("DATABASE_POOL_RECYCLE", "300")))
    """Amount of time to wait before recycling connections."""
    POOL_PRE_PING: bool = field(
        default_factory=lambda: os.getenv("DATABASE_PRE_POOL_PING", "False") in TRUE_VALUES,
    )
    """Optionally ping database before fetching a session from the connection pool."""
    URL: str = field(default_factory=lambda: os.getenv("DATABASE_URL", "sqlite+aiosqlite:///db.sqlite3"))
    """SQLAlchemy Database URL."""
    MIGRATION_CONFIG: str = f"{BASE_DIR}/db/migrations/alembic.ini"
    """The path to the `alembic.ini` configuration file."""
    MIGRATION_PATH: str = f"{BASE_DIR}/db/migrations"
    """The path to the `alembic` database migrations."""
    MIGRATION_DDL_VERSION_TABLE: str = "ddl_version"
    """The name to use for the `alembic` versions table name."""
    FIXTURE_PATH: str = f"{BASE_DIR}/db/fixtures"
    """The path to JSON fixture files to load into tables."""
    _engine_instance: AsyncEngine | None = None
    """SQLAlchemy engine instance generated from settings."""

    @property
    def engine(self) -> AsyncEngine:
        return self.get_engine()

    def get_engine(self) -> AsyncEngine:
        if self._engine_instance is not None:
            return self._engine_instance
        if self.URL.startswith("postgresql+asyncpg"):
            engine = create_async_engine(
                url=self.URL,
                future=True,
                json_serializer=encode_json,
                json_deserializer=decode_json,
                echo=self.ECHO,
                echo_pool=self.ECHO_POOL,
                max_overflow=self.POOL_MAX_OVERFLOW,
                pool_size=self.POOL_SIZE,
                pool_timeout=self.POOL_TIMEOUT,
                pool_recycle=self.POOL_RECYCLE,
                pool_pre_ping=self.POOL_PRE_PING,
                pool_use_lifo=True,  # use lifo to reduce the number of idle connections
                poolclass=NullPool if self.POOL_DISABLED else None,
            )

            @event.listens_for(engine.sync_engine, "connect")
            def _sqla_on_connect(dbapi_connection: Any, _: Any) -> Any:  # pragma: no cover
                """Using msgspec for serialization of the json column values means that the
                output is binary, not `str` like `json.dumps` would output.
                SQLAlchemy expects that the json serializer returns `str` and calls `.encode()` on the value to
                turn it to bytes before writing to the JSONB column. I'd need to either wrap `serialization.to_json` to
                return a `str` so that SQLAlchemy could then convert it to binary, or do the following, which
                changes the behaviour of the dialect to expect a binary value from the serializer.
                See Also https://github.com/sqlalchemy/sqlalchemy/blob/14bfbadfdf9260a1c40f63b31641b27fe9de12a0/lib/sqlalchemy/dialects/postgresql/asyncpg.py#L934  pylint: disable=line-too-long
                """

                def encoder(bin_value: bytes) -> bytes:
                    return b"\x01" + encode_json(bin_value)

                def decoder(bin_value: bytes) -> Any:
                    # the byte is the \x01 prefix for jsonb used by PostgreSQL.
                    # asyncpg returns it when format='binary'
                    return decode_json(bin_value[1:])

                dbapi_connection.await_(
                    dbapi_connection.driver_connection.set_type_codec(
                        "jsonb",
                        encoder=encoder,
                        decoder=decoder,
                        schema="pg_catalog",
                        format="binary",
                    ),
                )
                dbapi_connection.await_(
                    dbapi_connection.driver_connection.set_type_codec(
                        "json",
                        encoder=encoder,
                        decoder=decoder,
                        schema="pg_catalog",
                        format="binary",
                    ),
                )
        elif self.URL.startswith("sqlite+aiosqlite"):
            engine = create_async_engine(
                url=self.URL,
                future=True,
                json_serializer=encode_json,
                json_deserializer=decode_json,
                echo=self.ECHO,
                echo_pool=self.ECHO_POOL,
                pool_recycle=self.POOL_RECYCLE,
                pool_pre_ping=self.POOL_PRE_PING,
            )

            @event.listens_for(engine.sync_engine, "connect")
            def _sqla_on_connect(dbapi_connection: Any, _: Any) -> Any:  # pragma: no cover
                """Override the default BEGIN: setting isolation_level to None disables the driver's implicit BEGIN."""
                dbapi_connection.isolation_level = None

            @event.listens_for(engine.sync_engine, "begin")
            def _sqla_on_begin(dbapi_connection: Any) -> Any:  # pragma: no cover
                """Emit a custom BEGIN statement."""
                dbapi_connection.exec_driver_sql("BEGIN")
        else:
            engine = create_async_engine(
                url=self.URL,
                future=True,
                json_serializer=encode_json,
                json_deserializer=decode_json,
                echo=self.ECHO,
                echo_pool=self.ECHO_POOL,
                max_overflow=self.POOL_MAX_OVERFLOW,
                pool_size=self.POOL_SIZE,
                pool_timeout=self.POOL_TIMEOUT,
                pool_recycle=self.POOL_RECYCLE,
                pool_pre_ping=self.POOL_PRE_PING,
            )
        self._engine_instance = engine
        return self._engine_instance

@dataclass
class Settings:
    db: DatabaseSettings = field(default_factory=DatabaseSettings)

    @classmethod
    def from_env(cls, dotenv_filename: str = ".env") -> Settings:
        from litestar.cli._utils import console

        env_file = Path(f"{os.curdir}/{dotenv_filename}")
        if env_file.is_file():
            from dotenv import load_dotenv

            console.print(f"[yellow]Loading environment configuration from {dotenv_filename}[/]")

            load_dotenv(env_file)
        return Settings()

@lru_cache(maxsize=1, typed=True)
def get_settings() -> Settings:
    return Settings.from_env()

settings = get_settings()

from litestar import Litestar

alchemy = SQLAlchemyPlugin(
    config=SQLAlchemyAsyncConfig(
        engine_instance=settings.db.get_engine(),
        before_send_handler=async_autocommit_before_send_handler,
        session_config=AsyncSessionConfig(expire_on_commit=False),
        alembic_config=AlembicAsyncConfig(
            version_table_name=settings.db.MIGRATION_DDL_VERSION_TABLE,
            script_config=settings.db.MIGRATION_CONFIG,
            script_location=settings.db.MIGRATION_PATH,
        )
    )
)

app = Litestar(
    plugins=[alchemy],
)
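To make the failure concrete without running alembic, here is an illustration (with hypothetical paths) of how a BASE_DIR that resolves to a file rather than a directory poisons every derived setting:

```python
from pathlib import PurePosixPath

# Hypothetical illustration: module_to_os_path("app") resolved to the file
# itself instead of its containing directory.
base_dir = PurePosixPath("/home/user/project/src/app.py")  # a file, not a directory

migration_config = f"{base_dir}/db/migrations/alembic.ini"
print(migration_config)
# /home/user/project/src/app.py/db/migrations/alembic.ini
# Creating or opening anything under this path raises NotADirectoryError,
# because the "app.py" component is a regular file.
```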

Steps to reproduce

1. Create a new project folder and copy the code above into a file named `app.py`
2. Run `litestar database init db`

Screenshots


Logs

No response

Litestar Version

2.9.0

Platform



cofin commented 3 months ago

@ghferrari, do you have an __init__.py in your migrations folder?

ghferrari commented 3 months ago

Hi @cofin,

I think the error above is reproducible if you have a minimal folder/file structure like this:

/parent-folder
    /src
        app.py

Use my MCVE above as the contents of app.py. In addition, I use a virtualenv with these packages installed:

advanced_alchemy
alembic
asyncpg
litestar[standard]
redis
sqlalchemy

I also have a postgres database running in docker and set my database URL as an environment variable via export DATABASE_URL="postgresql+asyncpg://name:password@localhost/dbname"

If you follow this setup, the db folder does not yet exist when you run the litestar database init db command.

For me, this produces this error:

[...]
  File "/home/ghf/projects/litestar-test/src/.venv/lib/python3.11/site-packages/alembic/command.py", line 99, in init
    script._generate_template(
  File "/home/ghf/projects/litestar-test/src/.venv/lib/python3.11/site-packages/alembic/script/base.py", line 593, in _generate_template
    util.template_to_file(src, dest, self.output_encoding, **kw)
  File "/home/ghf/projects/litestar-test/src/.venv/lib/python3.11/site-packages/alembic/util/pyfiles.py", line 41, in template_to_file
    with open(dest, "wb") as f:
         ^^^^^^^^^^^^^^^^
NotADirectoryError: [Errno 20] Not a directory: '/home/ghf/projects/litestar-test/src/app.py/db/migrations/alembic.ini'

As you can see in the final error line, app.py is being used (incorrectly) as the name of a directory. This is coming from module_to_os_path.
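Until a fix lands, one defensive option (a sketch, not Litestar's API) is to normalise whatever module_to_os_path returns before building paths from it: if the result turns out to be a file, use its parent directory instead.

```python
from pathlib import Path

def ensure_dir(candidate: Path) -> Path:
    """Hypothetical guard: fall back to the parent when handed a file path."""
    candidate = candidate.resolve()
    return candidate.parent if candidate.is_file() else candidate

# Usage sketch: BASE_DIR = ensure_dir(module_to_os_path(DEFAULT_MODULE_NAME))
print(ensure_dir(Path.cwd()))  # a directory passes through unchanged
```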

cofin commented 3 months ago

@ghferrari I had a chance to re-read this. Let me double check a few things and I'll report back.

ghferrari commented 3 months ago

@cofin In case it helps, I tried two things:

  1. Use my latest instructions above and then create the .../src/db folder and add an __init__.py file in there. Now, when I run litestar database init db from within the src folder, I see:

    alembic.util.exc.CommandError: Directory db already exists and is not empty
  2. Use my latest instructions above and add an __init__.py file in .../src. In that case, I see the original error.

cofin commented 1 month ago

closed by #3565