tethysplatform / tethys

The Tethys Platform main Django website project repository.
http://tethysplatform.org/
BSD 2-Clause "Simplified" License
92 stars 50 forks source link

[FEATURE] File Handling Overhaul #1031

Closed swainn closed 4 months ago

swainn commented 6 months ago

Is your feature request related to a problem? Please describe. Workspaces and static files sometimes get used for configuration files, but neither is really a great place for them, because you end up with syncing issues in production due to the sym-linking. In addition, getting dynamically created files publicly accessible because there isn't a file API for static files like there is for workspaces. Static files is also not a good place for user uploaded content, because static files are assumed to be trusted "source code" files, and uploaded content should be treated as suspect at best.

Describe the solution you'd like We add additional file management and refactor the workspaces API as follows:

There should be 4 places for files:

In addition we overhaul how workspaces works so that symbolic linking is no longer used in production and remove the workspaces folder from the scaffold. The workspaces (and media) folders would be managed folders and users wouldn't need to know where the folders are.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Add any other context or screenshots about the feature request here.

sdc50 commented 5 months ago

Similar to the Workspaces API I think there should be 3 ways to access these paths:

  1. controller decorator
  2. app class methods
  3. SDK getter functions

(note that the workspaces also as workspaces decorators that I think should be deprecated).

from tethys_sdk.paths import (
  get_app_workspace,
  get_user_workspace, 
  get_app_resources, 
  get_app_media, 
  get_user_media, 
  get_app_public,
)

Create a new class that is mostly the same as the TethysWorkspace class but update it to use Path objects by default.

class TethysPath:
    """
    Defines objects that represent paths (directories) for apps and users.

    Attributes:
      path(Path): The absolute path to the workspace directory. Cannot be overwritten.
    """

    def __init__(self, path, read_only=False):
        """
        Constructor
        """
        self._path = Path(path).resolve()
        assert not self._path.is_file()
        # Create the path if it doesn't already exist
        self._path.mkdir(parents=True, exists_okay=True)

        self._read_only = read_only

    def __repr__(self):
        """
        Rendering
        """
        return '<TethysPath path="{0}">'.format(self._path)

    @property
    def path(self):
        # Note that this is different from TethysWorkspace in that it now returns a python Path object
        return self._path

    @property
    def read_only(self):
        return self._read_only

    def files(self, names_only=False):
        """
        Return a list of files (as Path objects by default) that are in the workspace.

        Args:
          names_only(bool): Returns list of filenames as strings when True. Defaults to False.

        Returns:
          list: A list of files in the workspace.

        **Examples:**

        ::

            # List file names
            workspace.files()

            # List full path file names
            workspace.files(full_path=True)

        """
        path, dirs, files = next(os.walk(self.path))
        if names_only:
            return files
        return [self.path / f for f in files]

    def directories(self, names_only=False):
        """
        Return a list of directories (as Path objects by default) that are in the workspace.

        Args:
          names_only(bool): Returns list of directory names  as strings when True. Defaults to False.

        Returns:
          list: A list of directories in the workspace.

        **Examples:**

        ::

            # List directory names
            workspace.directories()

            # List full path directory names
            workspace.directories(full_path=True)

        """
        path, dirs, files = next(os.walk(self.path))
        if names_only:
            return dirs
        return [self.path / d for d in dirs]

    def clear(self, exclude=None, exclude_files=False, exclude_directories=False):
        """
        Remove all files and directories in the workspace.

        Args:
          exclude(iterable): A list or tuple of file and directory names to exclude from clearing operation.
          exclude_files(bool): Excludes all files from clearing operation when True. Defaults to False.
          exclude_directories(bool): Excludes all directories from clearing operation when True. Defaults to False.

        **Examples:**

        ::

            # Clear everything
            workspace.clear()

            # Clear directories only
            workspace.clear(exclude_files=True)

            # Clear files only
            workspace.clear(exclude_directories=True)

            # Clear all but specified files and directories
            workspace.clear(exclude=['file1.txt', '/full/path/to/directory1', 'directory2', '/full/path/to/file2.txt'])

        """
        if self.read_only:
            raise RuntimeError('Read only TethysPaths cannot be cleared')

        if exclude is None:
            exclude = list()

        files = self.files()

        if not exclude_files:
            for file in files:
                if file not in exclude and file.name not in exclude:
                    file.unlink()

        if not exclude_directories:
            directories = self.directories()
            for directory in directories:
                if directory not in exclude and directory.name not in exclude:
                    shutil.rmtree(directory)

    def remove(self, item):
        """
        Remove a file or directory from the workspace.

        Args:
          item(str): Name of the item to remove from the workspace.

        **Examples:**

        ::

            workspace.remove('file.txt')
            workspace.remove('/full/path/to/file.txt')
            workspace.remove('relative/path/to/file.txt')
            workspace.remove('directory')
            workspace.remove('/full/path/to/directory')
            workspace.remove('relative/path/to/directory')
            workspace.remove(path_object)

        **Note:** Though you can specify relative paths, the ``remove()`` method will not allow you to back into other directories using "../" or similar notation. Futhermore, absolute paths given must contain the path of the workspace to be valid.

        """  # noqa: E501
        if self.read_only:
            raise RuntimeError('Cannot remove files from read-only TethysPaths')        
        item = Path(item).resolve()

        assert item.relative_to(self.path).  #TODO add an if statement with a helpful error message

        if item.is_dir():
            shutil.rmtree(item)
        elif item.is_file():
            item.unlink()

    def get_size(self, units="b"):
        total_size = 0
        for file in self.files():
            total_size += os.path.getsize(file)

        if units.lower() == "b":
            conversion_factor = 1
        else:
            storage_units = _get_storage_units()
            conversion_factor = [
                item[0] for item in storage_units if units.upper() in item[1]
            ][0]

        return total_size / conversion_factor
sdc50 commented 5 months ago

Here are some helper methods to implment:

def _resolve_app_class(app_class_or_request):
    """
    Returns and app class
    """
    pass

def _resolve_username(user_or_request):
    """
    Gets the username from user or request object
    (Also check quotas?)
    """

def _get_app_workspace(app):
    """
    Gets the root workspace directory for an app. Uses TETHYS_WORKSPACES_ROOT setting
    """
    return Path(settings.TETHYS_WORKSPACES_ROOT) / app.package

def get_app_workspace(app_or_request) -> TethysPath:
    if settings.USE_OLD_WORKSPACES_API:
        return _get_app_workspace_old(app_or_request)

    app = _resolve_app_class(app_or_request)
    return _get_app_workspace(app)

def _get_user_workspace(app, username):
    app_workspace = get_app_workspace(app)
    return TethysPath(app_workspace.path / username)

def get_user_workspace(app_class_or_request, user_or_request) -> TethysPath:
    if settings.USE_OLD_WORKSPACES_API:
        return _get_user_workspace_old(app_class_or_request, user_or_request)

    app = _resolve_app_class(app_class_or_request)
    username = _resolve_username(user_or_request)
    return _get_user_workspace(app, username)

def _get_app_media(app):
    """
    Gets the root media directory for an app. Uses MEDIA_ROOT setting.
    """
    return Path(settings.MEDIA_ROOT) / app.package

def get_app_media(app_or_request):
    app = _resolve_app_class(app_or_request)
    return _get_app_media(app)

def _get_user_media(app, username):
    app_media = get_app_media(app)
    return TethysPath(app_media.path / username)

def get_public_path(app_or_extension):
    """
    Gets the public directory of an app or extension as a read-only TethysPath
    """
    return app_or_extension.public_path

def get_resources_path(app_or_extension):
    """
    Gets the resources directory of an app or extension as a read-only TethysPath
    """
    return app_or_extension.resources_path

Basics of TethysBase

from importlib.resources import files.  # This is new in Python 3.9

class TethysBase(TethysBaseMixin):

    @property
    def _package_files(self):
        return files(f"{self.package_namespace}.{self.package}")

    @property
    def public_path(self):
        return TethysPath(self._package_files / 'public')

    @property
    def resources_path(self):
        return TethysPath(self._package_files / 'resources')
sdc50 commented 5 months ago

Do we want to include arguments in the controller decorator to access all of these paths?


@controller(
    app_workspace=True,
    user_workspace=True,
    app_media_path=True,
    user_media_path=True,
    app_resource_path=True,
    app_public_path=True,
)
def my_controller(request, app_workspace, user_workspace, app_media_path, user_media_path, app_resource_path, app_public_path):
    ...

Maybe we also want them for the consumer decorator?

sdc50 commented 5 months ago

Decide on naming convention:

get_app_media or get_app_media_path?

get_resources_path or get_app_resouces_path or get_resources?

swainn commented 5 months ago

Do we want to include arguments in the controller decorator to access all of these paths?

@controller(
    app_workspace=True,
    user_workspace=True,
    app_media_path=True,
    user_media_path=True,
    app_resource_path=True,
    app_public_path=True,
)
def my_controller(request, app_workspace, user_workspace, app_media_path, user_media_path, app_resource_path, app_public_path):
    ...

Maybe we also want them for the consumer decorator?

Yes, we definitely want them in the consumer decorator too. It would be nice if we could generalize the two decorators somehow so we don't have to implement everything twice. Or at least have a formal way to implement it twice.

swainn commented 5 months ago

Decide on naming convention:

get_app_media or get_app_media_path?

get_resources_path or get_app_resouces_path or get_resources?

I like this naming convention:

get_app_workspace
get_user_workspace
get_app_media
get_user_media
get_app_public
get_app_resources
sdc50 commented 5 months ago

I like this naming convention:

get_app_workspace
get_user_workspace
get_app_media
get_user_media
get_app_public
get_app_resources

The way that I've implemented get_resources_path it will work with an TethysExtensionBase object, which is why I initially didn't have app in the name.

However, since it can accept a request object and derives the app_class from that, it may be best to call it as you suggested.

sdc50 commented 5 months ago

Yes, we definitely want them in the consumer decorator too.

The consumer decorator is tricky since it doesn't have a request object. I think the easiest thing to do would be to piggy-back on the TethysAsyncWebsocketConsumerMixin class that @c-krew created and make methods to get the paths. So they wouldn't be arguments in the consumer decorator.

sdc50 commented 5 months ago

@swainn, @shawncrawley, @gagelarsen

Should we deprecate the app_workspace and user_workspace decorators, or should we add decorators to get the rest of the paths?

swainn commented 5 months ago

@swainn, @shawncrawley, @gagelarsen

Should we deprecate the app_workspace and user_workspace decorators, or should we add decorators to get the rest of the paths?

I think it is ok to deprecate these decorators, but we should maintain support for the workspace arguments in the controller decorator.

swainn commented 4 months ago

Implemented by #1037