populse / capsul

Collaborative Analysis Platform : Simple, Unifying, Lean

Toolboxes definition and implementation #268

Open sapetnioc opened 1 year ago

sapetnioc commented 1 year ago

Capsul must be modular. We need to define how toolboxes can be added to the project and to implement the chosen solution.

c-langlet commented 1 year ago

As I see it, the whole project is like a tree: Capsul is the trunk, toolboxes are the branches and the GUI is the leaves. Each of these components should be as independent as possible to facilitate reproducibility, reusability and software development, although this may not always be possible. To me, toolboxes should be developed as follows: a core part based on the usual dependencies of the community; an embedded part encapsulating the commands defined in the core part and using the whole Capsul environment (I/O system, completion, databasing, pipelining, etc.); and a GUI definition part (I am currently not sure how this relates to the embedded part). Ideally, the core part would come out of the team's own work and be interfaced with Capsul when the need for an advanced pipelining system arises. The core part could be exposed as such to the community and provide basic functionality that may attract users to the more advanced embedded part (using Capsul). During software installation, the user could specify the toolboxes they are interested in, and the system would install only the required dependencies (listed in a toolbox requirements part), resulting in a fine-tuned installation. We may discuss these points further.

sapetnioc commented 1 year ago

I agree with your vision except on one point: making Capsul the root of everything. That is what we have done for BrainVISA, and the toolboxes are hidden behind BrainVISA. This becomes a problem when there are various toolboxes developed by different labs. I believe it is important that users clearly identify the toolboxes; otherwise all the visibility would go to the Capsul team and not to the toolbox developers' teams. For instance, I very frequently hear "I used BrainVISA" instead of "I used Morphologist".

For this reason, I prefer to see Capsul as a kind of operating system for pipelining and metadata management: something very important, but not something that is put forward in communication. This is just a way of presenting things, though; it is not incompatible with what you said.

You pointed out a very important question for this issue: core system and toolbox installation. If toolboxes are selected during installation, it means that they are not embedded in a monolithic virtual image, as in BrainVISA today, but are independent. A possible installation model would be to start with the installation of our "operating system" in the form of a virtual image and to have the possibility (at install time or later) to add toolboxes. This raises many questions:

sapetnioc commented 1 year ago

Here is a first proposal about toolboxes to move the discussion forward. In my mind, toolboxes are not specific to Capsul but are more related to casa-distro. I wrote a few questions and answers that give a good idea of what toolboxes could be and how to implement the tools to manage them.

What is a toolbox

How does a user choose toolboxes to install ?

One or several repositories are registered in the user environment. By default, one repository URL is defined in the casa-distro distribution. The user has access to a GUI to browse all available toolboxes defined in the repositories and select the one(s) to install. The final set of toolboxes to install (including dependencies) is presented to the user, who can accept or cancel the installation.
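As a minimal sketch of how the "final set including dependencies" could be computed, the function below walks the `requires` lists found in a repository index and returns the selected toolboxes together with everything they depend on, dependencies first. The function name `resolve_install_set` and the toolbox identifiers are hypothetical, and the choice to reject circular dependencies is an assumption, not something decided in this proposal.

```python
def resolve_install_set(index, selected):
    """Return selected toolboxes plus their dependencies, dependencies first.

    `index` maps a toolbox id to a dict that may contain a "requires" list.
    """
    ordered = []
    visiting = set()

    def visit(toolbox_id):
        if toolbox_id in ordered:
            return  # already scheduled for installation
        if toolbox_id in visiting:
            raise ValueError(f"circular dependency involving {toolbox_id!r}")
        if toolbox_id not in index:
            raise KeyError(f"unknown toolbox {toolbox_id!r}")
        visiting.add(toolbox_id)
        for dependency in index[toolbox_id].get("requires", []):
            visit(dependency)
        visiting.discard(toolbox_id)
        ordered.append(toolbox_id)

    for toolbox_id in selected:
        visit(toolbox_id)
    return ordered


# Hypothetical repository content, for illustration only.
index = {
    "core_tools": {"requires": []},
    "segmentation": {"requires": ["core_tools"]},
    "tractography": {"requires": ["segmentation"]},
}
print(resolve_install_set(index, ["tractography"]))
# → ['core_tools', 'segmentation', 'tractography']
```

The resulting list is what a GUI could show to the user for confirmation before anything is downloaded.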

What is the format of a toolbox package in a repository ?

A toolbox package is composed of two files: an archive file containing the toolbox content and a JSON metadata file.

What happens when a toolbox is installed ?

Each toolbox has a unique identifier. Toolbox-specific files are copied into the casa-distro environment and are accessible in the container as `/casa/host/toolbox/<toolbox_id>`. The download and installation process has several steps:

- The toolbox archive file and metadata file are downloaded to a temporary location
- The archive file content is extracted into the directory `<environment_dir>/toolbox/<toolbox_id>`
- The metadata file is copied to `<environment_dir>/toolbox/<toolbox_id>.json`
- If the file `<environment_dir>/toolbox/<toolbox_id>/install_host.sh` exists, it is executed outside the container. This step makes it possible to ask the user to identify resources that are either already present on their computer or that must be downloaded by the user for legal reasons (for instance FSL).
- If the file `<environment_dir>/toolbox/<toolbox_id>/install.sh` exists, it is executed from within the container. This should be the main entry point for any toolbox customization. If this script needs files that are used only during installation (for example archive files that are extracted or source files that are compiled), they can be embedded in the toolbox archive file in the `install` directory.
- The following elements are deleted if they exist:
    - File `<environment_dir>/toolbox/<toolbox_id>/install_host.sh` 
    - File `<environment_dir>/toolbox/<toolbox_id>/install.sh` 
    - Directory `<environment_dir>/toolbox/<toolbox_id>/install` 
    - Temporary archive file and metadata file
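The steps above can be sketched roughly as follows, starting from files that have already been downloaded. This is only an illustration under several assumptions: the archive is taken to be a format that `shutil.unpack_archive` recognises from its extension, the function name `install_toolbox` is hypothetical, and `run_in_container` is a hypothetical callable standing in for whatever mechanism casa-distro would use to run a command inside the container.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def install_toolbox(environment_dir, toolbox_id, archive_path, metadata_path,
                    run_in_container):
    """Install one toolbox from already-downloaded temporary files."""
    toolbox_root = Path(environment_dir) / "toolbox"
    toolbox_dir = toolbox_root / toolbox_id
    toolbox_dir.mkdir(parents=True, exist_ok=True)

    # Extract the archive and copy the metadata next to the toolbox directory.
    shutil.unpack_archive(str(archive_path), str(toolbox_dir))
    shutil.copyfile(metadata_path, toolbox_root / f"{toolbox_id}.json")

    # Optional script executed on the host, outside the container.
    host_script = toolbox_dir / "install_host.sh"
    if host_script.exists():
        subprocess.run(["sh", str(host_script)], check=True)

    # Optional script executed inside the container (mechanism is assumed).
    container_script = toolbox_dir / "install.sh"
    if container_script.exists():
        run_in_container(["sh", f"/casa/host/toolbox/{toolbox_id}/install.sh"])

    # Clean up installation-only files and the temporary downloads.
    for script in (host_script, container_script):
        script.unlink(missing_ok=True)
    shutil.rmtree(toolbox_dir / "install", ignore_errors=True)
    for temporary in (archive_path, metadata_path):
        Path(temporary).unlink(missing_ok=True)
    return toolbox_dir


# Demonstration with a throwaway archive; nothing here touches a real container.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    source = tmp / "source"
    (source / "install").mkdir(parents=True)
    (source / "install" / "data.txt").write_text("only needed at install time")
    (source / "install.sh").write_text("#!/bin/sh\n")
    (source / "README.txt").write_text("demo toolbox")
    archive = shutil.make_archive(str(tmp / "download"), "gztar", root_dir=source)
    metadata = tmp / "download.json"
    metadata.write_text('{"label": "Demo"}')

    container_calls = []  # records the commands that would run in the container
    installed = install_toolbox(tmp / "env", "demo", archive, metadata,
                                container_calls.append)
    remaining = sorted(path.name for path in installed.iterdir())
    metadata_copied = (tmp / "env" / "toolbox" / "demo.json").exists()
    archive_removed = not Path(archive).exists()
    print(remaining, container_calls)
```

Passing the container runner as a parameter keeps the sketch independent of how commands are actually launched in the container.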

What is the format of a toolbox repository ?

A repository is a web URL returning a JSON file. This file has the following structure:

{
    "<toolbox_id>": {
        "metadata": "<metadata file URL>",
        "metadata_checksum": "<metadata checksum>",
        "content": "<content file URL>",
        "content_checksum": "<content file checksum>",
        "requires": ["<toolbox id>", ...]
    }

    ...
}
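A short sketch of how a client might read such a repository file and check a download against its checksum is given below. The proposal does not say which checksum algorithm is used, so SHA-256 hex digests are an assumption here, as are the function names `parse_repository` and `verify_checksum` and the `example.org` URLs.

```python
import hashlib
import json

REQUIRED_KEYS = ("metadata", "metadata_checksum", "content", "content_checksum")


def parse_repository(text):
    """Parse the repository JSON and check that each entry is complete."""
    index = json.loads(text)
    for toolbox_id, entry in index.items():
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"{toolbox_id}: missing keys {missing}")
        entry.setdefault("requires", [])  # treat an absent list as no dependency
    return index


def verify_checksum(data, expected):
    """Raise ValueError if `data` does not match the expected digest.

    Assumption: checksums are SHA-256 hex digests.
    """
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected.lower():
        raise ValueError(f"checksum mismatch: expected {expected}, got {actual}")


# Hypothetical repository content, built in memory for illustration.
content = b"example archive bytes"
repository_text = json.dumps({
    "demo": {
        "metadata": "https://example.org/demo.json",
        "metadata_checksum": hashlib.sha256(b"{}").hexdigest(),
        "content": "https://example.org/demo.tar.gz",
        "content_checksum": hashlib.sha256(content).hexdigest(),
    }
})
index = parse_repository(repository_text)
verify_checksum(content, index["demo"]["content_checksum"])  # passes silently
```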

What are the metadata of a toolbox ?

{
    "label": "<toolbox name>",
    "description": "<short description>",
    "url": "<toolbox documentation URL>",
    "version": "<version of the toolbox>"
}
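For illustration, the metadata file could be loaded into a small typed structure such as the one below. The class name `ToolboxMetadata` and the sample values are hypothetical; only the four field names come from the structure above, and treating all four as mandatory is an assumption.

```python
import json
from dataclasses import dataclass

FIELDS = ("label", "description", "url", "version")


@dataclass(frozen=True)
class ToolboxMetadata:
    label: str
    description: str
    url: str
    version: str

    @classmethod
    def from_json(cls, text):
        """Build the metadata object, rejecting files with missing fields."""
        raw = json.loads(text)
        missing = [name for name in FIELDS if name not in raw]
        if missing:
            raise ValueError(f"missing metadata fields: {missing}")
        return cls(**{name: str(raw[name]) for name in FIELDS})


# Hypothetical metadata file content.
sample = ToolboxMetadata.from_json(json.dumps({
    "label": "Demo toolbox",
    "description": "A toolbox used only as an example",
    "url": "https://example.org/demo/docs",
    "version": "1.0.0",
}))
print(sample.label, sample.version)
```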

How does a user choose toolboxes to update ?

The version of each installed toolbox is compared to the highest version available in the repositories. Toolboxes for which a more recent version is available are proposed for upgrade. The user chooses which one(s) to upgrade.
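The comparison could look like the sketch below. The version format is not specified in this proposal, so dotted integers such as `1.10.0` are assumed; the function names and the toolbox identifiers are hypothetical.

```python
def parse_version(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def upgrade_candidates(installed, available):
    """Return {toolbox_id: newest_version} for toolboxes that can be upgraded.

    `installed` maps a toolbox id to its installed version; `available` maps
    a toolbox id to the list of versions found in the repositories.
    """
    candidates = {}
    for toolbox_id, current in installed.items():
        versions = available.get(toolbox_id, [])
        if not versions:
            continue  # toolbox no longer published anywhere
        newest = max(versions, key=parse_version)
        if parse_version(newest) > parse_version(current):
            candidates[toolbox_id] = newest
    return candidates


# Hypothetical state, for illustration only.
installed = {"segmentation": "1.2.0", "tractography": "2.0.1"}
available = {"segmentation": ["1.2.0", "1.10.0"], "tractography": ["2.0.1"]}
print(upgrade_candidates(installed, available))
# → {'segmentation': '1.10.0'}
```

Comparing tuples of integers rather than raw strings matters here: as strings, `"1.10.0"` would sort before `"1.2.0"`.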

What happens when a toolbox is updated ?

An upgrade completely replaces the toolbox directory and metadata. They are first backed up, then the new version is installed. The backup is restored if an error occurs, or deleted if the installation was successful.
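A minimal sketch of this backup, install and restore sequence is shown below. The backup naming scheme (`<toolbox_id>.backup` and `<toolbox_id>.json.backup`) and the function name `upgrade_toolbox` are assumptions made for the example, and `install_new_version` is a hypothetical callable that performs the installation of the new version.

```python
import shutil
import tempfile
from pathlib import Path


def upgrade_toolbox(environment_dir, toolbox_id, install_new_version):
    """Replace a toolbox, restoring the previous version if installation fails."""
    toolbox_root = Path(environment_dir) / "toolbox"
    toolbox_dir = toolbox_root / toolbox_id
    metadata_file = toolbox_root / f"{toolbox_id}.json"
    backup_dir = toolbox_root / f"{toolbox_id}.backup"            # assumed name
    backup_metadata = toolbox_root / f"{toolbox_id}.json.backup"  # assumed name

    # Back up the current directory and metadata by renaming them.
    toolbox_dir.rename(backup_dir)
    metadata_file.rename(backup_metadata)
    try:
        install_new_version()
    except Exception:
        # Remove whatever the failed installation left behind, then restore.
        shutil.rmtree(toolbox_dir, ignore_errors=True)
        metadata_file.unlink(missing_ok=True)
        backup_dir.rename(toolbox_dir)
        backup_metadata.rename(metadata_file)
        raise
    else:
        shutil.rmtree(backup_dir)
        backup_metadata.unlink()


# Demonstration: a simulated failure must leave version 1.0 in place.
with tempfile.TemporaryDirectory() as env:
    root = Path(env) / "toolbox"
    (root / "demo").mkdir(parents=True)
    (root / "demo" / "version.txt").write_text("1.0")
    (root / "demo.json").write_text('{"version": "1.0"}')

    def failing_install():
        (root / "demo").mkdir()
        (root / "demo" / "version.txt").write_text("2.0")
        raise RuntimeError("simulated installation failure")

    try:
        upgrade_toolbox(env, "demo", failing_install)
    except RuntimeError:
        pass
    restored_version = (root / "demo" / "version.txt").read_text()
    backup_left = (root / "demo.backup").exists()
    print(restored_version)
```

Renaming rather than copying keeps the backup step cheap, at the cost of requiring the backup to live on the same filesystem as the toolbox directory.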