0x4007 commented 3 weeks ago

Standardizing Plug-in Data Storage in Organization-Wide Configuration Repository

Objective

Establish a standardized method for storing plug-in data in the .ubiquibot-config repository, ensuring data integrity and security. An additional benefit is that this allows partners full control over their data and decentralizes the data storage.

Specification

Storage Structure

Each plug-in will have its own JSON database file.
The filename of each JSON database will be the plug-in ID.
Combined with the kernel's write restrictions (see Access Control), this ensures that plug-ins cannot tamper with each other's data.

JSON Database Format

Each JSON file will store data specific to its corresponding plug-in.
The structure within the JSON file is determined by the plug-in's requirements.

Example

For a plug-in with ID @ubiquibot/command-start-stop, the JSON file will be named ubiquibot-command-start-stop.json.

{
    "dataKey1": "value1",
    "dataKey2": "value2",
    ...
}
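The naming rule in the example can be sketched as a small helper. Only the single mapping from `@ubiquibot/command-start-stop` to `ubiquibot-command-start-stop.json` comes from the proposal; the function name `pluginIdToFilename`, the general rule (drop the leading `@`, turn `/` into `-`), and the validation are assumptions made for illustration.

```typescript
// Hypothetical helper: derive the JSON database filename from a plug-in ID.
// Assumption: the general rule is "drop the leading @, replace / with -".
function pluginIdToFilename(pluginId: string): string {
  const name = pluginId
    .replace(/^@/, "") // drop the npm scope marker
    .replace(/\//g, "-"); // the scope separator becomes a dash
  // Reject IDs that could escape the storage directory or produce odd filenames.
  if (!/^[a-z0-9][a-z0-9._-]*$/i.test(name) || name.indexOf("..") !== -1) {
    throw new Error(`Unsupported plug-in ID: ${pluginId}`);
  }
  return `${name}.json`;
}

console.log(pluginIdToFilename("@ubiquibot/command-start-stop"));
// prints "ubiquibot-command-start-stop.json"
```

One thing this sketch makes visible: under such a rule, two different IDs (for example `@a/b-c` and `@a-b/c`) would map to the same filename, so a real implementation may need a collision-free encoding.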

Access Control

The kernel will manage read and write permissions.
Write access will be restricted to ensure plug-ins can only modify their own JSON file.
Read access can be granted based on plug-in ID, allowing access to other plug-ins' data as needed.
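A minimal sketch of how the kernel might evaluate these rules. The names `isAllowed` and `ReadGrants`, and the idea of a per-reader grant list, are hypothetical; the proposal only states that writes are limited to a plug-in's own file and that reads can be granted by plug-in ID.

```typescript
// Hypothetical kernel-side policy check; none of these names come from the proposal.
type Access = "read" | "write";

// readGrants[reader] lists the plug-in IDs whose data `reader` may read.
type ReadGrants = Record<string, string[]>;

function isAllowed(
  requester: string,
  owner: string,
  access: Access,
  readGrants: ReadGrants
): boolean {
  if (requester === owner) return true; // a plug-in always controls its own file
  if (access === "write") return false; // never write to another plug-in's file
  const granted = readGrants[requester] || [];
  return granted.indexOf(owner) !== -1; // cross-plug-in reads need an explicit grant
}

// Example grant: the rewards plug-in may read the start/stop plug-in's data.
const grants: ReadGrants = {
  "@ubiquibot/conversation-rewards": ["@ubiquibot/command-start-stop"],
};
console.log(isAllowed("@ubiquibot/conversation-rewards", "@ubiquibot/command-start-stop", "read", grants)); // true
console.log(isAllowed("@ubiquibot/conversation-rewards", "@ubiquibot/command-start-stop", "write", grants)); // false
```

Keeping the check as a pure function of (requester, owner, access) would make it easy to audit and test independently of how the kernel talks to GitHub.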

Implementation

Repository Setup

- Use the .ubiquibot-config repository as the general-purpose utility repository per organization.
- Configure GitHub App permissions to allow the kernel to manage repository access.

Kernel Configuration

- Ensure the kernel has write access to the repository.
- Implement read access control based on plug-in IDs.

Security Considerations

Restrict write permissions to prevent unauthorized modifications.

GitHub App Permissions

The kernel requires the following GitHub App permissions:
- Read and write access to the configuration repository.
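To make the write path concrete, here is a sketch of the request the kernel could send to GitHub's REST endpoint for creating or updating a file (`PUT /repos/{owner}/{repo}/contents/{path}`), which expects Base64-encoded content and, when updating an existing file, the current blob `sha`. The function name `buildWriteRequest`, the commit message, and the assumption that the JSON files live at the repository root are illustrative only; the code builds the request without sending it.

```typescript
// Sketch only: builds (does not send) a "create or update file contents" request.
// Assumption: JSON databases sit at the root of the .ubiquibot-config repository.
interface ContentsRequest {
  method: "PUT";
  url: string;
  body: { message: string; content: string; sha?: string };
}

function buildWriteRequest(
  org: string,
  filename: string,
  data: unknown,
  currentSha?: string
): ContentsRequest {
  // Escape non-ASCII characters so the JSON text is safe to pass to btoa().
  const json = JSON.stringify(data, null, 2).replace(
    /[\u0080-\uffff]/g,
    (c) => "\\u" + ("0000" + c.charCodeAt(0).toString(16)).slice(-4)
  );
  return {
    method: "PUT",
    url: `/repos/${org}/.ubiquibot-config/contents/${filename}`,
    body: {
      message: `chore: update ${filename}`, // hypothetical commit message
      content: btoa(json), // the endpoint expects Base64 content
      // The sha of the existing blob is needed when updating, not when creating.
      ...(currentSha ? { sha: currentSha } : {}),
    },
  };
}
```

Because every write goes through one function like this, the kernel has a single place to apply the write restriction before anything reaches the repository.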

Benefits

Data Integrity and Security: By isolating each plug-in's data in its own JSON file, we ensure that plug-ins cannot interfere with each other’s data.
Partner Control: Partners have full control over their data, enhancing privacy and security.
Decentralized Storage: Decentralizing data storage minimizes the risk of data breaches and central points of failure.
Simplified Development: Standardizing data storage eliminates the need to handle different data providers when developing plugins. Methods in our SDK will make it simple for plugin developers to store and access data.

Summary

By standardizing the storage of plug-in data in separate JSON files named after the plug-in ID, we ensure data integrity and security. The kernel will manage access control, providing a robust framework for plug-in data management, and simplifying the development process for plugin developers.

0x4007 commented 3 weeks ago

First step is to ensure that assumptions are accurate:

- Have the kernel push code to the repository when working in another repository.
- Be able to read JSON databases from the other plugins.
- Do all of this without requiring threatening permissions.

gentlementlegen commented 2 weeks ago

Having something self-contained is a great idea, and it would probably make plugin development easier if we didn't have to spin up a DB instance for each plugin. However, the JSON format might become limiting at some point, which is why I would suggest something more robust like SQLite.

I think ideally plugins should not rely on the kernel for reading their own content, but be responsible for it themselves. However, we will always reach a limitation when it comes to user / wallet retrieval, as this data should be shared across everything; otherwise we would end up with duplicate DBs, which would be difficult to maintain and update.

We still have one issue remaining, which is the storage. Even when using JSON, SQLite, or any file-based system, we need to store / read / update the content. First, it might raise security issues if the data becomes sensitive. Second, it would need atomic writes, since many runs could occur in parallel.
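One common way to handle the parallel-runs concern is optimistic concurrency: read a version along with the data, and let the write succeed only if the version is unchanged, retrying otherwise. The sketch below uses an in-memory stand-in with hypothetical names (`VersionedFile`, `update`); with a Git-backed file, the blob `sha` that GitHub's contents endpoint expects on updates could plausibly play the role of the version, though that is an assumption and not something the thread has verified.

```typescript
// In-memory stand-in for a versioned file; all names here are hypothetical.
interface Versioned<T> {
  version: number;
  data: T;
}

class VersionedFile<T> {
  private state: Versioned<T>;

  constructor(initial: T) {
    this.state = { version: 0, data: initial };
  }

  // Return a copy so callers cannot mutate the stored data directly.
  read(): Versioned<T> {
    return {
      version: this.state.version,
      data: JSON.parse(JSON.stringify(this.state.data)) as T,
    };
  }

  // Compare-and-swap: succeeds only if nobody wrote since `expectedVersion`.
  tryWrite(expectedVersion: number, data: T): boolean {
    if (expectedVersion !== this.state.version) return false;
    this.state = { version: expectedVersion + 1, data };
    return true;
  }
}

// Read, apply the change, and retry from a fresh read if another run won the race.
function update<T>(file: VersionedFile<T>, mutate: (data: T) => T, maxAttempts = 5): T {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { version, data } = file.read();
    const next = mutate(data);
    if (file.tryWrite(version, next)) return next;
  }
  throw new Error("Too many conflicting writes");
}
```

With this pattern a conflicting write is detected and retried instead of silently overwriting the other run's change, at the cost of extra reads under contention.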

0x4007 commented 2 weeks ago

On the fence about SQLite. It's nice that it handles so many catastrophic errors out of the box but I also would rather ensure that plugin development is as easy as possible for new developers.

Auditing a plaintext JSON object is way easier than working with a database or having to find a database viewer. A SQLite database is stored as a binary file.

gentlementlegen commented 2 weeks ago

Yes that's a nice thing to consider. For me the advantage is also that it is easier to have:

- generated types based on the schema
- query engine, so easier to aggregate, sort etc.
- migration system, if any change in the schema is needed
- backup and copies
- security for data loss (ACID)
- atomicity
- lower memory consumption, so less resource hungry (JSON would put the whole file in memory)

With JSON, you would need to write a manual script for any schema change. Each plugin would have its own custom query code, which is very error-prone, tedious to maintain, and far less performant. If two plugins access the data at once, or if the server crashes, there is a very high chance of breaking and losing the whole content. All of these reasons make it quite a trade-off just to be able to view the data.

For me, IntelliJ comes with a built-in viewer for my DB so I actually never leave my IDE. VS Code has a similar plugin to view them: https://marketplace.visualstudio.com/items?itemName=qwtel.sqlite-viewer

rndquu commented 1 week ago

Plain JSON storage is useful only for really simple and small plugins. It is not scalable at all compared to any RDBMS. Why don't we let plugin developers choose the storage they want (i.e. whatever a specific task needs) instead of forcing them to use a solution that applies to only a small part of storage use cases?

0x4007 commented 1 week ago

Let's start with plain JSON files and then we can add more advanced support later if needed. None of our existing plugins have any sort of complex data querying needs.

There is no need to over-engineer things "just in case" if we haven't gotten close to those hypothetical problems in a couple of years of r&d for the existing bot capabilities.

rndquu commented 1 week ago

> Let's start with plain JSON files and then we can add more advanced support later if needed. None of our existing plugins have any sort of complex data querying needs.

> There is no need to over-engineer things "just in case" if we haven't gotten close to those hypothetical problems in a couple of years of r&d for the existing bot capabilities.

My point is that we don't need to add storage support to the SDK at all, since: 1) we won't cover all possible use cases, and 2) we should give plugin developers the freedom to select whatever storage solution they want and that fits their plugin's use case.

> There is no need to over-engineer

Exactly, there is no need to implement a "save to JSON file" SDK from scratch when plugin developers can set up this feature in an hour using any npm package.

gentlementlegen commented 1 week ago

My question would be more "where do we store it?" In my experience so far, the major problem I encounter with plugins is where to store the data. Currently I have access to Supabase, but other contributors don't.

Letting the developer choose their own solution is ok, but say they choose Neo4j somehow to do their plugin, how do we handle this? We should not rely on that external contributor having their own instance, so we should definitely be in control of the data. Also, JSON would mean anything can read it, and potentially write into it.

rndquu commented 1 week ago

> Letting the developer choose their own solution is ok, but say they choose Neo4j somehow to do their plugin, how do we handle this?

Why do we need to handle it? Let the developer use Neo4j.

> we should definitely be in control of the data

We should be in control only of the data related to the core plugins (conversation rewards, permit generation, etc.). We don't need access to third-party plugins' data.

0x4007 commented 1 week ago

Decentralizing the storage and letting them own their data is especially attractive to DAOs. In addition, it makes plugin development simple and straightforward to debug. That is why JSON storage in .ubiquibot-config, the utility repository that the bot already requires, makes sense.

The implementation logic can be any existing framework, that's fine. But it needs to authenticate via the kernel to write to the repository.

ubiquibot / plugin-template

Standard Storage Solution #2
