Open Grant Proposal: `Peerbit`

Name of Project: Peerbit

Proposal Category: devtools-libraries

(Optional) Technical Sponsor: -

Do you agree to open source all work you do on behalf of this RFP and dual-license under MIT, APACHE2, or GPL licenses?: Yes

Project Description

We are building a P2P database framework on top of the IPFS stack so that developers can build and maintain a distributed, private, searchable state across devices end to end.

We are solving two problems. First, we are bringing privacy and decentralization into the P2P database framework space by building a framework where encryption and distribution (sharding) are core features. Second, we aim to reduce infrastructure costs for all types of organizations by providing a framework that lets services use consumer hardware efficiently through smart auto-sharding that respects network dynamics and device capabilities. In addition, Peerbit can cut development time, since you no longer have to think about a "backend" and a "frontend", just the peer client.

We are confident that we can solve the problems outlined above, since we have already spent months developing a working prototype with core functionality such as sharding, encryption and distributed search. We have also studied this space in depth and know the benefits and shortfalls of existing alternatives such as ThreadDB and OrbitDB. The key problems we have identified with existing technologies are that they do not provide sufficient privacy, scalability, or a good enough developer experience to compete with traditional tech stacks.

You can read more in depth about what Peerbit is today in the repository.

Value

The benefit for the ecosystem of getting this project right is that developers can unlock a large user base that creates and stores data with a distributed mindset. Developers can choose either to be part of the replication process in a network and store content on a local IPFS node, or, for example, to use Filecoin to put the responsibility of storage on someone else.

Apart from wasted time and money, the risk of not getting this project "right" is the risk that comes with any distributed storage project: what happens if illegal or unwanted content gets distributed with this technology?

Technical risks include unforeseen weaknesses in the protocol that could lead to loss of data.

There are many technical challenges with this project. The main one is how to write a framework with the right amount of abstraction so that the developer experience is good: fast onboarding, yet still highly configurable for the users who demand it. In addition, some technologies, such as the WebRTC transport, are quite new and might lead to challenges that are hard to foresee.

Deliverables

When all milestones have been achieved Peerbit will be a framework for building:

P2P distributed state that persists in a network where nodes join and leave.
Automatic sharding that respects device capacity, such as storage, RAM, disk, CPU, battery life and predicted availability
Chain agnostic and cross device identities. It does not matter if you are using Metamask (Eth), Phantom (Solana) or any other wallet to authorize yourself, since the protocol allows transactions to be signed with multiple types of signature algorithms. In addition the protocol supports the ability to link identities across devices so that you can modify an access controlled state from multiple devices seamlessly.
Read and write access controlled states
Example apps, support forum and rich documentation to help developers get started.
An easy method of updating programs nodes are running.
Privacy through E2EE, with a roadmap for implementing forward secrecy and zero-knowledge access controllers.

Milestones outlined below are targeting the main tasks we have to complete to achieve this deliverable.

Development Roadmap

1. Automatic updates

In the Peerbit world replicator nodes can help networks by replicating content and providing search indices. They perform their job by simply subscribing to particular PubSub topics. Messages in these topics can instruct the node to open a particular database from a manifest and start replicating some content and build an index for search capabilities. At some point in time, there will be a need to update the software that helps the replicator node to interpret and handle different kinds of manifests. If I were a node provider, providing thousands of concurrently running nodes in different networks, it would be an overwhelmingly cumbersome job to maintain all nodes manually. Instead, it would make sense that the protocol itself can instruct the nodes to consider updates just as the protocol can instruct nodes to open a particular database. Hence with this milestone one can:

Allow updates to be suggested by peers for a network/cluster
Nodes can approve or reject updates depending on the identity of the suggesting party
Support for safe/gradual rollouts so that no data is lost during the upgrade process
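
The approval step in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Peerbit's API: `UpdateProposal`, `compareVersions`, `shouldApplyUpdate` and the set of trusted keys are hypothetical names, and the proposal's signature is assumed to have been verified before this check runs.

```typescript
// Hypothetical shape of an update suggestion published by a peer.
interface UpdateProposal {
    proposerKey: string; // public key of the suggesting peer (assumed already signature-verified)
    currentVersion: string; // version the proposal upgrades from
    nextVersion: string; // version the proposal upgrades to
}

// Compare dotted numeric versions such as "1.2.0"; returns <0, 0 or >0.
function compareVersions(a: string, b: string): number {
    const pa = a.split(".").map(Number);
    const pb = b.split(".").map(Number);
    for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
        const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
        if (diff !== 0) return diff;
    }
    return 0;
}

// Accept only upgrades that come from a trusted identity, start from the
// version this node is running, and move to a strictly newer version.
function shouldApplyUpdate(
    proposal: UpdateProposal,
    runningVersion: string,
    trustedKeys: Set<string>
): boolean {
    if (!trustedKeys.has(proposal.proposerKey)) return false;
    if (proposal.currentVersion !== runningVersion) return false;
    return compareVersions(proposal.nextVersion, runningVersion) > 0;
}
```

A gradual rollout could build on the same check, for example by letting only a fraction of nodes apply an accepted proposal at a time; that part is left out here.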

Subtotal: 160 hours (120 hours of research and implementation work, and 40 hours additional feature specific maintenance after rollout) * 80 USD/h = 12800 USD

ETA 4 weeks

Assignees: Marcus Pousette, developer

2. Improved sharding: Distribution that is conditioned on device capacity

With this milestone we improve the sharding algorithm by considering peer capacity. Some instances will be more powerful than others, hence the distribution of content and building the search index should be done accordingly.

Research and develop a deterministic algorithm that incorporates device capacity into the leader/replication election routine. It should respect device capacity, such as storage, RAM, disk, CPU, battery life and predicted availability (all capacity features might not be possible to integrate, but do as many as feasible)
Analyze and build safeguards to make sure this feature does not introduce attack vectors for DDoS and privacy.
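
One known technique that fits the "deterministic and capacity-aware" requirement is weighted rendezvous hashing. The sketch below is an illustration of that technique, not the algorithm Peerbit uses or will use. `Peer`, `hashToUnit` and `electReplicators` are hypothetical names, `capacity` is assumed to be a single positive score derived from the device features listed above, and the FNV-1a style hash is a non-cryptographic stand-in chosen to keep the example self-contained.

```typescript
// Hypothetical peer description; `capacity` is one positive score that an
// application would derive from storage, RAM, CPU, battery life, etc.
interface Peer {
    id: string;
    capacity: number;
}

// FNV-1a style 32-bit hash over UTF-16 code units, mapped into the open interval (0, 1).
function hashToUnit(input: string): number {
    let h = 0x811c9dc5;
    for (let i = 0; i < input.length; i++) {
        h ^= input.charCodeAt(i);
        h = Math.imul(h, 0x01000193) >>> 0;
    }
    return (h + 1) / (0x100000000 + 1);
}

// Weighted rendezvous hashing: every node computes the same score for each
// (shard, peer) pair, so all nodes agree on the winners without coordination.
// Peers with a larger capacity get proportionally higher scores on average.
function electReplicators(shardKey: string, peers: Peer[], replicas: number): string[] {
    return peers
        .filter((p) => p.capacity > 0)
        .map((p) => ({
            id: p.id,
            score: -p.capacity / Math.log(hashToUnit(`${shardKey}:${p.id}`)),
        }))
        // Highest score first; ties broken by id so the order stays deterministic.
        .sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
        .slice(0, replicas)
        .map((p) => p.id);
}
```

Because capacity would be self-reported in such a scheme, a peer could inflate it to attract more shards, which is exactly the kind of attack vector the safeguard analysis in this milestone would have to cover.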

Subtotal: 160 hours (120 hours of research and implementation work, and 40 hours additional feature specific maintenance after rollout) * 80 USD/h = 12800 USD

ETA 4 weeks

Assignees: Marcus Pousette, developer

3. Improved developer experience

With this milestone we will have made onboarding easy for developers with different levels of programming experience by providing a large collection of examples that resemble different kinds of use cases, and by providing tools for easily setting up and maintaining nodes.

Build a library of example projects (at least 3 examples):

Ideas:
- Distributed file storage
- Chess game
- Paint together
- E2EE chat
Create an easy-to-use CLI for deploying a replicator node behind a domain with an SSL certificate
Set up a developer support forum and chat so that minor questions can be answered quickly by the maintainers or the community.

Subtotal: 80 hours * 80 USD/h = 6400 USD

ETA 2 weeks

Assignees: Marcus Pousette, developer. Erik Allberg, product

4. Performant indexing for document stores

Right now, the computational complexity of a local query in a Document store is linear in the number of documents that exist (one has to go through every document to see if it matches the query). This can and has to be improved greatly. Performant and reliable query capabilities are fundamental if this framework is ever going to be considered a go-to way of building distributed applications. With this milestone we make this improvement by integrating a high-performance search index engine that allows peers to make content searchable to a greater extent.

Integrate Tantivy or ProblySearch (or any other feasible library) into the Document store data type
Allow nodes to choose what kind of indexing capability they want to provide for the network

This integration is non-trivial, as there is not yet any implementation that can be compiled to WASM and has all the indexing capabilities needed for this project. It might require some search engine implementation work.
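
To make the difference between a linear scan and an index concrete, here is a minimal sketch of an in-memory inverted index. It is a toy illustration, not the planned integration and not the API of any search library; `IndexedDoc`, `tokenize`, `linearSearch` and `InvertedIndex` are hypothetical names.

```typescript
type IndexedDoc = { id: string; text: string };

// Split text into lowercase alphanumeric tokens.
function tokenize(text: string): string[] {
    return text
        .toLowerCase()
        .split(/[^a-z0-9]+/)
        .filter((t) => t.length > 0);
}

// Linear scan: every document is inspected on every query.
function linearSearch(docs: IndexedDoc[], term: string): string[] {
    const needle = term.toLowerCase();
    return docs.filter((d) => tokenize(d.text).includes(needle)).map((d) => d.id);
}

// Inverted index: maps each term to the ids of the documents that contain it,
// so a single-term query is one map lookup instead of a pass over all documents.
class InvertedIndex {
    private postings = new Map<string, Set<string>>();

    add(doc: IndexedDoc): void {
        for (const token of tokenize(doc.text)) {
            let ids = this.postings.get(token);
            if (!ids) {
                ids = new Set<string>();
                this.postings.set(token, ids);
            }
            ids.add(doc.id);
        }
    }

    search(term: string): string[] {
        const ids = this.postings.get(term.toLowerCase());
        return ids ? Array.from(ids) : [];
    }
}
```

The index costs extra memory and has to be updated on every write, which is one reason it can make sense to let nodes choose which indexing capability they provide.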

Subtotal: 200 hours (160 hours of research and implementation work, and 40 hours additional feature specific maintenance after rollout) * 80 USD/h = 16000 USD

ETA 5 weeks

Assignees: Marcus Pousette, developer.

5. E2EE ZK-ACL and forward secrecy

With this milestone we have done thorough research and created a road map on how we can improve the security of the protocol. This is a pure research milestone to determine if it is possible to incorporate some powerful security and privacy measures.

Research on how Zero Knowledge proofs can let peers create access controllers that allow them to gate-keep logs without knowing the identities of the participants. Currently, if you want a relay that replicates a database with an identity based access controller, this relay will have to know the identities behind the commits in order to approve or reject them.
Research on how forward secrecy can be implemented for Peerbit. Create a definite roadmap for how (if applicable) to integrate forward secrecy without unwanted side effects (e.g. technical constraints imposed on other features). Forward secrecy could perhaps be an optional feature for clients that are willing to pay the cost of the side effects in exchange for more security

Subtotal: 80 hours * 80 USD/h = 6400 USD

ETA 2 weeks

Assignees: Marcus Pousette, developer. Erik Allberg, product.

Total Budget Requested

680 hours * 80 USD/h = 54400 USD

Maintenance and Upgrade Plans

We are committed to maintaining the code, since we are going to build and maintain a social collaboration protocol on top of this framework, which will require us to both maintain and improve the framework in the future to meet the demands this will impose. In addition, we have developed Peerbit with the mindset that the codebase should be easy to understand even though it is packed with features, so that anyone who wants to contribute can learn and understand it in a short amount of time.

Team

Team Members

Marcus Pousette: Background in Applied mathematics and Engineering Physics. 10 years of developer experience in total. 1 year of work dedicated to compiler technology. 2 years of work related to search engine technologies. 1 year of experience with the IPFS stack. Proficient in Rust and TypeScript.

Erik Allberg: Co-founded market.xyz. Early core-contributor in Logseq. Background as founder of e-commerce startups. Has been doing full-time R&D on the Global Giant Graph for 2 ½ years.

Team Member LinkedIn Profiles

https://www.linkedin.com/in/marcus-pousette-06092b102/

https://www.linkedin.com/in/allberg/

Team Website

https://dao.xyz/

Relevant Experience

We have separately spent years working on scalable data applications, compiler technology, search engine technologies, web3 and the IPFS stack, including implementing scalable applications for the mass market.

Team code repositories

Peerbit: https://github.com/dao-xyz/peerbit

Other related repositories: https://github.com/dao-xyz

Additional Information

How did you learn about the Open Grants Program? Heard about it at the IPFS Camp in Lisbon.

Please provide the best email address for discussing the grant agreement and general next steps. marcus@dao.xyz

For Quiet (https://tryquiet.org) we need IPFS-based CRDTs that support deletion and multi-party encryption, and sharding by device capability is a really cool benefit too.

There's a huge amount of subtle details here so I'm not sure if we'll use Peerbit, but as a team building user-facing apps on IPFS I think Peerbit is exactly the type of thing that should be funded.

If the reviewers want additional understanding of the project: I took part in the Braid.org meeting group yesterday and held a presentation about what Peerbit is now and what some of the goals are.

See from time 1:05:30 https://braid.org/meeting-49

Are clusters formed around application boundaries, or is it one global storage space? Or something else?

Chain-agnostic, multi-device and support any auth often looks like shifting that work to consumers up front, making barrier to adoption higher. What are you thinking the configuration of these will look like? Will you ship with some examples for common authentication approaches, and how to do these other things?

What deployment environments are supported out of the box? Is this for web apps or something else? Language agnostic API? You mention WASM, but is not clear where/how it fits in. Does app developer run a persistent service somewhere, or publishes to an emergent swarm of transient clients?

Are clusters formed around application boundaries, or is it one global storage space? Or something else?

Clusters are formed either around application boundaries or as one global storage space, depending on how you configure your nodes, access controllers and encryption. Communication between peers is done through pubsub topics. If you want a private network for a single application, you would explicitly configure libp2p for that and set up your application to communicate over a specific pubsub topic.

If you want a global application/state space, you could use some global bootstrapping nodes and predefined pubsub topics like "world_1" and "world_2" that every app uses and communicates through, though this might not be ideal for high-performance applications.

Chain-agnostic, multi-device and support any auth often looks like shifting that work to consumers up front, making barrier to adoption higher. What are you thinking the configuration of these will look like? Will you ship with some examples for common authentication approaches, and how to do these other things?

In my opinion the solution is actually easier to understand if you consider multi-device authentication as something you can build on top of Peerbit, using Peerbit databases to store records of how devices connect (instead of a first-class solution built into the protocol). This is already somewhat in place, though there will be some rework of the internals of the current solution. See the canAppend method of the example below.

import { field, variant } from "@dao-xyz/borsh";
import { Program, CanOpenSubPrograms } from "@dao-xyz/peerbit-program";
import { Documents, DocumentIndex } from "@dao-xyz/peerbit-document";
import { v4 as uuid } from "uuid";
import { Entry } from "@dao-xyz/ipfs-log";
import {
    getPathGenerator,
    getFromByTo, IdentityGraph
} from "@dao-xyz/peerbit-trusted-network";

@variant(0) // for versioning purposes, we can do @variant(1) when we create a new post type version
export class Post {
    @field({ type: "string" })
    id: string;

    @field({ type: "string" })
    message: string;

    constructor(properties?: { message: string }) {
        if (properties) {
            this.id = uuid();
            this.message = properties.message;
        }
    }
}

@variant("chat_room")
export class ChatRoom extends Program {
    @field({ type: Documents })
    rooms: Documents<Post>;

    // This is a document store of identity relations that lets you connect different identities, so that a desktop identity can trust a mobile identity
    @field({ type: IdentityGraph })
    identityGraph: IdentityGraph;

    constructor(properties: {
        rooms?: Documents<Post>;
        identityGraph?: IdentityGraph;
    }) {
        super();
        this.identityGraph =
            properties.identityGraph || new IdentityGraph({});
        this.rooms =
            properties.rooms ||
            new Documents<Post>({
                index: new DocumentIndex({ indexBy: "id" }),
            });

    }

    // Setup lifecycle, will be invoked on 'open'
    async setup(): Promise<void> {
        await this.rooms.setup({
            type: Post,

            canAppend: (entry) => {
                return this.canAppend(entry); 
            },

            canRead: (identity) => {
                return Promise.resolve(true); // Anyone can search for posts
            },
        });
    }

    async canAppend(entry: Entry<any>): Promise<boolean> {
        // Check whether any key that signed the entry is trusted by this access controller

        for(const signingKey of entry.publicKeys)
        {
            // Walk along the "trust" graph of identity relations, and check whether signingKey can append because it is trusted by someone who can append
            for await (const trustedByKey of getPathGenerator(
                signingKey,
                this.identityGraph.relationGraph,
                getFromByTo
            )) {

                // some access condition
                // for now just return true
                return true;

            }
        }
        return false;
    }
}
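
The trust-graph walk in canAppend above can also be illustrated without any Peerbit dependencies. The sketch below is a simplified stand-in, not the behavior of `getPathGenerator` or `IdentityGraph`: `TrustRelation`, `isTrusted` and the notion of root keys are hypothetical, and relations are held in a plain array instead of a document store.

```typescript
// Hypothetical in-memory relation: `from` states that it trusts `to`.
interface TrustRelation {
    from: string;
    to: string;
}

// Returns true if `signingKey` is a root key, or if a chain of trust relations
// leads from some root key to `signingKey` (e.g. desktop trusts mobile).
function isTrusted(
    signingKey: string,
    rootKeys: Set<string>,
    relations: TrustRelation[]
): boolean {
    // Index relations by the trusted key so we can walk from the signer back towards a root.
    const trustedBy = new Map<string, string[]>();
    for (const r of relations) {
        const list = trustedBy.get(r.to) ?? [];
        list.push(r.from);
        trustedBy.set(r.to, list);
    }

    // Breadth-first search; `visited` guards against cycles in the graph.
    const visited = new Set<string>();
    const queue: string[] = [signingKey];
    while (queue.length > 0) {
        const key = queue.shift()!;
        if (rootKeys.has(key)) return true;
        if (visited.has(key)) continue;
        visited.add(key);
        for (const truster of trustedBy.get(key) ?? []) queue.push(truster);
    }
    return false;
}
```

In this simplified model, linking a new device amounts to adding one relation from an already trusted key to the new device's key.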

Authentication can also be done through encryption; for example, one only appends entries that have been encrypted with a certain key. Here is an example of what it looks like when you send an encrypted commit to someone else:


let doc = new Document({
    id: "123",
    name: "this document is not for everyone",
});

const someKey = await X25519PublicKey.create(); // a random receiver

// save document and send it to peers
const entry = await db.docs.put(doc, {
    receiver: {
        payload: [someKey],
        metadata: undefined,
        next: undefined,
        signatures: undefined,
    },
});

There will be guides and examples shipped that explain the best way to do authentication for different use cases. For example, how your authentication approach would differ if you want to build a large-scale decentralized app versus an app that mimics a client/server style.

What deployment environments are supported out of the box? Is this for web apps or something else? Language agnostic API? You mention WASM, but is not clear where/how it fits in. Does app developer run a persistent service somewhere, or publishes to an emergent swarm of transient clients?

Web apps and Node apps are supported out of the box. Later, core parts will be rewritten in Rust as a WASM-compatible implementation, so that we can provide bindings to other languages like Python and improve performance for web and Node applications. The API is language agnostic and can be invoked with any implementation of libp2p.

Does app developer run a persistent service somewhere, or publishes to an emergent swarm of transient clients?

This depends on what application you want to build. If you are building a chat app where you do not want to pay for a persistent service, and instead rely on at least a few peers being online at all times, you can do this. Let's say that every post should be replicated at least X times, where X is chosen relative to the network size so that the probability that all content is available at all times is high enough. X (min replicas) can be chosen at the application level with the current solution of Peerbit.
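
As a rough way to reason about the choice of X, here is a small sketch under a strong simplifying assumption that is not stated in the reply above: each replica is online independently with the same probability p. Under that assumption a post is reachable whenever at least one of its X replicas is online, which happens with probability 1 - (1 - p)^X. The function names are hypothetical.

```typescript
// Probability that at least one of `replicas` peers is online, assuming each
// peer is online independently with probability `p` (a simplifying assumption).
function availability(p: number, replicas: number): number {
    return 1 - Math.pow(1 - p, replicas);
}

// Smallest number of replicas whose availability reaches `target`.
function minReplicas(p: number, target: number): number {
    if (!(p > 0 && p <= 1) || !(target > 0 && target < 1)) {
        throw new RangeError("expected 0 < p <= 1 and 0 < target < 1");
    }
    let replicas = 1;
    while (availability(p, replicas) < target) replicas++;
    return replicas;
}
```

For example, if peers are online half of the time (p = 0.5) and a post should be reachable 99% of the time, `minReplicas(0.5, 0.99)` returns 7. Real networks violate the independence assumption (peers in the same time zone tend to go offline together), so the result is better read as a lower bound.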

If you want to "pay" for a persistent service, you would launch some nodes in a datacenter (perhaps with autoscaling) that always chooses to replicate content to a higher degree than edge/clients.

The design philosophy is that you should be able to write really cost efficient applications AND resilient ones, and there should be a path to go between these two in a smooth way whenever that is needed.

Hi @marcus-pousette, thank you for your reply! Before proceeding with our review, can you briefly explain how you intend to build a user base? Do you have an adoption plan in place?

Great question! To begin with, we believe that value to the end user is the main objective of any technology, and this has been our guiding principle when building Peerbit. We are very ambitious in this regard: we strive for mass adoption of the data protocol and the applications built on top of it, with product-led growth as our primary strategy. Peerbit provides a combination of sought-after features that you cannot get anywhere else today in such an approachable package. This offering is something that both Web3 and Web2 companies have shown great interest in.

The real game changer with Peerbit, and the main marketing idea, is that applications built with Peerbit can be composable both in terms of the data/state and the GUI. Even if there are 5 competing companies that want to build a social media or forum application, they could reuse logic from each other and accelerate each other's progress rather than wasting time writing code that produces more or less identical functionality. With composability, app developers do not have to write complete platforms; they can specialize and do one thing very well.

As an example, for browser environments:

If you want to build a distributed video streaming platform with Peerbit, the application can be composed of different parts: a video-streaming view app with UI; a chat window app with UI that viewers can use to interact with each other while watching the stream; and an application that stores user data, which may or may not have dedicated UI elements. Someone then creates the platform itself and uses iframes to compose the different apps to provide the wanted behaviors.

Now imagine that there is also a company developing a great video call product, which might rely on the same video-streaming app with UI as the streaming platform. When the developer of the video-streaming app improves the service in some way, say by adding rewind functionality, the changes propagate to all dependent applications seamlessly.

We believe that this is something unique and very sought-after for developers since you do not have to build something big to make a meaningful impact, you can just build a building block for someone else.

Hence, in the short term we are going to focus on building these composable building blocks to showcase how powerful this is. We believe they will act as great documentation for the project, provide value as individual applications, and show how a developer can build for a composable and distributed web.

Mainly we are looking into making these things:

1. A chat app with optional privacy
2. A video streaming app with optional privacy, rewind functionality and speech to text
3. A user data app (name, PFP, bio) with optional privacy
4. A platform that combines (1), (2), (3) and perhaps other applications with privacy and search

We believe that when developers and business owners see how easy these data rich applications are to develop, how private you can make them and how cheap they are to host (since they are P2P) there will be great interest to start to build with Peerbit.

Hey! @ErinOCon. We've actually launched now. Here is an example of video-streaming with libp2p + Peerbit, source code. One person who is building on top of our project has already had their grant approved (https://github.com/ipfs/devgrants/issues/253).

Any updates on this proposal?

Hi @marcus-pousette, thank you for the update and congratulations on your progress! In light of the macroeconomic climate, we are currently in the process of re-evaluating our budget and priorities for this fiscal year. During this time, there is likely to be a delay in getting back to you regarding funding.

Please feel welcome to check back in the next few weeks (if you do not hear from us before then)!

Hi @marcus-pousette, thank you again for the update! Since this project is launched, can you confirm how you would use grant funding moving forward? What is your plan for getting the next five to ten developers? Do you have any plans to attend IPFS thing?

Thanks for the support. We fully understand the current macroeconomic climate and its effect on IPFS's ability to fund projects. As the saying goes, this is the season of the builders, and we are more committed than ever to building a decentralized web. Hopefully we can find ways to propel our growth.

As for how we'd use grant money, it's mainly to improve the performance and security of the project. That includes making search more performant, adding DDoS protection for relays and replicators, and building utilities so that developers can use forward secrecy and social key recovery. To speed up this development, we need to strengthen the team with more builders, particularly P2P database developers and full-stack developers.

The plan to attract developers is to improve the protocol, build developer tooling, publish more content about our vision and roadmap, invest in personal developer relations, and build compelling products ourselves on top of Peerbit that are intended to drive consumer usage and also encourage developers to build web-scale P2P apps themselves. We're currently working on a P2P livestreaming app that uses Peerbit for video streaming, sending video frames through the P2P database every 25 ms. This process also helps us identify and build necessary developer tooling.

We'd love to attend the IPFS Thing and meet all the peers in the IPFS ecosystem, but as you may understand, it's not within our budget.

Hi @marcus-pousette, thank you for your proposal and for your patience with our review. We would like to move your proposal forward to the next steps in our process! We will send an email with further details.

Hi! @ErinOCon. That's great news! Thank you

ipfs / devgrants

Open Grant: Peerbit #260
