graphops / file-hosting-service

Rust implementation of Subfile Data Service
https://github.com/graphops/subfile-data-service
Apache License 2.0

payments for file data service #4

Open neriumrevolta opened 11 months ago

hopeyen commented 11 months ago

Goal

Imagine a decentralized file-sharing market supporting

Our primary contenders for file transfer protocols are 1) HTTP direct transfer, 2) Torrents, and 3) IPFS.

TL;DR comparison table

| Aspect | Decentralized HTTPS with TLS in Rust | Torrent with Micropayments in Rust | IPFS with Payments in Rust | Best Option |
| --- | --- | --- | --- | --- |
| Protocol Architecture | Client-server model, extended to decentralized servers for partial downloads. | Peer-to-peer architecture, ideal for distributed file sharing with chunked data. | Decentralized, content-addressable network, suitable for distributed sharing. | Torrent for its inherent P2P nature. |
| Speed and Performance | High for small to medium files, moderate for very large files due to HTTP overhead. | High, optimized for large files, efficient in distributing bandwidth. | Moderate, dependent on network state and data availability. | Torrent for its efficiency with large files and bandwidth distribution. |
| Interleaved Micropayments | Requires a payment verification system for range download requests. | Requires a payment verification system for chunked data transfer. | Challenging due to the need for integrating payments into a decentralized system with partial file access. | HTTPS for middleware flexibility. |
| Escrow (collateralization) and Trust | Requires robust external mechanisms for trust and escrow. | Requires robust external mechanisms for trust and escrow. | Requires robust external mechanisms for trust and escrow. | All 3 are similar in needing a blockchain/subgraph client. |
| Data Integrity and Security | High with TLS, but dependent on server integrity. | High, with built-in mechanisms for data verification. | Moderate, relies on network integrity and node trustworthiness. | Torrent for its robust data verification mechanisms. Verification needs to be implemented separately for HTTPS. |
| Verification Processes | Standard HTTPS verification, extended for decentralized servers; requires verification of partial data. | Inherent in the protocol, with additional layers for payment verification. | Requires additional verification aside from the content-addressable ID. | Torrent for its inherent and efficient verification processes. |
| User Experience (Servers/Clients) | Familiar for the users; the decentralized network aspect adds complexity. | Familiar in P2P context; micropayments add a layer of complexity. | Less familiar, requires understanding of decentralized systems and payments. | HTTPS for its familiarity and ease of use. |
| Matching Algorithms | Requires development of algorithms for server-client matching based on QoS and price. | Inherent in the protocol, but needs extension for price-based matching. | Complex, requires innovative matching algorithms in a decentralized market. | Similar |
| Library Maturity (Rust) | High. Established libraries for HTTP and TLS. | Low. Leecher clients available; seeder clients and micropayments integration less mature. | Low to Moderate. Growing but less mature than HTTP libraries. | HTTPS for its mature and robust libraries in Rust. |
| Implementation Complexity (Rust) | Moderate. Leverages existing HTTPS libraries; payments and the matching algorithm add complexity. | High. Integration of the Torrent protocol with micropayments is complex. | High. Combining IPFS with payment systems is innovative but complex. | HTTPS for simpler implementation in Rust. |
| Community Support (Rust) | Strong. Active development in HTTP/TLS within the Rust community. | Low. Limited support for Torrents with micropayments. | Growing. Increasing interest in decentralized systems like IPFS. | HTTPS for strong community support in Rust. |
| Innovation Potential (Rust) | Moderate. Extension of existing protocols. | High. Novel approach integrating micropayments with Torrents. | Very High. Cutting-edge concept in file sharing. | IPFS for highest innovation potential in Rust. |

Decentralized HTTPS with TLS is the most practical and reliable choice, excelling in Rust library maturity, flexibility, and ease of development and usage. Meanwhile, Torrent with Micropayments offers a balance of innovation and complexity and provides a novel solution for file sharing, while IPFS with Payments is the most innovative and flexible, at the cost of higher complexity and a less mature ecosystem.

More details

HTTPS File Transfer with TLS

A secure method of transferring files over the internet, utilizing the standard HTTP protocol combined with TLS encryption. This approach ensures that data transferred between the client and server is encrypted, safeguarding against interception and unauthorized access. It supports partial downloads, allowing clients to request specific ranges of a file, which is particularly useful for large files. This method is widely used due to its robust security features, compatibility with existing web infrastructure, and ease of implementation in a variety of contexts, including web browsers and standalone applications.
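To make the partial-download idea concrete, here is a minimal sketch of how a server might resolve a single-range `Range` header against a file of known length. It is not taken from the repository: `ByteRange`, `parse_range`, and `read_range` are hypothetical names, only the `bytes=start-end` and `bytes=start-` forms are handled, and the file is assumed to be in memory.

```rust
/// Inclusive byte range resolved against a known file length.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub struct ByteRange {
    pub start: u64,
    pub end: u64, // inclusive, following HTTP Range semantics
}

/// Parse a single-range header value such as "bytes=0-1023" or "bytes=500-".
/// Multi-range and suffix ("bytes=-500") forms are deliberately not handled.
pub fn parse_range(header: &str, file_len: u64) -> Option<ByteRange> {
    let spec = header.trim().strip_prefix("bytes=")?;
    let (start_s, end_s) = spec.split_once('-')?;
    let start: u64 = start_s.trim().parse().ok()?;
    if start >= file_len {
        // A real server would answer 416 Range Not Satisfiable here.
        return None;
    }
    let end = if end_s.trim().is_empty() {
        file_len - 1 // open-ended range: read to the end of the file
    } else {
        // Clamp an over-long range to the last byte of the file.
        end_s.trim().parse::<u64>().ok()?.min(file_len - 1)
    };
    if end < start {
        return None;
    }
    Some(ByteRange { start, end })
}

/// Return the requested slice of an in-memory file.
/// The range must come from `parse_range` called with `data.len()`.
pub fn read_range<'a>(data: &'a [u8], range: &ByteRange) -> &'a [u8] {
    &data[range.start as usize..=range.end as usize]
}
```

A client downloading a large file could then issue consecutive requests such as `bytes=0-1048575`, `bytes=1048576-2097151`, and so on, which is also the natural unit at which a payment check could be attached.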

Architecture

Data Transfer and Integrity

Scalability

Security

User Experience

Summary

Torrent Protocol

Inherently peer-to-peer and decentralized, the Torrent protocol stands out as a strong candidate. Its efficiency in distributing large files through chunked downloads aligns well with the requirement of handling terabyte-sized data. It would require adaptation to include micropayments for each chunk transfer, a matching algorithm that pairs multiple peers for serving and requesting files, and on-chain escrow mechanisms. This approach aims to enhance the efficiency and availability of file sharing by financially rewarding seeders, potentially leading to a scalable and censorship-resistant P2P network.
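As a rough illustration of per-chunk micropayments, the sketch below shows seeder-side accounting that releases the next chunk only when the cumulative payment covers it. Everything here is an assumption made for illustration: `PaidSession`, `ServeError`, the flat `price_per_chunk`, and the idea that payments arrive already verified (signature checks and escrow are out of scope).

```rust
/// Seeder-side accounting for one download session (hypothetical type).
#[derive(Debug)]
pub struct PaidSession {
    price_per_chunk: u64, // assumed flat price, in the smallest token unit
    paid_total: u64,      // cumulative payment verified so far
    chunks_served: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ServeError {
    /// Payment received so far does not cover the next chunk.
    InsufficientPayment { required: u64, paid: u64 },
}

impl PaidSession {
    pub fn new(price_per_chunk: u64) -> Self {
        Self { price_per_chunk, paid_total: 0, chunks_served: 0 }
    }

    /// Record a payment that has already been verified elsewhere.
    pub fn record_payment(&mut self, amount: u64) {
        self.paid_total = self.paid_total.saturating_add(amount);
    }

    /// Release the next chunk index only if cumulative payment covers it.
    pub fn serve_next_chunk(&mut self) -> Result<u64, ServeError> {
        let required = self.price_per_chunk.saturating_mul(self.chunks_served + 1);
        if self.paid_total < required {
            return Err(ServeError::InsufficientPayment {
                required,
                paid: self.paid_total,
            });
        }
        let index = self.chunks_served;
        self.chunks_served += 1;
        Ok(index)
    }
}
```

Interleaving payment and delivery this way bounds the loss on either side to roughly one chunk, which is the usual motivation for chunk-level payments over paying for the whole file up front.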

Architecture

Data Transfer and Integrity

Scalability

Security

User Experience

Summary

The Torrent protocol offers a robust and scalable solution for decentralized file sharing, particularly effective for large files due to its distributed nature and chunk-based transfer system. While it excels in data transfer efficiency and network resilience, it faces challenges in security, trust management, and legal compliance. The user experience can vary widely based on the health of the torrent swarm and the community around it.

IPFS with paid access

Adapting from IPFS as a content-addressable P2P storage network, we consider a version incorporating a payment system for accessing files, where users pay to retrieve data from other nodes in the network. The integration of payments aims to incentivize the hosting and distribution of files, potentially improving the availability and reliability of data within the IPFS network. IPFS does not natively support micropayments or file transfer payments. Integrating such a system into IPFS would be complex, requiring external layers or applications to handle payments and collateralization. Additionally, IPFS's typical use case involves public files, and adapting it for private, paid transfers of partial files would be challenging. The protocol may struggle with very large files, which is a critical requirement in this scenario.

Architecture

Data Transfer and Integrity

Scalability

Security

User Experience

Potential next step

Get hands-on with a PoC of HTTP file transfers and explore whether there are unforeseen difficulties.

chriswessels commented 11 months ago

Great job here @hopeyen! Agreed with the comparison of the options and the relative strengths/weaknesses. Also agree that we should implement an HTTP PoC next.

I have left some questions that we should be considering here: https://www.notion.so/graphops/Subfile-Service-a59a801b27094e4589cebd52a081ca5f?pvs=4

hopeyen commented 10 months ago

Next step:

Reach out to the TAP team

hopeyen commented 9 months ago

More concrete plan:

Subfile server

  1. create a file to track (indexer address and indexer url)
  2. upload the file to IPFS
  3. create an allocation signer
  4. compute allocation id over IPFS file containing indexer url - `uniqueAllocationID(indexerMnemonic: string, epoch: number, deployment: SubgraphDeploymentID, existingIDs: Address[])`
  5. generate allocationIdProof using allocation signer, allocation id, and indexer address
  6. Staking contract allocate using
     ```
     // Arguments are positional; the trailing comments give the parameter names.
     this.network.contracts.staking.populateTransaction.allocateFrom(
       this.network.specification.indexerOptions.address, // indexer
       deployment.bytes32, // subgraphDeploymentID
       amount, // tokens
       allocationId, // allocationID
       utils.hexlify(Array(32).fill(0)), // metadata
       proof,
     )
     ```
  7. all subfiles served at the indexer url should now be available for discovery and queries. Indexers can update the subfiles list without closing/reallocating. Indexers must renew allocations by the max allocation lifetime, or when updating the indexer url
  8. close allocation
     ```
     await this.network.contracts.staking.populateTransaction.closeAllocation(
       allocationID, // string
       poi, // BytesLike
     )
     ```
  9. reallocate `ReallocateTransactionParams { closingAllocationID: string; poi: BytesLike; indexer: string; subgraphDeploymentID: BytesLike; tokens: BigNumberish; newAllocationID: string; metadata: BytesLike; proof: BytesLike }`

Subfile clients

hopeyen commented 8 months ago

Protocol V1 vs Horizon

Today during system architect office hours, I asked how the data service will be incorporated in Horizon and what the new design for the service registry and staking contracts will be.

I understood that (though nothing is set in stone yet)

Approach 1

Indexers register their public_url against the service registry. If it is a dedicated registry, then the number of indexers to try is limited to those who specifically opted in to the service registry; otherwise there must be a filter for indexers serving file endpoints. Indexers open allocations for a particular file or a bundle of files.

From the network subgraph, a client can
-> query for all active allocations against a deployment hash representing their target file.
-> read all available indexer endpoints.
-> read active allocations from all indexers, grab the underlying deployment hash, and filter for the target file.
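The client-side lookup in Approach 1 can be sketched roughly as follows. This assumes the allocations and registry entries have already been fetched from the network subgraph; `Allocation`, `RegistryEntry`, `endpoints_for_file`, and their fields are hypothetical names, not the subgraph schema.

```rust
use std::collections::{BTreeSet, HashMap};

/// One active allocation as a client might read it from the network subgraph.
/// (Field names are illustrative.)
#[derive(Debug, Clone)]
pub struct Allocation {
    pub indexer: String,         // indexer address
    pub deployment_hash: String, // hash the allocation was opened against
}

/// Service-registry entry mapping an indexer to its public URL.
#[derive(Debug, Clone)]
pub struct RegistryEntry {
    pub indexer: String,
    pub public_url: String,
}

/// Return the URLs of registered indexers that have an active allocation
/// against `target_hash`, in allocation order and without duplicates.
pub fn endpoints_for_file(
    allocations: &[Allocation],
    registry: &[RegistryEntry],
    target_hash: &str,
) -> Vec<String> {
    // Index the registry by indexer address for constant-time lookups.
    let urls: HashMap<&str, &str> = registry
        .iter()
        .map(|e| (e.indexer.as_str(), e.public_url.as_str()))
        .collect();
    let mut seen = BTreeSet::new();
    allocations
        .iter()
        .filter(|a| a.deployment_hash == target_hash)
        // Indexers without a registry entry are skipped: no URL to contact.
        .filter_map(|a| urls.get(a.indexer.as_str()).copied())
        .filter(|url| seen.insert(*url))
        .map(str::to_owned)
        .collect()
}
```

Because both inputs come from on-chain state, the client needs no extra round trips to indexers before it knows whom to contact, which is the main practical difference from Approach 2.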

This approach should be easily migrated to Horizon, as service registry should then support multiple entries per indexer across data services, and allocations will no longer be correlated with indexing rewards but still indicate service for query fees.

Approach 2

Indexers do not separately register their url, and indexers do not open allocations against individual files. Indexers create and publish a file containing their file server url, and allocate against the file. Indexers are free to update their file serving status at any point without a need to notify the network, but clients will not be able to rely on the network subgraph; instead, they must go through all active allocations for a file that contains a file server url, or obtain the url from an indexer off-chain.
-> This decreases the cost for indexers, as they can be more flexible with their service without broadcasting on-chain, but increases the runtime requirement for the client to discover indexers' target file availability. Optionally, indexers can periodically gossip their file serving status, and gossip nodes (listener-radio-esque) can store and update the status and serve an API endpoint for the client to check for availability.
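One possible shape for the optional gossip node is sketched below: it keeps only the newest status per indexer and answers availability lookups, ignoring stale entries. The message layout (`StatusMessage`), the `StatusCache` type, and the freshness window are assumptions for illustration; message authentication and the gossip transport itself are left out.

```rust
use std::collections::HashMap;

/// A gossiped file-serving announcement (hypothetical layout).
#[derive(Debug, Clone)]
pub struct StatusMessage {
    pub indexer: String,    // indexer address
    pub url: String,        // file server url
    pub files: Vec<String>, // hashes of files currently served
    pub timestamp: u64,     // sender's clock, in seconds
}

/// In-memory store a gossip node could expose through an API endpoint.
#[derive(Default)]
pub struct StatusCache {
    latest: HashMap<String, StatusMessage>,
}

impl StatusCache {
    /// Keep only the newest message per indexer; returns true if stored.
    pub fn ingest(&mut self, msg: StatusMessage) -> bool {
        let is_newer = self
            .latest
            .get(&msg.indexer)
            .map_or(true, |old| msg.timestamp > old.timestamp);
        if is_newer {
            self.latest.insert(msg.indexer.clone(), msg);
        }
        is_newer
    }

    /// URLs (sorted) of indexers whose latest status is at most `max_age`
    /// seconds old and lists `file_hash`.
    pub fn available(&self, file_hash: &str, now: u64, max_age: u64) -> Vec<String> {
        let mut urls: Vec<String> = self
            .latest
            .values()
            .filter(|m| now.saturating_sub(m.timestamp) <= max_age)
            .filter(|m| m.files.iter().any(|f| f.as_str() == file_hash))
            .map(|m| m.url.clone())
            .collect();
        urls.sort();
        urls
    }
}
```

The freshness window is what replaces on-chain updates here: an indexer that stops gossiping simply ages out of the results instead of having to close an allocation.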

This approach might or might not be easily migrated to Horizon. It has minimal on-chain activity and doesn't require staking for a particular file, but it is not secured economically.