integritee-network / worker

Integritee off-chain worker and sidechain validateer
Apache License 2.0
89 stars 46 forks source link

IPFS API with verification in enclave #70

Closed brenzi closed 4 years ago

brenzi commented 4 years ago

implement an API to ipfs which enables the STF to:

Gitcoin Bounty

design approach

Please describe your preferred approach in this thread prior to implementing it!

dev environment

You can use our docker to emulate SGX in SW mode, so you don't need SGX enabled HW for this task

acceptance tests:

brenzi commented 4 years ago

helpful resources:

How IPFs hash is created out of a DAG of 256kB chunks: https://medium.com/swapynetwork/generating-ipfs-multihash-offline-2edb2618b93b

IPFS node in rust: https://github.com/ipfs-rust/rust-ipfs

IPFS http API in rust: https://github.com/ferristseng/rust-ipfs-api

brenzi commented 4 years ago

A first version could do the following:

  1. run standalone IPFS node
  2. enclace can tell worker to fetch a file from IPFS synchronously
  3. worker fetches file via http request to IPFS node and stores it to disk (file name = IPFS hash)
  4. when the host-call returns, the enclave reads the file from disk into memory
  5. the enclave recreates the IPFS hash from the file contents and verifies that it matches the expected hash it requested

fetching IPFS content is already implemented. However, passing the content as a function return value might not scale to larger files. Moreover, the contents is not verified based on its ipfs hash

gitcoinbot commented 4 years ago

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


This issue now has a funding of 400.0 DAI (400.0 USD @ $1.0/DAI) attached to it.

gitcoinbot commented 4 years ago

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


Work has been started.

These users each claimed they can complete the work by 266 years, 5 months from now. Please review their action plans below:

1) developerfred has applied to start work _(Funders only: approve worker | reject worker)_.

I already have previous experience with substrate and ipfs i would like to work on this issue. 2) whalelephant has been approved to start work.

Hi,

I think I would follow the first version approach (since not comfortable / not sure how much work is required for using no_std for the rust-ipfs-api).

High level work plan:

Learn more on the Gitcoin Issue Details page.

brenzi commented 4 years ago

@x5engine: could you please share your brief work plan? How would you approach this?

Web3Foundation commented 4 years ago

@brenzi @x5engine Any updates regarding working on this issue?

x5engine commented 4 years ago

Hey guys, I will send implementation I did after tomorrow, we made a good implementation for this.

brenzi commented 4 years ago

@x5engine please see above: you haven't been assigned yet, we asked you for a sketch on how you will approach this. We'd like to review your design before we approve

x5engine commented 4 years ago

hey @brenzi sorry for the confusion, well I would plan to query the ipfs Node directly to get the has and save the file to the system then when requested I'd check the checksum of the file with the metadata checksum.

so, in other words, it should compute the checksum prior to uploading it

here something like it https://github.com/filecoin-project/go-filecoin/issues/1925

brenzi commented 4 years ago

@Web3Foundation please assign @x5engine to work on this issue. The work plan is not very specific yet, but a more detailed design should be paid work I think.

@x5engine: You mentioned "uploading". Be aware that our task asks for a read interface, not a write interface. We want to fetch IPFS content from within the enclave STF. In order to avoid frustration, please provide us with a more detailed design first before implementing it. Where in our code would you do what?

x5engine commented 4 years ago

Hey, @brenzi @Web3Foundation thank you for approving me,

So here are two strategies to get this implemented: Following your suggestion

  1. On IPFS node fetch a file by CID
  2. Save File to the system
  3. get the enclave to recreate the has from the content using https://medium.com/swapynetwork/generating-ipfs-multihash-offline-2edb2618b93b but I think this approach is very slow even if it is straightforward since there is no need to save the file to generate the hash and verify it again, so skipping the save step would be very helpful

The other Suggestion is verifying the file inside the enclave with either fetching the file inside it with ipfs HTTP API then generate the hash from the content fetched or we can make use of https://github.com/brenzi/rust-ipfs-verifier (which you made @brenzi ) no even downloading the file is better (but might be needed in your requirement)

But a file read might be very costly when handling large files...

Here is the Implementation of the CID in Rust:

https://github.com/multiformats/rust-cid

using this we can add test to check fetched data against their CID in the enclave:

    let data = cid.to_bytes();
    let out = Cid::try_from(data).unwrap();

    assert_eq!(cid, out);

So tell me which approach you want to do?

Note: keep in mind there are differences in CID version and how they get generated,

brenzi commented 4 years ago

My preference would be to use IPFS node http API directly without storing the content to disk. You might find inspiration in my outdated rust-verify-ipfs crate, but it isnt more than a code sample. Also, it assumes content is <256kB which you can't assume for this task. concerning cid versions: as long as we can put content with the recent ipfs client and read that from our stf thats fine. Just document your choices with an example

brenzi commented 4 years ago

this might be interesting for you https://github.com/w3f/General-Grants-Program/pull/283#event-3421536358

x5engine commented 4 years ago

My preference would be to use IPFS node http API directly without storing the content to disk. You might find inspiration in my outdated rust-verify-ipfs crate, but it isnt more than a code sample. Also, it assumes content is <256kB which you can't assume for this task. concerning cid versions: as long as we can put content with the recent ipfs client and read that from our stf thats fine. Just document your choices with an example

Awesome, so we should go with this option and I can work on what you have on the rust-verify-ipfs crate and make it work with any size we need, for the version i think we might be good and maybe handle the two versions.

this might be interesting for you w3f/General-Grants-Program#283 (comment)

This is so good :) I guess I can apply or use the work I am doing here there too? how is it done?

btw do you have a discord channel? might be faster for communication.

brenzi commented 4 years ago

meet us on riot: https://riot.im/app/#/room/#substratee:matrix.org

If you can re-use the code you write here for added functionality on substrate, I would guess that's be very welcome

gitcoinbot commented 4 years ago

@x5engine Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

gitcoinbot commented 4 years ago

@x5engine Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

x5engine commented 4 years ago

hey @gitcoinbot I am still working on this and will respond tomorrow with all I have, thanks

gitcoinbot commented 4 years ago

@x5engine Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

gitcoinbot commented 4 years ago

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


@x5engine due to inactivity, we have escalated this issue to Gitcoin's moderation team. Let us know if you believe this has been done in error!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

brenzi commented 4 years ago

@x5engine You have showed no single line of code so far even after several reminders. You also haven't asked any questions that would indicate that you're actually working on this. I have to ask @Web3Foundation to take this issue away from you

Web3Foundation commented 4 years ago

@brenzi taking @x5engine off this issue as there has been no response in quite some time, re-opening to other contributors.

Web3Foundation commented 4 years ago

@brenzi, @whalelephant also applied to work.

brenzi commented 4 years ago

@whalelephant are you still interested? How would you approach the task and on what timeline?

whalelephant commented 4 years ago

Hi @brenzi

Yes still interested. Copied from the my application on gitcoin:

I think I would follow the first version approach (since not comfortable / not sure how much work is required for using no_std for the rust-ipfs-api).

High level work plan:

  • environment set up (to get familiar with substraTEE and the docker SGX)
  • understand /+ solve the limitations of ocall_read_ipfs on larger content (>256kB)
  • worker write file to disk and enclave reads it into memory
  • enclave create DAG from the chunks in memory and create the multiHash
  • decode CID to multiHash to verify

I have also applied to do Hackusama so I think for this bounty will be 4-6 weeks, but do let me know what you have in mind

brenzi commented 4 years ago

@whalelephant this sounds good to me. @Web3Foundation please assign @whalelephant.

whalelephant commented 4 years ago

thanks! will probably have quite a few questions so will follow up on riot @brenzi

gitcoinbot commented 4 years ago

@whalelephant Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

whalelephant commented 4 years ago

yep, got the dev env setup

gitcoinbot commented 4 years ago

@whalelephant Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

gitcoinbot commented 4 years ago

@whalelephant Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

whalelephant commented 4 years ago

submitted WIP PR

gitcoinbot commented 4 years ago

@whalelephant Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

gitcoinbot commented 4 years ago

@whalelephant Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

gitcoinbot commented 4 years ago

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


@whalelephant due to inactivity, we have escalated this issue to Gitcoin's moderation team. Let us know if you believe this has been done in error!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

gitcoinbot commented 4 years ago

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


@whalelephant due to inactivity, we have escalated this issue to Gitcoin's moderation team. Let us know if you believe this has been done in error!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

brenzi commented 4 years ago

@whalelephant Do we have to adjust the problem statement due to your reasoning on unixfs not implemented for no_std? May I ask you to conclude your current challenges and possible ways to move forward here?

whalelephant commented 4 years ago

The initial implementation outlined was that chunks of data themselves form the multihash of the CID was incorrect. The current understanding of the multihash part of the default leave (unixfs) is the hash of a Unixfs struct with ipfs metadata, option for links to other leaves, option for data.

The challenge is that unixfs lib does not currently have no_std to work in the enclave rs-ipfs/rust-ipfs/#247.

If the file is added using the raw arg, then the multihash should be the hash of the data in raw bytes with prefixed hashing algo. In this case, we can perhaps follow the links of the root leave (the CID we get back from the /add) and its children and their children etc to collect the CID(v1) for the raw leaves and verify them against the data we have written to the disk sliced into chunks.

This is my understanding at the moment. Will be grateful for feedback

brenzi commented 4 years ago

As we mainly need to read ipfs content that we publish ourselves it would be acceptable to use some specific ipfs options to make our lifes easier at the cost of loss of generality. Do you see a shortcut for this?

whalelephant commented 4 years ago

I don't think this is a shortcut but I think we can use ipfs api such as ipfs refs to traverse the tree to get all links and then only compare the hashes of raw leaves. In other words, instead of constructing the DAG with data in the enclave, we give the enclave the fetched data and an array of links, and the enclave splits and hash the fetched data to compare with that array.

brenzi commented 4 years ago

but then the enclave would need to trust this list of links, right? This wouldn't be acceptable. We might want to look into the unixfs crate and see how hard it may be to support no_std....

whalelephant commented 4 years ago

ah ok, that makes sense, that would be placing trust outside of the enclave. Will have a look🤓

gitcoinbot commented 4 years ago

@whalelephant Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

gitcoinbot commented 4 years ago

@whalelephant Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

gitcoinbot commented 4 years ago

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


@whalelephant due to inactivity, we have escalated this issue to Gitcoin's moderation team. Let us know if you believe this has been done in error!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

gitcoinbot commented 4 years ago

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


@whalelephant due to inactivity, we have escalated this issue to Gitcoin's moderation team. Let us know if you believe this has been done in error!

Funders only: Snooze warnings for 1 day | 3 days | 5 days | 10 days | 100 days

whalelephant commented 4 years ago

looks like there has been work on it in rust-ipfs :)

brenzi commented 4 years ago

@whalelephant : I was able to complete all acceptance tests in docker and you have fulfilled the problem statement entirely. It was worth waiting for this solution. Thank you! Will merge your PR

@Web3Foundation please reward @whalelephant with the stated bounty.

brenzi commented 4 years ago

@whalelephant if you need endorsement to push your changes on dependencies upstream, I'd be happy to support. The goal should really be to drop your forks and get all your no_std stuff upstream

whalelephant commented 4 years ago

@brenzi that would be great, currently still waiting for feedback / review from the base 3 PRs, then we need to get the cid crate and the ipfs-unixfs merged after them: https://github.com/OrKoN/base-x-rs/pull/14 https://github.com/multiformats/rust-multihash/pull/81 https://github.com/multiformats/rust-multibase/pull/25