starlinglab / integrity-backend

Backend server for registering and configurable processing of authenticated assets in the Starling Integrity framework.
MIT License

IPFS CID pin orchestration through cluster into private swarm #86

Open benhylau opened 2 years ago

benhylau commented 2 years ago

Task Summary

📅 Due date: N/A
🎯 Success criteria: Be able to take encrypted archives and pin them across the private swarm in staging.

[@YurkoWasHere to fill]

Files to store:

To Do

benhylau commented 2 years ago

Here are several links Storj shared in our last call that'd be helpful to go over as we investigate cross-pinning onto their storage network:

galargh commented 2 years ago

Here's the script that @YurkoWasHere shared during the last sync:

diff --new-line-format="" --unchanged-line-format="" <(ipfs pin ls --type=recursive | cut -f1 -d' ' | ipfs cid format -b base32 -v 1 | sort | uniq) <(ipfs-cluster-ctl pin ls | cut -f1 -d' ' | ipfs cid format -b base32 -v 1 | sort | uniq) | parallel ipfs-cluster-ctl pin add --no-status {}

If I understand correctly, the script finds objects that are pinned to local storage but not in the cluster and pins them in the cluster. Could you let me know where we intend to use it? I couldn't find it myself in the repos I have access to.

If it is to make sure that everything we add to ipfs is pinned in ipfs-cluster as well, then I think replacing `ipfs add` with `ipfs-cluster-ctl add` should do the trick, since the latter adds content to ipfs, pins it there, and pins it in the cluster as well (btw, I couldn't find where we add to `ipfs` either, so any links would be helpful, unless that's not implemented yet, in which case I can try jumping on it).

YurkoWasHere commented 2 years ago

> If I understand correctly, the script finds objects that are pinned to local storage but not in the cluster and pins them in the cluster.

Yes, this essentially syncs pins on ipfs into an ipfs-cluster. It started off as 3 lines and got turned into this crazy thing :) I was just looking at what is possible and how ipfs-cluster actually works.
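
The core of the one-liner is a set difference between two pin lists. A minimal Python sketch of just that step follows; the function name is hypothetical, and it assumes both lists are already normalized to the same CID encoding (the script does this with `ipfs cid format -b base32 -v 1`) and leaves out the actual pinning:

```python
def pins_missing_from_cluster(ipfs_pins, cluster_pins):
    """Return the CIDs pinned in IPFS but absent from the cluster, sorted.

    Assumes both inputs use the same CID encoding; no normalization is done here.
    """
    return sorted(set(ipfs_pins) - set(cluster_pins))


if __name__ == "__main__":
    # Placeholder strings stand in for real CIDs.
    local = ["cid-a", "cid-b", "cid-c", "cid-b"]
    cluster = ["cid-b"]
    for cid in pins_missing_from_cluster(local, cluster):
        print(cid)
```

Each returned CID would then be passed to `ipfs-cluster-ctl pin add`, as the script does through `parallel`.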

> If it is to make sure that everything we add to ipfs is pinned in ipfs-cluster as well then I think replacing `ipfs add` with `ipfs-cluster-ctl add` should do the trick since the latter adds stuff to ipfs, pins it there and pins it in the cluster as well

This is my understanding as well.

> (btw, I couldn't find where we add to `ipfs` either so any links would be helpful - unless that's not implemented yet in which case I can try jumping on it).

This function does not exist yet. Currently the only time we invoke IPFS is to generate the CID: https://github.com/starlinglab/integrity-backend/blob/59061993d980980258143e6d0548f84fc0fcddb7/starlingcaptureapi/file_util.py#L244

For example https://github.com/starlinglab/integrity-backend/blob/dev/integritybackend/actions.py#L117

I would imagine the adding to IPFS would happen at the bottom of the action. https://github.com/starlinglab/integrity-backend/blob/dev/integritybackend/actions.py#L161
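
As a rough sketch of what that step at the bottom of an action could look like, the helper below shells out to `ipfs-cluster-ctl add`. The function name and the injectable `run` parameter are hypothetical, not taken from the repository, and the sketch assumes `ipfs-cluster-ctl` is on the PATH and can reach a running cluster peer. It returns the command's raw output rather than parsing a CID out of it:

```python
import subprocess


def add_to_cluster(path, run=subprocess.run):
    """Add a file via `ipfs-cluster-ctl add` and return the command's raw stdout.

    `run` is injectable so the call can be exercised without a live cluster.
    """
    result = run(
        ["ipfs-cluster-ctl", "add", str(path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Calling it with the path of the encrypted archive once the action has finished writing it would cover adding to ipfs and pinning in the cluster in one step.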

@benhylau ^^ is there another action you're thinking of pinning, or is this good?

YurkoWasHere commented 2 years ago

~~It may be in Archive Action https://github.com/starlinglab/integrity-backend/issues/67~~

Never mind, the above is the archive action.

galargh commented 2 years ago

TODO @galargh:

galargh commented 2 years ago

Axes to consider for the documentation:

- Geofencing
  - 1st level being US vs. EU (via IP address or KYC)
  - 2nd level being by country (via IP address or KYC)
- Whitelisting of specific storage providers
  - 1st level being that nodes have a persistent identity on the network, can be pseudonymous
  - 2nd level being we have contact information via some KYC process, so that we can “call a node operator” to sort out issues
  - 3rd level being established orgs (like Internet Archive, etc.) are the only node operators (similar to how Lit imagines their validator network)
- Access
  - 1st level being data everywhere, encryption key is only protection
  - 2nd level being a static shared key to gate access of actual encrypted content (e.g. private ipfs swarm key)
  - 3rd level being a dynamically provisioned, ACL-based, or revocable key to gate access of content (so leaked keys have a remedy)
- Erasure
  - a programmatic way to signal an erasure of content on the network

galargh commented 2 years ago

Because of the holiday in between, I redirected my efforts a bit. I caught up on previous Slack convos with Storj and set up a repository for hosting documentation: the repository, the documentation site hosted with GitHub Pages.

Now that the form is ready, my plan is to get back to my TODO list.

galargh commented 2 years ago

Sync follow-up:

> Because of the holiday in between, I redirected my efforts a bit. I caught up on previous Slack convos with Storj and set up a repository for hosting documentation: the repository, the documentation site hosted with GitHub Pages.

I'll move my markdowns to the wiki instead 🚀 and I'll work on them there 🥳


@benhylau I think as far as IPFS goes, all we need is to decide where exactly to do ipfs-cluster-ctl add (i.e. the answer to this https://github.com/starlinglab/integrity-backend/issues/86#issuecomment-1095725292) + make sure that we init ipfs and ipfs-cluster on the machines.
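
For the second part, a small preflight check along these lines could confirm the tools are installed on a machine. This is only a sketch: the function name and tool tuple are made up for illustration, and it checks that the binaries are on the PATH, not that `ipfs init` or `ipfs-cluster-service init` has actually been run:

```python
import shutil

# Binaries expected on each machine (an assumption for this sketch).
REQUIRED_TOOLS = ("ipfs", "ipfs-cluster-service", "ipfs-cluster-ctl")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of required tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all required tools found")
```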