ipfs-shipyard / py-ipfs-http-client

A python client library for the IPFS API
MIT License

Method for creating new directory object w/ set of links #220

Open fubuloubu opened 4 years ago

fubuloubu commented 4 years ago

Given a list or iterable of {"Hash": CID, "Name": str} objects, it should be possible to create a new directory object using object.put without having to manually create the directory object in bytes, query the size of links, or recursively add single links to directories one at a time (creating intermediate objects). Not sure if I'm missing an API for how to do this, but it would be pretty helpful.
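For concreteness, the kind of input I have in mind is just something like this (the CIDs below are placeholders for objects that already exist in the local repo):

objs = [
    {"Name": "file1.txt", "Hash": "QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG"},
    {"Name": "file2.bin", "Hash": "QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco"},
]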

ntninja commented 4 years ago

You should be able to do it like this:

import ipfshttpclient

with ipfshttpclient.connect() as client:
    # objs: your iterable of {"Name": ..., "Hash": ...} link descriptions
    hash = client.object.new("unixfs-dir")["Hash"]  # start from an empty UnixFS directory
    for obj in objs:
        hash = client.object.patch.add_link(hash, obj["Name"], obj["Hash"])["Hash"]
print("Final directory hash:", hash)

Please close if this fixed your issue.

fubuloubu commented 4 years ago

I had a folder with 78k items in it; doing this iteratively with add_link seemed like a bad idea, but I suppose I could give it a try.

ntninja commented 4 years ago

Yeah, it probably won't be hyper-fast. Unfortunately, there is no HTTP API in IPFS for uploading a whole directory as a list of CIDs. (Both .add and .tar.add [not implemented] require one to include the file bodies in the upload as well.) Maybe the relatively new .dag.import could be used, but then we'd need to generate CAR (content-addressable archive) data.

If you need the extra performance, it's probably best to do what @SupraSummus did and just generate the relevant UnixFS protobuf data structures yourself (storing them with .block.put). They are not overly complicated.

We really need a Python IPLD library to make this kind of stuff easier though.
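For reference, here is a rough, untested sketch of that approach: hand-encode the dag-pb/UnixFS directory node and store it with .block.put. It assumes the third-party base58 package for decoding CIDv0 strings, that objs is the same iterable of name/CID pairs as above, and that the linked objects are already in the local repo (so object.stat can report their cumulative sizes); the field numbers and Links-before-Data ordering follow the dag-pb and UnixFS .proto definitions.

import io

import base58  # third-party package (assumption), used to decode CIDv0 strings
import ipfshttpclient


def varint(n):
    # Encode an unsigned integer as a protobuf varint
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)


def ld_field(field_no, payload):
    # Encode a length-delimited protobuf field (wire type 2)
    return varint((field_no << 3) | 2) + varint(len(payload)) + payload


def encode_link(cid, name, tsize):
    # PBLink: Hash = 1 (raw CID bytes), Name = 2 (string), Tsize = 3 (varint)
    return (ld_field(1, base58.b58decode(cid))  # a CIDv0 is just its multihash bytes
            + ld_field(2, name.encode("utf-8"))
            + varint(3 << 3) + varint(tsize))


with ipfshttpclient.connect() as client:
    links = b"".join(
        ld_field(2, encode_link(obj["Hash"], obj["Name"],
                                client.object.stat(obj["Hash"])["CumulativeSize"]))
        for obj in sorted(objs, key=lambda o: o["Name"])  # directory links sorted by name
    )
    node = links + ld_field(1, b"\x08\x01")  # Data comes last: the UnixFS "Directory" marker
    print("Final directory hash:", client.block.put(io.BytesIO(node))["Key"])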

ntninja commented 4 years ago

The recently added .dag.put would allow for this more easily… Except that it doesn't expose the required format parameter in the implementation I just merged two days ago :facepalm: .

(See its upstream docs for why: https://ipfs.io/ipns/docs.ipfs.io/reference/http/api/#api-v0-dag-put)

ntninja commented 3 years ago

This should now be possible using .dag.put(…, format="dag-pb") on 0.7.0a1. In particular, this (untested) snippet should do what you want:

import io
import json

import ipfshttpclient

with ipfshttpclient.connect() as client:
    dag_links = [
        {"Name": obj["Name"], "Cid": {"/": obj["Hash"]}}
        for obj in objs
    ]

    node_json = json.dumps({
        "data": "CAE=",  # base64 of the UnixFS "Directory" marker (protobuf bytes 0x08 0x01); just copy it verbatim
        "links": dag_links,
    })
    # dag.put expects a file-like object (or a file path), hence the BytesIO wrapper
    hash = client.dag.put(io.BytesIO(node_json.encode("utf-8")), format="dag-pb")["Cid"]["/"]
print("Final directory hash:", hash)

I'd be interested to know whether this actually works, so any feedback is welcome there! Also note that you can add a cumulative size value to each file or directory by also setting the "Size" field in the JSON. To make uploads faster, the input encoding can also be set to CBOR, but there's no CBOR encoder in the Python standard library.
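For example, replacing the dag_links list comprehension above (again untested, and assuming the linked objects are available locally so object.stat can report their cumulative sizes):

    dag_links = [
        {
            "Name": obj["Name"],
            "Cid": {"/": obj["Hash"]},
            "Size": client.object.stat(obj["Hash"])["CumulativeSize"],
        }
        for obj in objs
    ]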

(One caveat you may run into with this method is that it may not automatically shard large directories. So if you add a very big directory, it may fail or produce an unshareable result (I think 4 MiB was the bitswap block size limit), but again, this needs to be checked to be sure.)
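A quick (untested) sanity check against that limit would be to stat the resulting block:

size = client.block.stat(hash)["Size"]
# Compare against the bitswap block size limit mentioned above
if size > 4 * 1024 * 1024:
    print("Warning: directory node is", size, "bytes and may not transfer over bitswap")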