ipfs-shipyard / py-ipfs

python implementation of ipfs
http://ipfs.github.io/py-ipfs/
MIT License

Next Steps #49

Open ntninja opened 5 years ago

ntninja commented 5 years ago

Since #1 is clogged with so many comments, I'm opening a new issue here. Feel free to continue the discussion below and I'll keep the following updated as things develop. Also feel free to create separate issues / repos to coordinate and I'll add the relevant links below.

Next steps (networking, stalled – please see the “storage” section below):

Next steps (storage, simpler):

  1. [x] Port https://github.com/ipfs/py-datastore to Python 3
    (Suggestion: Use Python's lib2to3 and just drop Python 2 entirely.)
  2. [x] Convert datastore to use async/await using some library (maybe https://pypi.org/project/aiofiles/ ?) for file access (update: the trio framework is used for async I/O now)
  3. [ ] Implement a https://github.com/ipfs/go-ds-flatfs compatible backend for the above library
  4. [ ] Write a minimal py-ipfs “implementation” that can fetch blocks from the local $IPFS_PATH directory and expose them with an API similar to what https://github.com/ipfs/py-ipfs-http-client currently offers (goal here is to eventually have a drop-in replacement)
    • In progress by @alexander255 (no public code yet, most work happens in py-datastore)
  5. [ ] Implement a simple Python HTTP server that emulates the block/{get,put,rm,stat} API that serves blocks from the local $IPFS_PATH directory
  6. [ ] (Stretch goal) Implement a badgerds compatible backend for py-datastore
    • There is an issue requesting Python bindings for the Go library, but no work has been done yet:
      dgraph-io/badger#984
  7. [ ] Beyond: Start integrating IPLD to expose the UnixFS files stored in those raw blocks…
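
For step 3, a minimal sketch of how a flatfs-style backend might map a block key to a file path. It assumes the repository uses the "next-to-last/2" sharding function (the shard directory is named after the two characters preceding the last character of the key) and that block files carry a ".data" suffix; check both against the SHARDING file of an actual repository. The helper names are hypothetical and not part of py-datastore, and keys are assumed to already be in their on-disk (encoded) form.

```python
from pathlib import Path

# Assumption: "next-to-last/2" sharding, i.e. the shard directory is named
# after the two characters that precede the last character of the key.
SHARD_WIDTH = 2


def shard_dir(key: str, width: int = SHARD_WIDTH) -> str:
    """Return the shard directory name for an already-encoded block key."""
    # Short keys are left-padded with "_" so that the slice always exists.
    padded = "_" * (width + 1) + key
    return padded[-(width + 1):-1]


def block_path(blocks_root: Path, key: str) -> Path:
    """Build the on-disk path of a block file (hypothetical helper)."""
    return blocks_root / shard_dir(key) / f"{key}.data"


def read_block(blocks_root: Path, key: str) -> bytes:
    """Read a raw block, raising KeyError if it is not stored locally."""
    try:
        return block_path(blocks_root, key).read_bytes()
    except FileNotFoundError:
        raise KeyError(key) from None
```

With this, read_block(Path(ipfs_path) / "blocks", key) would return the raw bytes of a locally stored block, where ipfs_path stands for the value of $IPFS_PATH.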

dheatovwil commented 5 years ago

I've ported datastore to py3 (passed the existing test cases) at dheatovwil/datastore

ntninja commented 5 years ago

@dheatovwil: I saw your message, but I didn't respond in text (the OP was updated, though). Sorry about that! I've updated the OP to outline what I believe should happen next for this to become useful for py-ipfs. In particular, I've listed making datastore async as the next step: while we don't really need this right now, fixing it later would be a pain in the ** – so let's do it now while we're breaking stuff anyway. It's also much easier than it may sound: start at the filesystem implementation and think about every place where we're currently making a system call that may block (such as open, read, write, recv, send, stat, fsync, …) and replace each of these with its respective async equivalent (https://pypi.org/project/aiofiles/ will be needed for this). The replacement functions return coroutines, however, so each call needs to be prefixed with the await keyword and the surrounding function has to be marked async. This in turn makes those functions return coroutines as well, so you'll need to do the same thing with each function that calls them. Once you're done updating each function in this cascade as well as all unit tests (same story there), you're done.
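
To illustrate the cascade described above, here is a minimal sketch. It uses only the standard library (asyncio.to_thread, Python 3.9+) in place of aiofiles so that it runs without extra dependencies; the function names are hypothetical and are not the actual datastore API.

```python
import asyncio
from pathlib import Path


def _read_file_blocking(path: Path) -> bytes:
    # The original blocking code: opening and reading may both block.
    return path.read_bytes()


async def read_file(path: Path) -> bytes:
    # Async equivalent: run the blocking call in a worker thread and await it.
    return await asyncio.to_thread(_read_file_blocking, path)


async def get(root: Path, key: str) -> bytes:
    # This caller awaits a coroutine, so it has to be "async def" as well.
    return await read_file(root / key)


async def contains(root: Path, key: str) -> bool:
    # ...and so does every function further up the call chain.
    try:
        await get(root, key)
    except FileNotFoundError:
        return False
    return True
```

Only the outermost caller (for example a test) then needs an event loop, e.g. asyncio.run(contains(root, "some-key")).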

AliabbasMerchant commented 5 years ago

@alexander255 I would like to work on the bitswap implementation.
However, I would need regular help, guidance and advice, and progress may be quite slow.

Resources I know of: Specs: https://github.com/ipfs/specs/tree/master/bitswap
Go implementation: https://github.com/ipfs/go-bitswap
JS implementation: https://github.com/ipfs/js-ipfs-bitswap

I would work using a bottom-up approach.

@alexander255

ntninja commented 5 years ago

@AliabbasMerchant: Any help is appreciated! Particularly on that front!

And I'll say it up front: It's not going to be easy. I'll help where I can, but you may have to do some reverse-engineering of the source code, and it is definitely going to involve some guesswork – there is no close-to-final spec and it shows (so ideally document any and all findings in whatever form while you're at it).

(One important thing I also realized while writing this reply, however, is that we do not actually have any implementation of transport security yet and hence all data will have to be sent as plain text; at the current stage of development I don't believe this is a problem, however, as it allows for better debugging.)

The first thing required will be establishing a libp2p connection to a known peer on localhost: There is an example available for this. First negotiate the /plaintext/1.0.0 MSS protocol (that is: no encryption), then /ipfs/bitswap/1.0.0 (I think!). After this (with some luck) you should have established a bitswap connection. Be aware, however, that go-ipfs will reject any attempt to establish an unencrypted connection unless you start it as ipfs daemon --disable-transport-encryption. Using this connection, try sending a block request Protobuf message (see also the protobuf library), requesting a block you know exists on the remote server, and dump any response packets you may receive.
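
As a starting point for the negotiation step, here is a minimal sketch of how multistream-select messages are framed (I take "MSS" above to mean multistream-select): each message is an unsigned varint length prefix followed by the protocol ID and a trailing newline. The helper names are hypothetical, the Bitswap protocol ID is the tentative one mentioned above, and the sketch only builds and parses bytes; it does not open a socket or speak to a real peer.

```python
MULTISTREAM = "/multistream/1.0.0"
PLAINTEXT = "/plaintext/1.0.0"
BITSWAP = "/ipfs/bitswap/1.0.0"  # tentative, as noted in the comment above


def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def frame(protocol: str) -> bytes:
    """Frame a protocol ID as <varint length><protocol>\\n."""
    payload = protocol.encode("utf-8") + b"\n"
    return encode_uvarint(len(payload)) + payload


def read_frame(buf: bytes) -> tuple[str, bytes]:
    """Decode one message from buf; return (protocol, remaining bytes)."""
    length = shift = pos = 0
    while True:
        byte = buf[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    payload = buf[pos:pos + length]
    if len(payload) != length or not payload.endswith(b"\n"):
        raise ValueError("truncated or malformed multistream message")
    return payload[:-1].decode("utf-8"), buf[pos + length:]
```

A dialer would typically send frame(MULTISTREAM) followed by frame(PLAINTEXT), and expect the listener to echo each protocol ID it accepts (or answer "na" if it does not support it) before moving on to frame(BITSWAP).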

It probably won't work exactly the way I described, but should be close. Please don't hesitate to ask any questions you may have and I'll try to answer them to the best of my abilities. :slightly_smiling_face:

AliabbasMerchant commented 5 years ago

Sure.
Looks like a perfect task for me!
I will try my best to document everything that I find.
I will work here: https://github.com/AliabbasMerchant/py-ipfs-bitswap

AliabbasMerchant commented 5 years ago

@alexander255
I noticed that IPFS has numerous repos for JS. Even some small (though no doubt important) pieces of code have their own repo.
(For example: https://github.com/ipfs/js-ipfs-block-service, https://github.com/ipfs/js-ipfs-unixfs, https://github.com/ipfs/js-ipfs-block, https://github.com/ipfs/js-datastore-fs)
Do we want to do the same for Python, or should we put them all in this repo???

I need the python versions of some of the above for py-ipfs-bitswap. So I am making them. Should I make new repos for them?

AliabbasMerchant commented 5 years ago

Also, 1 more thing.
What is the exact purpose of py-ipfs?
We all know python is slow. So we are definitely not trying to replace the go and js versions.
So, what is the exact goal??
If we know the goal, we can write code and documentation accordingly.

ntninja commented 5 years ago

@AliabbasMerchant: Don't read too much into it, it's not uncommon in JS that every subroutine ends up in a different package. It's not uncommon to have packages such as is-buffer, that then look like this (and I'm quoting a real module here):

module.exports = function isBuffer (obj) {
  return obj != null && obj.constructor != null &&
    typeof obj.constructor.isBuffer === 'function' && obj.constructor.isBuffer(obj)
}

(The end.) Apparently JS people like it that way and everybody does it there, so it's not surprising that JS-IPFS would do it too.

In Python it's more common for packages to be written for groups of related functionality instead, so packages are bigger but more versatile.

TL;DR: Never mind that JS has subpackages for everything, in Python everything Bitswap and up can be one package (py-ipfs).

BTW: datastore-fs already exists as part of https://github.com/dheatovwil/datastore but would benefit from being made async (no need to rewrite it from scratch).

MirceaKitsune commented 5 years ago

Also, 1 more thing. What is the exact purpose of py-ipfs? We all know python is slow. So we are definitely not trying to replace the go and js versions. So, what is the exact goal?? If we know the goal, we can write code and documentation accordingly.

Python support will be very important. It's one of the most popular scripting languages after JavaScript and used by numerous projects. To name just a few from my area of knowledge: The Blender 3D animation software uses Python, and the Godot game engine uses the Python-like GDScript, for addons and development. With a native Python implementation of IPFS, the daemon could be included in such software, allowing it to work directly with files within the IPFS network... this is just one huge advantage I can immediately point out.

ntninja commented 5 years ago

@AliabbasMerchant: Thank you for bringing up this important subject! I'll try to answer from my perspective, but do remember that different people have different visions of what this software should actually be. The discussions of #40 and #1 are good illustrations of this I think.

My vision for Py-IPFS: A client-oriented Python library for accessing data from the IPFS network and caching it locally, that should be easy for any Python application to embed and ship. Basically a quick way to access (and by extension share to) the IPFS network from pure Python without running a full-blown daemon. As for speed: Similar to Py-ETH, the main goal of Py-IPFS should be readability and ease of auditing, not so much raw throughput. I would expect things to be pretty fast anyway when run on PyPy with some minor optimizing done (CPython will always be slow, but it's not the only Python implementation thankfully), but it's not the main goal.

I'm interested in your vision, and others', as well, however.

AliabbasMerchant commented 5 years ago

I have started with python bitswap.
Still under development, but please check it out and give your valuable feedback here: https://github.com/AliabbasMerchant/py-ipfs-bitswap/issues/1

/cc @alexander255

fubuloubu commented 3 years ago

Out of sheer morbid curiosity, has anyone thought about re-thinking this library as a light wrapper around the C++, Rust, or Go implementations instead? (Using C API, PyO3, or gopy)

Python is a great systems integration language, but a pure Python implementation seems like it'd be very slow and lagging behind the other compiled implementations with more funding.


Also, I am aware of the HTTP client library... just seemed like a direct integration with Python bindings might be safer with less overhead than communicating over HTTP. I've not had a great experience with the HTTP client either.

Rishiry commented 1 year ago

Is this still happening or is it completely abandoned?