Open ntninja opened 5 years ago
I've ported datastore to py3 (passed the existing test cases) at dheatovwil/datastore
@dheatovwil: I saw your message, but I didn't respond in text (the OP was updated, though). Sorry for this! I've updated the OP to outline what I believe should happen next in order for this to become useful in terms of `py-ipfs`. In particular, I've put up the need to make `datastore` async next: While we don't really need this right now, fixing this later would be a pain in the ** – so let's do it now while we're breaking stuff anyways.
It's also much easier than it may sound: Start at the filesystem implementation and think about every place where we're currently doing a system call that may block (such as `open`, `read`, `write`, `recv`, `send`, `stat`, `fsync`, …) and replace each of these with its respective async equivalent (https://pypi.org/project/aiofiles/ will be needed for this). The replacement functions will now return coroutines, however, so they will need to be prefixed with the `await` keyword and their surrounding function will have to be marked `async`. This in turn will make those functions return coroutines as well, so you'll need to do the same thing with each function that calls them*. Once you're done updating each function in this cascade as well as all unit tests (same story there), you're done.
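As a rough illustration of the cascade described above, here is a minimal sketch. It does not reflect the real `datastore` code: the function names (`get_sync`, `get`, `put`, `copy_block`) are hypothetical, and it uses the standard library's `asyncio.to_thread` (Python 3.9+) as a stand-in for `aiofiles` so that it runs without extra dependencies.

```python
import asyncio
import os


# Before: a blocking read, roughly what a synchronous filesystem store does.
def get_sync(path):
    with open(path, "rb") as f:
        return f.read()


# Blocking helper that writes and flushes the data to disk.
def _write_and_sync(path, data):
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


# After: the blocking work runs in a worker thread, so the function
# returns a coroutine and must be marked `async`.
async def get(path):
    return await asyncio.to_thread(get_sync, path)


async def put(path, data):
    await asyncio.to_thread(_write_and_sync, path, data)


# The cascade: every caller of an async function must `await` it and
# therefore becomes `async` itself.
async def copy_block(src, dst):
    data = await get(src)
    await put(dst, data)
    return len(data)
```

A caller at the very top of the chain would start the event loop once, for example with `asyncio.run(copy_block(src, dst))`. With `trio` or `aiofiles` the shape stays the same; only the calls inside `get` and `put` change.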
@alexander255 I would like to work on the bitswap implementation.
Although, I would require constant help, guidance and advice and the progress may be quite slow.
Resources I know of:
Specs: https://github.com/ipfs/specs/tree/master/bitswap
Go implementation: https://github.com/ipfs/go-bitswap
JS implementation: https://github.com/ipfs/js-ipfs-bitswap
I would work using a bottom-up approach.
@alexander255
@AliabbasMerchant: Any help is appreciated! Particularly on that front!
And I'll say it up front: It's not going to be easy. I'll help where I can, but you may have to do some reverse-engineering of the source code and it is definitely going to involve some guesswork – there is no close-to-final spec and it shows (so ideally document any and all findings in whatever form while you're at it).
(One important thing I also realized while writing this reply, however, is that we do not actually have any implementation for transport security yet and hence all data will have to be sent as plain text; at the current stage of development I don't believe this is a problem, as it allows for better debugging.)
The first thing required will be establishing a `libp2p` connection to a known peer on `localhost`: There is an example available for this. First negotiate for the `/plaintext/1.0.0` MSS protocol (that is: no encryption), then for `/ipfs/bitswap/1.0.0` (I think!). After this (with some luck) you should have established a bitswap connection. `go-ipfs` will, however, reject any attempt at establishing an unencrypted connection unless you start it as `ipfs daemon --disable-transport-encryption`, so be aware.
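For orientation, the wire framing used during such a negotiation can be sketched as follows. This assumes the framing described in the multistream-select README (each message is the protocol string plus a trailing newline, prefixed with its length as an unsigned varint). The helper names are hypothetical, no networking is done, and the exact sequence of negotiation rounds a real peer expects is not shown.

```python
def encode_uvarint(value):
    """Encode a non-negative integer as an unsigned varint (LEB128)."""
    if value < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(data):
    """Decode a uvarint from the start of `data`; return (value, bytes_used)."""
    value = 0
    shift = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, index + 1
        shift += 7
    raise ValueError("truncated uvarint")


def encode_message(protocol):
    """Frame one multistream-select message: <uvarint length><text>\\n."""
    payload = protocol.encode("utf-8") + b"\n"
    return encode_uvarint(len(payload)) + payload


def decode_message(data):
    """Parse one framed message; return (text, remaining_bytes)."""
    length, offset = decode_uvarint(data)
    payload = data[offset:offset + length]
    if len(payload) != length or not payload.endswith(b"\n"):
        raise ValueError("malformed multistream-select message")
    return payload[:-1].decode("utf-8"), data[offset + length:]
```

Dumping `encode_message("/plaintext/1.0.0")` next to the bytes a real peer sends is a cheap way to check whether this assumption about the framing holds.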
Using this connection then, try sending a block request Protobuf message (see also the `protobuf` library), requesting a block you know exists on the remote server, and dump any response packets you may receive.
It probably won't work exactly the way I described, but should be close. Please don't hesitate to ask any questions you may have and I'll try to answer them to the best of my abilities. :slightly_smiling_face:
Sure.
Looks like a perfect task for me!
I will try my best to document everything that I find.
I will work here: https://github.com/AliabbasMerchant/py-ipfs-bitswap
@alexander255
I noticed that IPFS has numerous repos for JS. Even some small pieces of code (no doubt important ones) have their own repo.
(For example: https://github.com/ipfs/js-ipfs-block-service, https://github.com/ipfs/js-ipfs-unixfs, https://github.com/ipfs/js-ipfs-block, https://github.com/ipfs/js-datastore-fs)
Do we want to do the same for Python, or should we put them all in this repo???
I need the python versions of some of the above for py-ipfs-bitswap. So I am making them. Should I make new repos for them?
Also, 1 more thing.
What is the exact purpose of py-ipfs?
We all know python is slow. So we are definitely not trying to replace the go and js versions.
So, what is the exact goal??
If we know the goal, we can write code and documentation accordingly.
@AliabbasMerchant: Don't read too much into it; it's not uncommon in JS that every subroutine ends up in a different package. It's not uncommon to have packages such as `is-buffer` that then look like this (and I'm quoting a real module here):
module.exports = function isBuffer (obj) {
  return obj != null && obj.constructor != null &&
    typeof obj.constructor.isBuffer === 'function' && obj.constructor.isBuffer(obj)
}
(The end.) Apparently JS people like it that way and everybody does it there, so it's not surprising that JS-IPFS would do it too.
In Python it's more common that packages are written for groups of related functionality instead, so packages are bigger but more versatile.
TL;DR: Never mind that JS has subpackages for everything, in Python everything Bitswap and up can be one package (py-ipfs).
BTW: `datastore-fs` already exists as part of https://github.com/dheatovwil/datastore but would benefit from being made async (no need to rewrite it from scratch).
> Also, 1 more thing. What is the exact purpose of py-ipfs? We all know python is slow. So we are definitely not trying to replace the go and js versions. So, what is the exact goal?? If we know the goal, we can write code and documentation accordingly.
Python support will be very important. It's one of the most popular scripting languages after JavaScript and is used by numerous projects. To name just a few from my area of knowledge: The Blender 3D animation software uses Python, and the Godot game engine uses the Python-like GDScript, for addons and development. With a native Python implementation of IPFS, the daemon could be included in such software, allowing it to work directly with files within the IPFS network... this is just one huge advantage I can immediately point out.
@AliabbasMerchant: Thank you for bringing up this important subject! I'll try to answer from my perspective, but do remember that different people have different visions of what this software should actually be. The discussions of #40 and #1 are good illustrations of this I think.
My vision for Py-IPFS: A client-oriented Python library for accessing data from the IPFS network and caching it locally, that should be easy for any Python application to embed and ship. Basically a quick way to access (and by extension share to) the IPFS network from pure Python without running a full-blown daemon. As for speed: Similar to Py-ETH, the main goal of Py-IPFS should be readability and ease of auditing, not so much raw throughput. I would expect things to be pretty fast anyways when run on PyPy with some minor optimizing done (CPython will always be slow, but it's not the only Python implementation, thankfully), but it's not the main goal.
I'm interested in your vision, and that of others, as well, however.
I have started with python bitswap.
Still under development, but please check it out and give your valuable feedback here: https://github.com/AliabbasMerchant/py-ipfs-bitswap/issues/1
/cc @alexander255
Out of sheer morbid curiosity, has anyone thought about re-thinking this library as a light wrapper around the C++, Rust, or Go implementations instead? (Using C API, PyO3, or gopy)
Python is a great systems integration language, but a pure Python implementation seems like it'd be very slow and lagging behind the other compiled implementations with more funding.
Also, I am aware of the HTTP client library... just seemed like a direct integration with Python bindings might be safer with less overhead than communicating over HTTP. I've not had a great experience with the HTTP client either.
Is this still happening or is it completely abandoned?
Since #1 is clogged with so many comments, I'm opening a new issue here. Feel free to continue the discussion below and I'll keep the following updated as things develop. Also feel free to create separate issues / repos to coordinate and I'll add the relevant links below.
Next steps (networking, stalled – please see the “storage” section below):

- `py-libp2p` library
- `py-ipfs-bitswap` library: https://github.com/AliabbasMerchant/py-ipfs-bitswap
- `ipfs block *` API for fetching blocks of nodes we are connected to – fetching blocks of non-connected nodes needs the DHT.
- `ipfs daemon --disable-transport-encryption`, but note that you will not be able to connect to any regular peers until one of the transport encryption methods is implemented
- `multistream-select` code of `py-libp2p` to support actually dialing other nodes (MOSTLY FIXED UPSTREAM – `ls` is still missing and an `mss-nc` implementation could still be useful)
- `mss-nc` like utility on top of this code to demonstrate that you are able to connect to `go-ipfs` nodes and negotiate
- `ls` mode in which MSS will return a list of supported protocols, see https://github.com/multiformats/multistream-select/blob/master/README.md for the complete spec
- `libp2p` `py-multistream-select` library and update `py-libp2p` to use it (Easy!, stalled – needs your help!)
- `py-multistream-select` library: https://github.com/dheatovwil/py-multistream-select

Next steps (storage, simpler):

- (Suggestion: Use Python's `lib2to3` and just drop Python 2 entirely.)
- Current port: https://github.com/dheatovwil/datastore
- (maybe https://pypi.org/project/aiofiles/ ?) for file access
- The `trio` framework is used for async I/O now
- `py-ipfs` “implementation” that can fetch blocks from the local `$IPFS_PATH` directory and expose them with an API similar to what https://github.com/ipfs/py-ipfs-http-client currently offers (goal here is to eventually have a drop-in replacement)
- `block/{get,put,rm,stat}` API that serves blocks from the local `$IPFS_PATH` directory
- `trio-quart` ASGI web microframework for this. (Whatever you choose it will have to be compatible with trio as that is the AIO framework used in the stack.)
- dgraph-io/badger#984
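
To make the "fetch blocks from the local `$IPFS_PATH` directory" step more concrete, here is a minimal sketch of locating a block file on disk. It rests on assumptions that are not stated in this thread and should be checked against an actual repository: that blocks live in a flatfs store under `blocks/`, that the file name is the unpadded upper-case base32 encoding of the block's multihash plus `.data`, and that the shard directory is the two characters before the last one (`next-to-last/2`). All function names are hypothetical, and very short keys are not handled.

```python
import base64
import os


def block_key(multihash):
    """Assumed key format: upper-case base32 of the multihash, no padding."""
    return base64.b32encode(multihash).decode("ascii").rstrip("=")


def block_path(ipfs_path, multihash):
    """Build the assumed on-disk path of a block inside the repository."""
    key = block_key(multihash)
    shard = key[-3:-1]  # assumed `next-to-last/2` sharding
    return os.path.join(ipfs_path, "blocks", shard, key + ".data")


def read_block(ipfs_path, multihash):
    """Blocking read of the raw block bytes; raises FileNotFoundError if absent."""
    with open(block_path(ipfs_path, multihash), "rb") as f:
        return f.read()
```

In the async stack described above, `read_block` would be the blocking call to wrap or replace. It is kept synchronous here so the path logic can be checked in isolation.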