IPFS Feedback - Githubissues

jbenet commented 9 years ago

Hey all, would be great to get some feedback of the things you'd like IPFS to do better right now. This includes the go-ipfs repo, the ipfs protocol itself, the js webui, all our codebases, our community channels-- really, anything. There's of course a lot well known and planned-- i'm merely prioritizing where we spend our efforts in the coming weeks.

Some questions to get you thinking:

I'd love it if it were easier to ....
I wish that ...
I really hate when ...
It is very cumbersome to ...
... is very annoying
... is very broken

Thanks, Juan

PS: yep, a rough spec is incoming shortly for those itching to implement in other langs. feel free to signal your desire here.

Crosspost from https://github.com/jbenet/go-ipfs/issues/849

warpfork commented 9 years ago

Getting a distributed filesystem to take off in isolation on $n$ machines where 1 <= n <= ~3 seems like it should be a well-paved road. Lots of the current/previous generation of distributed storage is really rough at this (I'm looking at you Ceph -- great if you have a billion dollars in hardware already; a mess if I have three computers at home and want to sync something privately.) I think it could be done writing a patch to the locator service that's more like a hostfile instead of DHT-centric (I've been thinking about contributing this sometime if it's not already around :))

I have a lot of questions around how I'd be sure my data is stored forever if I myself have a rotating set of machines. Pinning seems like part of the short answer, but I wonder how that's going to hold up organizationally over deep time -- will I end up evolving another file somewhere that tracks all my important pins, so I can make sure to pin them all again when I break one computer and get another? (Not even worried about proof of storage in the case of bad actors yet; more just how I make sure I maintain all data even in my own data center given that over time all machines, and thus all their pin lists, can permanently drop out at any time.)

vitzli commented 9 years ago

I had some questions related to hashes before I joined #ipfs and I couldn't find answers in the documentation, eventually I found some, but I think it would nice if these questions were covered in the reference/white-paper/manual.

What is this thing: Qmd286K6pohQcTKYqnS1YhWrCiS4gz7Xi34sdwMe9USZ7u? What algorithm do you use?
Is it file's or directory's?
What are the reasons behind creating and using multihash?
Why do you use SHA2_256? I like MD5.
Is it possible to replace SHA2_256 when things will go bad?
Can I generate and use my own key-pair?

rdlugosz commented 9 years ago

One question I have seems like it'd be pretty common: Once I store something in IPFS, will it remain available to others once I (say) turn off my laptop? I'm under the impression that things I put in IPFS are copied to other nodes automatically... but perhaps that's not correct?

Other questions I have:

If I have placed something in IPFS, can I decide to delete it later and expect that deletion to propagate throughout the network?
If I'm operating a node, can I browse the files that my node is hosting?
Can I control the amount of storage/bandwidth is dedicated to IPFS?
That hashes are used to represent things seems to imply a level of anonymity in the system. Is anonymous storage/hosting a part of the concept of IPFS or is that not the case? Seems like this is something that should be spelled out and made very clear to users.

whyrusleeping commented 9 years ago

I'm going to answer these here, but move the questions and answers over to http://github.com/ipfs/faq afterwards.

ipfs provides no guarantees about data availability or redundancy. Data added to ipfs is self hosted until another interested party decides to request it from you and rehost.
If you add data to the network, and another node chooses to rehost, there is no way to cause them to delete it from their blockstore.
Yes, you can browse the blocks that your node is hosting. Files are an abstraction built on top of the merkledag. All data in ipfs is stored as content addressed blocks (so there are no filenames unless you have the directory block containing said file).
Currently, no, but these are features that are planned for the near future.
We make no claims about anonymity, the ipfs routing system makes it very easy to query the ip address of a peer hosting any given block. Down the road we plan on implementing a TOR-like routing system that may provide anonyminity.

ghost commented 9 years ago

I have been toying around with the idea of an adapter for archive.org/web/. That way, alongside keeping a local cache, you can just get the wayback machine to cache whats on your gateway, and then such an adapter would check the wayback machine to fill in the gaps for items that used to be cached, but arent any longer.

If this was hooked up to some global gateway (like I did with gateway.ipfs.io) then you get global caching for free.

for instance: http://web.archive.org/web/20150626212831/http://gateway.ipfs.io/ipfs/QmY7QNMLyxkQ628uExEPX8SvpZsXeXHnXvq5rCUJhb5XJG

gergo- commented 9 years ago

Just started playing around with IPFS today. I have a few points I'd like to raise/ask.

Since the original post in this thread asked what is very broken: The web UI is very broken for me. It runs, but it shows 0 peers, no files, no log entries, nothing. (I have plenty of peers shown by running ipfs swarm peers on the command line or getting the JSON as shown in the API docs.) This is on Linux, running IPFS from the binaries provided on the web page. I'm happy to open a bug report, but I'm not even sure which of the many projects it should be for.
Can I list my peers' blocks and peers? Would it be possible to build a crawler that way? Is there already a crawler?
@whyrusleeping mentioned above that it is possible to browse your blocks. I found my .ipfs/blocks directory, but how do I extract information from it?
Is the BitSwap credit system described in the paper already implemented? Is there a way to look at my credits?

whyrusleeping commented 9 years ago

@gergo- the webui is definitely broken, we had an issue with cross origin headers failing. you can fix it temporarily by setting the env var API_ORIGIN to * and running the daemon. There is an open PR to fix it thats on my todo list for today.

there is currently not a crawler, you can list your connected peers with ipfs swarm peers and you can list blocks on your local node with ipfs refs local. A crawler would theorhetically listen on the dht for provider messages and log them all in a DB somewhere.

Blocks in .ipfs/blocks are simply raw data. They will have protobuf framing and not be super readable on their own, but using ipfs object get will parse the protobuf stuff for you and print out the data and links of a given object.

Bitswap credit is not currently implemented.

eternaleye commented 8 years ago

Several things, most with concrete suggestions - largely focused on protocol-ish stuff.

I wish multihash either included the hash length in the hash computation (to do this securely requires prefix-freeness, like using LEB128 and putting the hashlen before the data) or didn't permit truncation at all. Silent hash truncation is always a security threat; see the SLOTH attacks. The same applies to including the hash algorithm identifier in the hash computation to foil substitution attacks; NUL-terminated strings are a good prefix-free representation for that.
I wish the routing layer had some foundation in provable assertions of possession, so that nodes couldn't falsely claim to hold data and querents could check the proof efficiently
- Proofs of Space-Time may help here: https://eprint.iacr.org/2016/035
- Could probably make a PoST a necessary condition for inserting a location into the DHT, and require fresh ones for refreshing the value.
- If so, saving that PoST and forwarding it to the querent would allow them to verify it as well.
I wish the exchange layer used something better than a simple block-trading protocol, such as applying FEC as described in the RaptorQP2P paper.
- The paper's hard to get a hold of; I have a copy but for copyright reasons am not posting it publicly. Feel free to email me though.
- EDIT: Presentation slides on the work found
- To summarize:
- The file as a whole is encoded with a fountain code at a large symbol size, and each symbol of that is encoded with another layer of it.
- Thus the people with the original files can just generate pseudorandomly-seeded fountain sequences of correction symbols and throw them at downloaders until they get the whole file.
- Peers can also trade these symbols back and forth in the absence of seeders.
- This achieves download times that are 50% of what BitTorrent manages, while being much more resilient to loss of seeders.
- There's a clever trick regarding which symbols are sent to which directly connected peers by taking the low bits of the symbol ID modulo the max upload slots, and only sending if the result is equal to the peer's slot ID, to avoid the same peer getting the same symbol from multiple places.
- One issue is that RaptorQ is patented by Qualcomm, and the only RAND license is for the RFC implementation - an implementation with parameters too limited to implement that paper well.
- However, the paper's techniques should apply equally well to other fountain code schemes, such as rateless polar codes, which seem technologically superior as well: http://arxiv.org/abs/1508.03112
I wish multiaddr used URIs, so as to interoperate with the vast set of tools that already exist and avoid reinventing the wheel, the axle, and the chassis to put them on.
- Failing that, I wish each multiaddr scheme was required to define a canonical mapping to URI form, with strong preference for existing schemes and URI design guidelines.
- For example, the URIs given in the multiaddr README would not be good; the URI spec already distinguishes IPv4 vs. IPv6 by dotted-quad vs. bracketed-colon-hex, and so putting '4' or '6' in the scheme is poor URI design
- Canonical mappings would do more to "heal the rift" than just flat paths, which create a further rift with all URI-based tools.
I wish multiaddr didn't specify a binary format that uses one byte to denote the top-level protocol. If IPFS is meant to be the permanent web, that kind of low, arbitrary ceiling just isn't going to work. Preferably a binary format would be left out entirely for now, because in no way shape or form will the size of multiaddr records be the bottleneck in IPFS any time soon.
I wish the DHT used S/Kademlia (http://www.tm.uka.de/doc/SKademlia_2007.pdf) rather than plain Kademlia, in order to avoid a wide variety of potential attacks.

jbenet commented 8 years ago

@eternaleye fantastic suggestions. yes! :+1: would you like to help making these happen? (some of this is already planned work, like the provide records including proofs + s/kad, etc. the raptorQ work is very interesting. i think that could be a different mode of bitswap. wonder how we could adapt something like that to whole dags, i.e. where the symbols are combinations of the objects themselves.)

mitra42 commented 5 years ago

Top items for me would be:

js-ipfs: Better error handling - i.e. getting some indication of when createReadStream is never going to return.
go: Fix the issues with garbage collection losing all the urlstore data

These two are why IPFS is turned off for most of our UI - since we lose data, and the js-ipfs calls just sit there waiting when they try and access blocks that IPFS has lost, because the code can't work around the absence of a file if it never gets any kind of error response. This gives a poor UX, because users sit there waiting for something that will never complete, and means we have to leave IPFS off except for files we know are small and can timeout at an application level.

and of course .... at a much larger scale:

Figure out how to handle routing of larger numbers of hashes, so that a IPFS hash can be truly universal, not just retrievable only by clients connected to the same server.

A good conversation with @whyrusleeping a couple of months ago has helped understand why its hard - but its still the main "large" thing on the wish-list.

This routing is the gating factor to things being distributed, until that moves up a notch, any browser on dweb.archive.org has to talk directly via WSS to our IPFS instance which doesn't make it distributed at all - and offers no advantages over just retrieving via HTTP.

momack2 commented 5 years ago

Thanks for the feedback! @alanshaw - anything in the works on error handling in js-ipfs?

For the other requests - the go-ipfs gc fixes are actually in the upcoming 4.19 release, and we also have work underway on better provider strategies to improve content routing (follow along here: https://github.com/ipfs/go-ipfs/issues/5774)

alanshaw commented 5 years ago

js-ipfs: Better error handling - i.e. getting some indication of when createReadStream is never going to return.

@mitra42 assuming you mean ipfs.catReadableStream? If it doesn't return, it means it's waiting for the content to be available. There's no "error" to handle but there's a few things we could do to help that aren't possible right now in JS IPFS:

Cancel the request - allow the request to be canceled by the user or otherwise
Timeout - if we don't find the content after 30s we could give up
Feedback on progress - provide an API whereby IPFS/libp2p can provide information about what it's doing for a given request e.g. checking local repo...checking connected peers...checking DHT. This would give users more information about what's happening and help them understand why things are taking a long time and reassure them that IPFS hasn't just crashed!

We're working on it! https://github.com/ipfs/interface-ipfs-core/issues/58

Cancelling the request is the first step. Timeout is just cancel after a certain period of time.

I've been playing with AbortController as a means of allowing this and I've not had any negative feedback yet 🤞. There's a couple of reasons for this - it's a native API in the browser and so should be familiar to frontend developers (it is used by fetch). I mentioned it on that same issue https://github.com/ipfs/interface-ipfs-core/issues/58#issuecomment-437809858

If we can pass an AbortController.signal to cancel a request we have a template for doing something similar with a per-request log to provide feedback on the status of any given request.

I aim to introduce this concept as we tackle the async await/iterators endeavour.

mitra42 commented 5 years ago

@alanshaw - I guess the thinking here is like BitTorrent, the user will wait as long as needed until someone comes online with the content. But that's not how most applications work. For example, I want to open a file and load an image, the questions is, at what point does IPFS give up and tell the application so it can get the content somewhere else (e.g. via HTTP) - this is how WebTorrent works and is why its streaming experience is (from our perspective) so much better than IPFS's.

I'm not sure how to do this well in IPFS, I can (and do) do a timeout, but this means that either a) the timeout is so long that it creates a painful user experience b) the timeout is short enough that any large file will fail. I think - but don't know IPFS internals enough to be sure - that it wants a timeout that applies to making progress ... if IPFS is finding blocks, or getting closer in the DHT, then its fine to wait, but if progress is not being made then it needs to notify the app.

Note ... its the combination of IPFS having both low reliability (material stored in IPFS may or may not be retrievable at some later date, with some different connectivity); no errors; and no fallback to reliable methods is what makes IPFS performance appear so poor when its used as the primary way of for example loading content in a website.

alanshaw commented 5 years ago

Yes I agree the timeout should be tied to inactivity.

mitra42 commented 5 years ago

Which means I think that timeout is not as you said: just a cancel after a period of time because the timer needs to eb started/restarted in the js-ipfs layer, not the App.

mitra42 commented 5 years ago

Did this ever get solved - so queries can give some kind of feedback/error on failure?

Stebalien commented 5 years ago

No but the issue is here: https://github.com/ipfs/go-ipfs/issues/5541. It's actually not all that difficult to solve, we just need a special Reader adapter that detects a stalled read. Curl actually has the same logic (it'll timeout after some time saying "no progress for N seconds".

mitra42 commented 5 years ago

Pity its taken so long, I've run into a couple of groups this year who told me this was one of the key reasons they chose not to use IPFS.

Stebalien commented 5 years ago

Patches welcome, we're just limited on time.

mitra42 commented 5 years ago

There's no way someone outside the IPFS core group could work on something like this - given that some issues relating to it have been open since 2016. Its why in the dweb.archive.org site, we have IPFS as the last priority fallback (after http) in dweb.archive.org, because we can never tell if it is succeeding or failing we can't use IPFS and fallback to http, I always ask people what decentralized technologies they are using, and why and two projects told me it was the primary reason they dropped IPFS - I believe both of them were js-ipfs (you pointed at a go-ipfs issue).

In js-ipfs the topic gets pointed at the old (2016) issue https://github.com/ipfs/interface-ipfs-core/issues/58 which is about "cancellable requests" which (without knowing the code) seems to be something different entirely.

Stebalien commented 5 years ago

An external contributor can, absolutely, fix that bug. There's just no way the core team can scale to meet the needs of every single user.

whyrusleeping commented 5 years ago

Also, since this is really just wanting a timeout for a javascript promise, you can implement it yourself pretty easily by taking the promise from js-ipfs, and racing it with a timeout: https://italonascimento.github.io/applying-a-timeout-to-your-promises/

mitra42 commented 5 years ago

Yep - that timeout solution is the obvious one, the first thing we tried (almost a year ago) and unfortunately wrong :-( The problem is that failure is very different for a small or a large file, if the timeout is reasonable when for example IPFS is downloading the icons, (maybe 1 second) then its going to fail on larger files. Its also wrong for streams, since streams return success immediately, failure is when they are unable to deliver data.

OR13 commented 5 years ago

Would really love to be able to cancel stalled reads / shutdown the ipfs http client correctly.

I use the promise timeout, but it does not help in jest.

Currently skipping unit tests that contain bad data, because there is no way to close ipfs properly once it starts trying to get a bad hash.

[EDIT]: overall, I have enjoyed working with IPFS for the past few years, and really appreciate all the work from jbenet, protocol labs, and the community.

ipfs / notes

IPFS Feedback #318