ipfs / notes

IPFS Collaborative Notebook for Research
MIT License

IPFS Feedback #318

Open jbenet opened 9 years ago

jbenet commented 9 years ago

Hey all, would be great to get some feedback on the things you'd like IPFS to do better right now. This includes the go-ipfs repo, the ipfs protocol itself, the js webui, all our codebases, our community channels -- really, anything. There's of course a lot that's well known and planned -- I'm merely prioritizing where we spend our efforts in the coming weeks.

Some questions to get you thinking:

Thanks, Juan

PS: yep, a rough spec is incoming shortly for those itching to implement in other langs. Feel free to signal your desire here.

Crosspost from https://github.com/jbenet/go-ipfs/issues/849

warpfork commented 9 years ago

Getting a distributed filesystem to take off in isolation on n machines, where 1 <= n <= ~3, seems like it should be a well-paved road. Lots of the current/previous generation of distributed storage is really rough at this (I'm looking at you, Ceph -- great if you have a billion dollars in hardware already; a mess if I have three computers at home and want to sync something privately). I think it could be done by writing a patch to the locator service that's more like a hostfile instead of DHT-centric (I've been thinking about contributing this sometime if it's not already around :))

I have a lot of questions around how I'd be sure my data is stored forever if I myself have a rotating set of machines. Pinning seems like part of the short answer, but I wonder how that's going to hold up organizationally over deep time -- will I end up evolving another file somewhere that tracks all my important pins, so I can make sure to pin them all again when I break one computer and get another? (Not even worried about proof of storage in the case of bad actors yet; more just how I make sure I maintain all data even in my own data center given that over time all machines, and thus all their pin lists, can permanently drop out at any time.)

vitzli commented 9 years ago

I had some questions related to hashes before I joined #ipfs, and I couldn't find answers in the documentation. Eventually I found some, but I think it would be nice if these questions were covered in the reference/white-paper/manual.

  1. What is this thing: Qmd286K6pohQcTKYqnS1YhWrCiS4gz7Xi34sdwMe9USZ7u? What algorithm do you use?
  2. Is it a file's hash or a directory's?
  3. What are the reasons behind creating and using multihash?
  4. Why do you use SHA2_256? I like MD5.
  5. Is it possible to replace SHA2_256 if things go bad?
  6. Can I generate and use my own key-pair?

rdlugosz commented 9 years ago

One question I have seems like it'd be pretty common: Once I store something in IPFS, will it remain available to others once I (say) turn off my laptop? I'm under the impression that things I put in IPFS are copied to other nodes automatically... but perhaps that's not correct?

Other questions I have:

  1. If I have placed something in IPFS, can I decide to delete it later and expect that deletion to propagate throughout the network?
  2. If I'm operating a node, can I browse the files that my node is hosting?
  3. Can I control the amount of storage/bandwidth that is dedicated to IPFS?
  4. The fact that hashes are used to represent things seems to imply a level of anonymity in the system. Is anonymous storage/hosting part of the concept of IPFS, or is that not the case? This seems like something that should be spelled out and made very clear to users.

whyrusleeping commented 9 years ago

I'm going to answer these here, but move the questions and answers over to http://github.com/ipfs/faq afterwards.

  1. ipfs provides no guarantees about data availability or redundancy. Data added to ipfs is self-hosted until another interested party decides to request it from you and rehost it.
  2. If you add data to the network, and another node chooses to rehost, there is no way to cause them to delete it from their blockstore.
  3. Yes, you can browse the blocks that your node is hosting. Files are an abstraction built on top of the merkledag. All data in ipfs is stored as content addressed blocks (so there are no filenames unless you have the directory block containing said file).
  4. Currently, no, but these are features that are planned for the near future.
  5. We make no claims about anonymity; the ipfs routing system makes it very easy to query the IP address of a peer hosting any given block. Down the road we plan on implementing a Tor-like routing system that may provide anonymity.

ghost commented 9 years ago

I have been toying around with the idea of an adapter for archive.org/web/. That way, alongside keeping a local cache, you can just get the Wayback Machine to cache what's on your gateway, and then such an adapter would check the Wayback Machine to fill in the gaps for items that used to be cached but aren't any longer.

If this was hooked up to some global gateway (like I did with gateway.ipfs.io) then you get global caching for free.

for instance: http://web.archive.org/web/20150626212831/http://gateway.ipfs.io/ipfs/QmY7QNMLyxkQ628uExEPX8SvpZsXeXHnXvq5rCUJhb5XJG

gergo- commented 9 years ago

Just started playing around with IPFS today. I have a few points I'd like to raise/ask.

whyrusleeping commented 9 years ago

@gergo- the webui is definitely broken; we had an issue with cross-origin headers failing. You can fix it temporarily by setting the env var API_ORIGIN to * and running the daemon. There is an open PR to fix it that's on my todo list for today.

There is currently no crawler. You can list your connected peers with ipfs swarm peers, and you can list blocks on your local node with ipfs refs local. A crawler would theoretically listen on the DHT for provider messages and log them all in a DB somewhere.

Blocks in .ipfs/blocks are simply raw data. They will have protobuf framing and not be super readable on their own, but using ipfs object get will parse the protobuf stuff for you and print out the data and links of a given object.

Bitswap credit is not currently implemented.

eternaleye commented 8 years ago

Several things, most with concrete suggestions - largely focused on protocol-ish stuff.

jbenet commented 8 years ago

@eternaleye fantastic suggestions. yes! :+1: would you like to help make these happen? (some of this is already planned work, like the provide records including proofs + s/kad, etc. the RaptorQ work is very interesting. i think that could be a different mode of bitswap. wonder how we could adapt something like that to whole dags, i.e. where the symbols are combinations of the objects themselves.)

mitra42 commented 5 years ago

Top items for me would be:

These two are why IPFS is turned off for most of our UI - since we lose data, and the js-ipfs calls just sit there waiting when they try to access blocks that IPFS has lost, because the code can't work around the absence of a file if it never gets any kind of error response. This gives a poor UX, because users sit there waiting for something that will never complete, and it means we have to leave IPFS off except for files we know are small and can time out at the application level.

and of course .... at a much larger scale:

A good conversation with @whyrusleeping a couple of months ago helped me understand why it's hard - but it's still the main "large" thing on the wish-list.

This routing is the gating factor to things being distributed; until that moves up a notch, any browser on dweb.archive.org has to talk directly via WSS to our IPFS instance, which doesn't make it distributed at all - and offers no advantages over just retrieving via HTTP.

momack2 commented 5 years ago

Thanks for the feedback! @alanshaw - anything in the works on error handling in js-ipfs?

For the other requests - the go-ipfs gc fixes are actually in the upcoming 0.4.19 release, and we also have work underway on better provider strategies to improve content routing (follow along here: https://github.com/ipfs/go-ipfs/issues/5774)

alanshaw commented 5 years ago

js-ipfs: Better error handling - i.e. getting some indication of when createReadStream is never going to return.

@mitra42 assuming you mean ipfs.catReadableStream? If it doesn't return, it means it's waiting for the content to be available. There's no "error" to handle, but there are a few things we could do to help that aren't possible right now in JS IPFS:

  1. Cancel the request - allow the request to be canceled by the user or otherwise
  2. Timeout - if we don't find the content after 30s we could give up
  3. Feedback on progress - provide an API whereby IPFS/libp2p can provide information about what it's doing for a given request e.g. checking local repo...checking connected peers...checking DHT. This would give users more information about what's happening and help them understand why things are taking a long time and reassure them that IPFS hasn't just crashed!

We're working on it! https://github.com/ipfs/interface-ipfs-core/issues/58

Cancelling the request is the first step. A timeout is just a cancel after a certain period of time.

I've been playing with AbortController as a means of allowing this and I've not had any negative feedback yet 🤞. There are a couple of reasons for this choice - it's a native API in the browser and so should be familiar to frontend developers (it is used by fetch). I mentioned it on that same issue https://github.com/ipfs/interface-ipfs-core/issues/58#issuecomment-437809858

If we can pass an AbortController.signal to cancel a request we have a template for doing something similar with a per-request log to provide feedback on the status of any given request.

I aim to introduce this concept as we tackle the async await/iterators endeavour.
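
To make the pattern concrete, here is a minimal sketch of cancellation via AbortController, using fetch against a gateway URL since fetch already understands the signal; the helper name, the timeout value, and the gateway usage are illustrative assumptions, not the js-ipfs API.

```typescript
// Sketch of the cancellation pattern described above, using the browser-native
// AbortController that fetch already accepts. Fetching over an HTTP gateway is
// an illustrative assumption; the point is the shape of the API.
async function getWithCancel(url: string, timeoutMs = 30_000): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    // fetch rejects with an AbortError as soon as the signal fires.
    return await fetch(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer); // don't abort a request that already completed
  }
}

// Hypothetical usage: give up on a possibly-unavailable CID after 10 seconds.
// getWithCancel('https://gateway.ipfs.io/ipfs/<cid>', 10_000)
//   .then(res => res.text())
//   .catch(err => console.error('gave up:', err));
```

If js-ipfs calls could accept the same signal, user-initiated cancellation and timeouts would become one mechanism, which is exactly the point made above.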

mitra42 commented 5 years ago

@alanshaw - I guess the thinking here is like BitTorrent's: the user will wait as long as needed until someone comes online with the content. But that's not how most applications work. For example, I want to open a file and load an image; the question is, at what point does IPFS give up and tell the application so it can get the content somewhere else (e.g. via HTTP)? This is how WebTorrent works, and it's why its streaming experience is (from our perspective) so much better than IPFS's.

I'm not sure how to do this well in IPFS. I can (and do) do a timeout, but this means that either a) the timeout is so long that it creates a painful user experience, or b) the timeout is short enough that any large file will fail. I think - but don't know IPFS internals well enough to be sure - that it wants a timeout that applies to making progress ... if IPFS is finding blocks, or getting closer in the DHT, then it's fine to wait, but if progress is not being made then it needs to notify the app.

Note ... it's the combination of IPFS having low reliability (material stored in IPFS may or may not be retrievable at some later date, under different connectivity), no errors, and no fallback to reliable methods that makes IPFS performance appear so poor when it's used as the primary way of, for example, loading content on a website.
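
A rough sketch of that progress-based idea on the JS side, assuming the application reads content as an async iterable of chunks (the wrapper below is a hypothetical helper, not a js-ipfs API): the idle timer resets whenever a chunk arrives, so a large file that keeps delivering data never times out, while a stalled request fails after a bounded wait.

```typescript
// Sketch of an inactivity (no-progress) timeout around any async iterable of
// chunks. Names are assumed, not part of js-ipfs. Note that when the timeout
// fires the underlying request is abandoned rather than truly cancelled --
// which is exactly why real cancellation support is being asked for above.
async function* withInactivityTimeout<T>(
  source: AsyncIterable<T>,
  idleMs: number
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();

  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      // Race the next chunk against a fresh idle timer; every chunk resets the clock.
      const result = await Promise.race([
        iterator.next(),
        new Promise<never>((_, reject) => {
          timer = setTimeout(
            () => reject(new Error(`no progress for ${idleMs}ms`)),
            idleMs
          );
        }),
      ]);
      if (result.done) return;
      yield result.value;
    } finally {
      clearTimeout(timer);
    }
  }
}
```

Stebalien's suggestion further down of a stalled-read Reader adapter in go-ipfs is the same idea applied on the Go side.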

alanshaw commented 5 years ago

Yes I agree the timeout should be tied to inactivity.

mitra42 commented 5 years ago

Which means, I think, that the timeout is not, as you said, just a cancel after a period of time, because the timer needs to be started/restarted in the js-ipfs layer, not the app.

mitra42 commented 5 years ago

Did this ever get solved - so queries can give some kind of feedback/error on failure?

Stebalien commented 5 years ago

No, but the issue is here: https://github.com/ipfs/go-ipfs/issues/5541. It's actually not all that difficult to solve; we just need a special Reader adapter that detects a stalled read. Curl actually has the same logic (it'll time out after some time saying "no progress for N seconds").

mitra42 commented 5 years ago

Pity it's taken so long; I've run into a couple of groups this year who told me this was one of the key reasons they chose not to use IPFS.

Stebalien commented 5 years ago

Patches welcome, we're just limited on time.

mitra42 commented 5 years ago

There's no way someone outside the IPFS core group could work on something like this, given that some issues relating to it have been open since 2016. It's why on the dweb.archive.org site we have IPFS as the last-priority fallback (after HTTP): because we can never tell if it is succeeding or failing, we can't use IPFS and fall back to HTTP. I always ask people what decentralized technologies they are using and why, and two projects told me this was the primary reason they dropped IPFS - I believe both of them were using js-ipfs (you pointed at a go-ipfs issue).

In js-ipfs the topic gets pointed at the old (2016) issue https://github.com/ipfs/interface-ipfs-core/issues/58, which is about "cancellable requests" and (without knowing the code) seems to be something different entirely.

Stebalien commented 5 years ago

An external contributor can, absolutely, fix that bug. There's just no way the core team can scale to meet the needs of every single user.

whyrusleeping commented 5 years ago

Also, since this is really just wanting a timeout for a JavaScript promise, you can implement it yourself pretty easily by taking the promise from js-ipfs and racing it with a timeout: https://italonascimento.github.io/applying-a-timeout-to-your-promises/
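
For reference, the linked pattern looks roughly like this (the helper name is made up, and a promise-returning ipfs.cat call is assumed purely for illustration):

```typescript
// Race a js-ipfs promise against a timer, per the linked article. As mitra42
// notes below, this puts one fixed deadline on the whole request, regardless
// of file size or whether progress is being made.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical usage, assuming a promise-returning cat call:
// const data = await withTimeout(ipfs.cat(cid), 5000);
```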

mitra42 commented 5 years ago

Yep - that timeout solution is the obvious one, the first thing we tried (almost a year ago), and unfortunately it's wrong :-( The problem is that failure looks very different for a small file versus a large one: if the timeout is reasonable when, for example, IPFS is downloading the icons (maybe 1 second), then it's going to fail on larger files. It's also wrong for streams, since streams return success immediately; failure is when they are unable to deliver data.

OR13 commented 5 years ago

Would really love to be able to cancel stalled reads / shut down the ipfs http client correctly.

I use the promise timeout, but it does not help in Jest.

Currently skipping unit tests that contain bad data, because there is no way to close ipfs properly once it starts trying to get a bad hash.

[EDIT]: overall, I have enjoyed working with IPFS for the past few years, and really appreciate all the work from jbenet, Protocol Labs, and the community.