Open jbenet opened 9 years ago
Getting a distributed filesystem to take off in isolation on $n$ machines, where 1 <= n <= ~3, seems like it should be a well-paved road. Lots of the current/previous generation of distributed storage is really rough at this (I'm looking at you, Ceph: great if you already have a billion dollars in hardware, a mess if I have three computers at home and want to sync something privately). I think it could be done by writing a patch to the locator service that's more like a hostfile instead of DHT-centric (I've been thinking about contributing this sometime if it's not already around :))
I have a lot of questions around how I'd be sure my data is stored forever if I myself have a rotating set of machines. Pinning seems like part of the short answer, but I wonder how that's going to hold up organizationally over deep time -- will I end up evolving another file somewhere that tracks all my important pins, so I can make sure to pin them all again when I break one computer and get another? (Not even worried about proof of storage in the case of bad actors yet; more just how I make sure I maintain all data even in my own data center given that over time all machines, and thus all their pin lists, can permanently drop out at any time.)
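One possible shape for that "file that tracks all my important pins" is just a manifest you diff against each machine's live pin list. A minimal sketch, assuming a js-ipfs-style pin API; `missingPins` and the re-pin loop are hypothetical illustrations, not an existing IPFS feature:

```javascript
// Sketch: reconcile a durable "pin manifest" against whatever a node
// actually has pinned. `manifest` is the list of important CIDs the
// comment above imagines keeping; `pinned` is what `ipfs pin ls` reports.
function missingPins(manifest, pinned) {
  const have = new Set(pinned);
  return manifest.filter((cid) => !have.has(cid));
}

// On a fresh machine you'd walk the result and re-pin each entry, e.g.:
// for (const cid of missingPins(manifest, await listPins(node))) {
//   await node.pin.add(cid); // js-ipfs-style call, shown for illustration
// }
```

The manifest itself could live anywhere durable (even in IPFS, pinned by every machine), as long as it outlives any single node.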
I had some questions related to hashes before I joined #ipfs and couldn't find answers in the documentation. Eventually I found some, but I think it would be nice if these questions were covered in the reference/white-paper/manual.
Qmd286K6pohQcTKYqnS1YhWrCiS4gz7Xi34sdwMe9USZ7u ? What algorithm do you use?

One question I have seems like it'd be pretty common: once I store something in IPFS, will it remain available to others once I (say) turn off my laptop? I'm under the impression that things I put in IPFS are copied to other nodes automatically... but perhaps that's not correct?
Other questions I have:
I'm going to answer these here, but move the questions and answers over to http://github.com/ipfs/faq afterwards.
I have been toying around with the idea of an adapter for archive.org/web/. That way, alongside keeping a local cache, you can just get the wayback machine to cache what's on your gateway, and then such an adapter would check the wayback machine to fill in the gaps for items that used to be cached but aren't any longer.
If this were hooked up to some global gateway (like I did with gateway.ipfs.io), then you get global caching for free.
for instance: http://web.archive.org/web/20150626212831/http://gateway.ipfs.io/ipfs/QmY7QNMLyxkQ628uExEPX8SvpZsXeXHnXvq5rCUJhb5XJG
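The lookup half of such an adapter is essentially URL construction: prefix the gateway URL with a wayback-machine path. A sketch; the helper name is mine, and a real adapter would also need to query the wayback API to find which snapshots actually exist:

```javascript
// Build a wayback-machine URL for content that was once served through a
// public IPFS gateway. Hypothetical helper, shaped after the example link above.
function waybackUrlFor(cid, timestamp, gateway = 'gateway.ipfs.io') {
  return `http://web.archive.org/web/${timestamp}/http://${gateway}/ipfs/${cid}`;
}
```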
Just started playing around with IPFS today. I have a few points I'd like to raise/ask.
ipfs swarm peers on the command line or getting the JSON as shown in the API docs.) This is on Linux, running IPFS from the binaries provided on the web page. I'm happy to open a bug report, but I'm not even sure which of the many projects it should be for.

@gergo- the webui is definitely broken; we had an issue with cross-origin headers failing. You can fix it temporarily by setting the env var API_ORIGIN to * and running the daemon. There is an open PR to fix it that's on my todo list for today.
There is currently no crawler. You can list your connected peers with ipfs swarm peers, and you can list blocks on your local node with ipfs refs local. A crawler would theoretically listen on the DHT for provider messages and log them all in a DB somewhere.
Blocks in .ipfs/blocks are simply raw data. They will have protobuf framing and not be super readable on their own, but ipfs object get will parse the protobuf stuff for you and print out the data and links of a given object.
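To make the protobuf framing concrete, here is a toy parser for the length-delimited fields (wire type 2) that merkledag nodes use for their Data and Links. This illustrates the wire format only; it is not go-ipfs code:

```javascript
// Each protobuf field starts with a varint key: (fieldNumber << 3) | wireType.
// This toy parser handles only length-delimited fields (wire type 2).
function readVarint(buf, pos) {
  let value = 0, shift = 0, b;
  do {
    b = buf[pos++];
    value += (b & 0x7f) * 2 ** shift; // multiply, not <<, to avoid 32-bit overflow
    shift += 7;
  } while (b & 0x80);
  return { value, next: pos };
}

function parseLengthDelimited(buf) {
  const fields = [];
  let pos = 0;
  while (pos < buf.length) {
    const key = readVarint(buf, pos);
    const fieldNumber = key.value >> 3;
    const wireType = key.value & 7;
    if (wireType !== 2) throw new Error('toy parser: only wire type 2');
    const len = readVarint(buf, key.next);
    fields.push({ fieldNumber, bytes: buf.slice(len.next, len.next + len.value) });
    pos = len.next + len.value;
  }
  return fields;
}
```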
Bitswap credit is not currently implemented.
Several things, most with concrete suggestions - largely focused on protocol-ish stuff.
@eternaleye fantastic suggestions. yes! :+1: would you like to help make these happen? (some of this is already planned work, like the provide records including proofs + s/kad, etc. the raptorQ work is very interesting. i think that could be a different mode of bitswap. wonder how we could adapt something like that to whole dags, i.e. where the symbols are combinations of the objects themselves.)
Top items for me would be:
These two are why IPFS is turned off for most of our UI: since we lose data, the js-ipfs calls just sit there waiting when they try to access blocks that IPFS has lost, because the code can't work around the absence of a file if it never gets any kind of error response. This gives a poor UX, because users sit there waiting for something that will never complete, and it means we have to leave IPFS off except for files we know are small and can time out at an application level.
and of course .... at a much larger scale:
A good conversation with @whyrusleeping a couple of months ago helped me understand why it's hard, but it's still the main "large" thing on the wish-list.
This routing is the gating factor to things being distributed: until that moves up a notch, any browser on dweb.archive.org has to talk directly via WSS to our IPFS instance, which doesn't make it distributed at all and offers no advantages over just retrieving via HTTP.
Thanks for the feedback! @alanshaw - anything in the works on error handling in js-ipfs?
For the other requests - the go-ipfs gc fixes are actually in the upcoming 4.19 release, and we also have work underway on better provider strategies to improve content routing (follow along here: https://github.com/ipfs/go-ipfs/issues/5774)
js-ipfs: Better error handling - i.e. getting some indication of when createReadStream is never going to return.
@mitra42 assuming you mean ipfs.catReadableStream? If it doesn't return, it means it's waiting for the content to be available. There's no "error" to handle, but there are a few things we could do to help that aren't possible right now in JS IPFS:
We're working on it! https://github.com/ipfs/interface-ipfs-core/issues/58
Cancelling the request is the first step. Timeout is just cancel after a certain period of time.
I've been playing with AbortController as a means of allowing this and I've not had any negative feedback yet 🤞. There are a couple of reasons for this: it's a native API in the browser and so should be familiar to frontend developers (it is used by fetch). I mentioned it on that same issue https://github.com/ipfs/interface-ipfs-core/issues/58#issuecomment-437809858

If we can pass an AbortController.signal to cancel a request, we have a template for doing something similar with a per-request log to provide feedback on the status of any given request.
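For readers following along, the proposed pattern looks roughly like this. A sketch only: `slowRead` is a stand-in for a js-ipfs call, not a real API.

```javascript
// A cancellable operation: reject as soon as the caller's signal aborts.
function slowRead(signal) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve('data'), 10000); // simulated slow fetch
    signal.addEventListener('abort', () => {
      clearTimeout(timer);
      reject(new Error('aborted'));
    });
  });
}

// Caller side: abort after 50 ms instead of waiting forever.
async function demo() {
  const controller = new AbortController();
  setTimeout(() => controller.abort(), 50);
  try {
    await slowRead(controller.signal);
    return 'completed';
  } catch (err) {
    return err.message;
  }
}
```

The same `controller.abort()` call could be driven by a UI "cancel" button or by an application-level timeout.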
I aim to introduce this concept as we tackle the async await/iterators endeavour.
@alanshaw - I guess the thinking here is like BitTorrent: the user will wait as long as needed until someone comes online with the content. But that's not how most applications work. For example, I want to open a file and load an image; the question is, at what point does IPFS give up and tell the application, so it can get the content somewhere else (e.g. via HTTP)? This is how WebTorrent works, and it's why its streaming experience is (from our perspective) so much better than IPFS's.

I'm not sure how to do this well in IPFS. I can (and do) use a timeout, but this means that either a) the timeout is so long that it creates a painful user experience, or b) the timeout is short enough that any large file will fail. I think - but don't know the IPFS internals well enough to be sure - that it wants a timeout that applies to making progress... if IPFS is finding blocks, or getting closer in the DHT, then it's fine to wait, but if progress is not being made then it needs to notify the app.

Note: it's the combination of IPFS having low reliability (material stored in IPFS may or may not be retrievable at some later date, with some different connectivity), no errors, and no fallback to reliable methods that makes IPFS performance appear so poor when it's used as the primary way of, for example, loading content in a website.
Yes, I agree the timeout should be tied to inactivity. Which means, I think, that timeout is not, as you said, just a cancel after a period of time, because the timer needs to be started/restarted in the js-ipfs layer, not the app.
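One way to express that inactivity-based timeout is a watchdog that restarts whenever a chunk arrives and only fails when no progress happens for the whole window. A sketch with hypothetical names, wrapping any async iterable of chunks:

```javascript
// Fail only when *no progress* is made for `ms` milliseconds, rather than
// after a fixed total time. `source` is any async iterable of chunks.
async function* withInactivityTimeout(source, ms) {
  const it = source[Symbol.asyncIterator]();
  while (true) {
    let timer;
    const watchdog = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error('stalled: no progress for ' + ms + 'ms')), ms);
    });
    const result = await Promise.race([it.next(), watchdog])
      .finally(() => clearTimeout(timer));
    if (result.done) return;
    yield result.value; // progress was made; the next chunk gets a fresh timer
  }
}
```

A large file that keeps delivering chunks never trips the timer, while a stalled transfer fails within one window regardless of file size.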
Did this ever get solved - so queries can give some kind of feedback/error on failure?
No, but the issue is here: https://github.com/ipfs/go-ipfs/issues/5541. It's actually not all that difficult to solve; we just need a special Reader adapter that detects a stalled read. Curl actually has the same logic (it'll time out after some time, saying "no progress for N seconds").
Pity it's taken so long; I've run into a couple of groups this year who told me this was one of the key reasons they chose not to use IPFS.
Patches welcome, we're just limited on time.
There's no way someone outside the IPFS core group could work on something like this, given that some issues relating to it have been open since 2016. It's why on the dweb.archive.org site we have IPFS as the last-priority fallback (after HTTP): because we can never tell whether it is succeeding or failing, we can't use IPFS and then fall back to HTTP. I always ask people what decentralized technologies they are using and why, and two projects told me this was the primary reason they dropped IPFS; I believe both of them were using js-ipfs (you pointed at a go-ipfs issue).

In js-ipfs the topic gets pointed at the old (2016) issue https://github.com/ipfs/interface-ipfs-core/issues/58, which is about "cancellable requests" and (without knowing the code) seems to be something different entirely.
An external contributor can, absolutely, fix that bug. There's just no way the core team can scale to meet the needs of every single user.
Also, since this is really just wanting a timeout for a javascript promise, you can implement it yourself pretty easily by taking the promise from js-ipfs, and racing it with a timeout: https://italonascimento.github.io/applying-a-timeout-to-your-promises/
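For reference, the linked pattern is roughly this fixed-window race, where `work` stands in for e.g. an ipfs.cat promise:

```javascript
// Race a promise against a timer; whichever settles first wins.
function withTimeout(work, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error('timed out after ' + ms + 'ms')), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Note that this bounds total elapsed time, not inactivity, so a large transfer that is making steady progress can still hit the deadline.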
Yep - that timeout solution is the obvious one, the first thing we tried (almost a year ago), and unfortunately it's wrong :-( The problem is that failure looks very different for a small file and a large one: if the timeout is reasonable when, for example, IPFS is downloading the icons (maybe 1 second), then it's going to fail on larger files. It's also wrong for streams, since streams return success immediately; failure is when they are unable to deliver data.
Would really love to be able to cancel stalled reads / shutdown the ipfs http client correctly.
I use the promise timeout, but it does not help in Jest.
Currently skipping unit tests that contain bad data, because there is no way to close ipfs properly once it starts trying to get a bad hash.
[EDIT]: overall, I have enjoyed working with IPFS for the past few years, and really appreciate all the work from jbenet, protocol labs, and the community.
Hey all, would be great to get some feedback on the things you'd like IPFS to do better right now. This includes the go-ipfs repo, the ipfs protocol itself, the js webui, all our codebases, our community channels -- really, anything. There's of course a lot that's well known and planned -- I'm merely prioritizing where we spend our efforts in the coming weeks.
Some questions to get you thinking:
Thanks, Juan
PS: yep, a rough spec is incoming shortly for those itching to implement in other langs. feel free to signal your desire here.
Crosspost from https://github.com/jbenet/go-ipfs/issues/849