Open mib-kd743naq opened 7 years ago
We should definitely be removing the duplicate entries.
As for the security implications, this definitely isn't a good bug to have but would probably be hard to exploit. The (really) worrying part is that you didn't get a timely response; I've opened up an internal discussion to figure out what happened there.
Note: That part of the IPLD spec is only tangentially related. IPFS directories have a very different structure.
We should definitely be removing the duplicate entries.
Only if there are guarantees within all readers ( http, cat, fuse ) that the first encountered "named link" is the one that wins. Otherwise the problem will only get worse.
Stepping back a bit - I was really hoping for actual marking of all duplicates or semi-duplicates as "Hey! Something is fishy". The thing I constructed is a simple bit-for-bit duplication. However, when you consider all the nastiness within https://github.com/ipfs/go-ipfs/issues/1710, just hiding an offender is likely insufficient, as it will miss:
x00
byte ( especially bad for FUSE )<CR>
byte, that can be rigged to make ipfs ls
output look... differentThe (really) worrying part is that you didn't get a timely response
It's growing pains within the team, it is understandable. If anything - this is a relatively low profile issue, so having the report-loop tested with that is way better than some sort of RCE ;)
The
byte, that can be rigged to make ipfs ls output look... different
This sounds like terminal control characters might go through as well. Maybe worth a try. A program that prints non-binary to stdout should never print special control characters, partly because of this, but also because it's sort-of an injection problem if stdout is piped to stdin of another program that expects a line separated "protocol". I usually never print user input without repr()
or {:?}
because of this. Did you test for newline?
Regarding duplicate entries: I think the \x00
byte might cause a truncation (depending on if there are null byte checks before go calls open(2)). Duplicate entries have been exploited on zip for android, I don't think I can recall all details, but the problem was that the first entry wins if you access the zip file with the library, but the last entry wins if you write it to disk, since the 2nd entry would just overwrite the first. For directories, it's probably going to merge them.
@kpcyrd I didn't even think to test this, but yes ( example outputs color codes only, is safe to ls
): https://ipfs.io/ipfs/QmZcYTLQNo1hJtGxQD8d9sHtmnBBamhDHydtqJ2zxzLSx7
~$ ipfs ls QmZcYTLQNo1hJtGxQD8d9sHtmnBBamhDHydtqJ2zxzLSx7
QmeHXsYCtWa9KhC5ZME5pn97c1zTt7TgK37VgKJWWAdknn 9
===========
RED ALERT
===========
~$ ipfs ls QmZcYTLQNo1hJtGxQD8d9sHtmnBBamhDHydtqJ2zxzLSx7 | hexdump -C
00000000 51 6d 65 48 58 73 59 43 74 57 61 39 4b 68 43 35 |QmeHXsYCtWa9KhC5|
00000010 5a 4d 45 35 70 6e 39 37 63 31 7a 54 74 37 54 67 |ZME5pn97c1zTt7Tg|
00000020 4b 33 37 56 67 4b 4a 57 57 41 64 6b 6e 6e 20 39 |K37VgKJWWAdknn 9|
00000030 20 0a 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 0a 1b 5b | .===========..[|
00000040 39 37 6d 1b 5b 35 6d 1b 5b 34 31 6d 1b 5b 35 6d |97m.[5m.[41m.[5m|
00000050 52 45 44 20 41 4c 45 52 54 1b 5b 35 6d 1b 5b 30 |RED ALERT.[5m.[0|
00000060 6d 1b 5b 35 6d 0a 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d |m.[5m.==========|
00000070 3d 0a |=.|
00000072
@Stebalien All the problems with non-printables aside - there is also a length-based DoS: http://ipfs.io/ipfs/QmVtFwE9CWfBk7WC2pCgPtMQsZ7zF6tgjbL5Vcpte7pLcy
Trying to follow the longer-named links results in either a 414 ( decent ) or 502 ( ugh ) on the gateway. On the CLI things are no better:
~$ ipfs ls QmVtFwE9CWfBk7WC2pCgPtMQsZ7zF6tgjbL5Vcpte7pLcy
Qme57pNu4VvdX5yZdVacxcr4VyxqtgoymeKGPzRmmm121Q 26 This string can be very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very long
QmcMeKrhJXNcyExPtAYKUfbJpn6WGweR7KbyV6CRMbD3Rm 29 This subdir is *really* large - do not list its contents/
~$ ipfs ls 'QmVtFwE9CWfBk7WC2pCgPtMQsZ7zF6tgjbL5Vcpte7pLcy/This subdir is *really* large - do not list its contents/I mean really - this will dump 1mb of nyancat on your screen...' | wc -c
1114145
Only if there are guarantees within all readers ( http, cat, fuse ) that the first encountered "named link" is the one that wins. Otherwise the problem will only get worse.
We'd do this at the lowest level so yes, it would cover all cases. I'd like to just reject such directories outright but directory sharding makes that impossible.
Stepping back a bit - I was really hoping for actual marking of all duplicates or semi-duplicates as "Hey! Something is fishy". The thing I constructed is a simple bit-for-bit duplication.
I'd like to handle this stuff at a layer below the UI. Users should (almost) never be asked to deal with security related things because they'll generally just be confused.
The thing I constructed is a simple bit-for-bit duplication. However, when you consider all the nastiness within #1710, just hiding an offender is likely insufficient, as it will miss:
So, we initially wanted Unix compatibility but, as you can see in that thread, sentiments have changed. Limiting unixfs paths to printable, UTF-8 byte sequences is reasonable.
there is also a length-based DoS
For long paths, we've considered limiting the length of individual components in a path (not entire paths, just components). However, we haven't picked a good limit. Currently, there's a hard limit of slightly less than 1MiB (due to other constraints) but we almost definitely want something shorter (4096?). Gateways can choose to reject any paths they want and we should probably set a "paths longer than X may not work, avoid them" limit.
As for large directories, that's something we can't fix. We want to, e.g., support dumping Wikipedia into IPFS (and every article on Wikipedia lives in /wiki
). However, we do need to stream directories when listing them instead of blocking (I can't find the issue but I'm pretty sure there is one about this).
If you're feeling adventurous, want to read through #1710 and take a stab at a reasonable path restriction proposal?
If you're feeling adventurous, want to read through #1710
I have read it, multiple times: I was the one who linked it into the thread ;)
take a stab at a reasonable pathrestriction proposal?
I currently can't commit to that, due to severe time shortage ( sorry :/ ). I will only make a rather off-the-cuff remark about something I think has not been explicitly explored:
It seems that the question of "what fundamentally is the UnixFS portion of IPFS, and what naming rules apply" remains unresolved ( that's really a question for @jbenet and friends )
It could be seen as a new POSIX-ish Filesystem: in this case it should shed as much backwards compatilibity load and define ( as mostly outlined in https://github.com/ipfs/go-ipfs/issues/1710 ):
.
, ..
come to mind )[\p{Letter}\p{Number}\p{Symbol}]
or it can be seen as a "VFS layer"-like system, which in essence must be POSIX, and allow for everything that Libarchive has to deal with
A different way of asking the above question is:
An explicit answer to this will inform where the rest of this ticket goes. Furthermore this answer must be provided and codified sooner rather than later, due to the accelerating growth of 3rd party implementations
@mib-kd743naq my question is: it it bad that this can happen? and how does for example ls
command work with it. Unix tooling is a great resource as they had to figure it out long ago and had a lot of time to think about it, it is still a resource and we might want to make some changes.
my question is: it it bad that this can happen?
@Kubuxu the problem is that I do not know :/
Intuitively it is a rather nasty situation when "input arriving from the internet" is able to manipulate your cursor position and essentially redraw the entirety of the terminal. There is something about IPFS being directly connected to the net that makes the scenario less palatable than say a .tar.gz
with the same content being tar -ztf
-ed ( though my version of tar
handles this perfectly, see below ) .
Not having a clear and well thought through answer to "is this a problem", is why I sidestepped this question entirely, and instead put together the "which one of the two" writeup above. It seems clearer from that standpoint that IPFS can't be both "fully POSIX compliant" and "an upgraded protocol with less cruft".
For me personally - "Fully POSIX compliant" is the answer, with the ( significant ) work needed to make the tooling behave better. But I really want to defer this to @jbenet in part because of his proven visionary track record, and in part because he has extensively thought through this already ( or so I hope ).
how does for example ls command work with it
Way better than I expected:
/dev/shm$ ls fooo/
?[97m?[5m?[41m?[5mRED ALERT?[5m?[0m?[5m?
/dev/shm$ tar -tf fooo.tar
fooo/
fooo/\033[97m\033[5m\033[41m\033[5mRED ALERT\033[5m\033[0m\033[5m\n
As we are working on https://github.com/ipfs/ipld-unixfs/issues/1, if the decision is to allow all link names then best part forward (I think) would be to also to spec out way of escaping them for the user.
It seems clearer from that standpoint that IPFS can't be both "fully POSIX compliant" and "an upgraded protocol with less cruft".
I disagree with this statement, what is stored (and can be accessed through the API) versus what is presented to the user are two things that can be different. As most of shells are not about to handle binary strings (as variables) it makes sense to have encoding in presenting them.
@Kubuxu in light of
Currently, IPLD specifies that all map/object key names must be Unicode text ( via https://github.com/ipfs/ipld-unixfs/issues/3#issuecomment-341289386 )
does my statement about "IPFS/IPLD can't be both at the same time" make more sense?
Version information:
Any
Type:
Enhancement
( arguablyBug
)Severity:
Medium
Description:
When faced with a duplicate name in a directory, the UI does not signal in any way that something is amiss. There should be some sort of indication ( a red exclamation mark or something ) to visually alert the user "hey, watch out". I am not able to build FUSE locally, so I do not know how things behave there: likely needs to be investigated as well.
Here is an example DAG: https://ipfs.io/ipfs/zdjA8bKXBUttGJ7aXtG3DeuHYtAVXNj1jA5sKuscTJ1nk1xL9
This issue is tangentially related to https://github.com/ipfs/go-ipfs/issues/1710. There is also a somewhat explicit policy in the IPLD spec, but if the UI doesn't show anything about it - this is still a problem.
P.S. In case this is deemed a security issue that should not be public - feel free to delete this bugreport. I tried to use the
security@
contact two times, but never received a response, so decided to file here before I move on.