Open iustin opened 10 years ago
Hmm, sorry I inadvertently made that harder. A --no-download flag seems worthwhile. Is it reasonable to have some flag to get hoogle to do the bootstrap, so that I can maintain the "thing that downloads everything" script? Note that I usually develop Hoogle on a train with no internet access, which means I locally hack my copy so I download everything in advance and then use it. It sounds like you just want that feature done properly.
I’m a bit out of the loop: But what exactly do you need to download?
Assume you have a directory of hoogle.txt files of all relevant packages. Is that sufficient information?
The list of files is here: https://github.com/ndmitchell/hoogle/blob/master/src/Recipe/All.hs#L178 - so basically all the .txt files, all the .cabal files for them, plus the keywords file and Platform cabal description.
Ok, let’s see:
.txt files: We include them in the respective -doc packages, so they are available locally. ✓
Depending on what you need them for we either skip them in Debian, or we have to start adding them to the -doc packages, along with the .txt files.
I use the platform Cabal file to make a package named "platform" which is searched by default. I suspect the Debian people might want to search things the user has installed by default. I use the .cabal files to find what depends on what, so I can link up alias information properly. Having the .cabal file in the -doc package seems reasonable, since it does contain lots of documentation about the package.
Aside from the actual list of files needed (Joachim knows the situation better there), I want to say that having the --no-download flag would be very useful, and if you do clean up the local feature, that is also good.
@nomeata: we not only can include the keywords file, we already do - so yes, I do hope the license is good! I don't know what's the refresh policy for that though, something to keep in mind (and add to instructions).
The simplest policy would be for upstream to include the file in the tarball as well (with the possibility for the user to update it, as is the case now), and we just update it along with new upstream versions of hoogle :-)
@nomeata I'm happy to include the keywords file - that seems reasonable.
The automatic downloading of untrusted files was just reported as a critical security bug in the Debian bugtracker: http://bugs.debian.org/756334
What is the timeline for a version of hoogle with --no-download? I’d like to avoid having to patch the package in Debian to bridge the time.
Also, the worries (unpacking untrusted data) apply to your users as well. Maybe you want to take precautions (signatures or such, or at least warnings to the user)?
The worst that could happen is someone maliciously replacing the Hoogle search results, and that seems unlikely, so I am not too worried. The --no-download flag is on my radar, but not fantastically close. Maybe a month?
Unless I misunderstand what the option should do (just prevent downloads if a file is missing) I'll send a patch later this week.
As long as the option doesn't do anything harmful if not passed, I'll happily accept anything that works for you guys (but might then revisit it in the future).
Current behavior leads to privilege elevation in Debian, as follows:
See here for some other potential attacks.
@abacabadabacaba Is there some flag I can pass to strip the setuid files? I can't see the link at the moment since it gives a 500 server error.
I don't know any such flags. Actually, tar is pretty hard to use securely.
One possible solution is to extract into a directory inaccessible to other users, which means that its parent directory should have permission bits set to 0700. An archive can modify permission bits on the directory it is extracted into, that's why it is important to make the parent directory inaccessible to other users.
Then, it is necessary to sanitize the resulting directory by removing everything except regular files and directories. That's just find -type f -o -type d -o -delete. Otherwise, an attacker can place a symlink to cause your script to read arbitrary files. An attacker can also place device nodes and named pipes.
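The two steps above (a parent directory with mode 0700, then deleting everything that is not a regular file or directory) could be sketched in Python roughly as follows. This is only an illustration, not code from Hoogle or the Debian packaging; the names make_private_parent and sanitize_tree and the directory prefix are made up for the example.

```python
import os
import stat
import tempfile


def make_private_parent():
    """Create a parent directory that only the current user can enter."""
    # tempfile.mkdtemp creates the directory with mode 0700.
    # The "hoogle-extract-" prefix is a hypothetical name for this sketch.
    return tempfile.mkdtemp(prefix="hoogle-extract-")


def sanitize_tree(root):
    """Delete everything under root that is not a regular file or directory.

    Mirrors `find -type f -o -type d -o -delete`. Returns the removed paths.
    """
    removed = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in list(dirnames) + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            if stat.S_ISREG(mode) or stat.S_ISDIR(mode):
                continue
            os.unlink(path)  # symlink, FIFO, device node, socket, ...
            removed.append(path)
            if name in dirnames:
                dirnames.remove(name)  # it was a symlink to a directory
    return removed
```

The archive would be unpacked into a subdirectory of the directory returned by make_private_parent, and sanitize_tree would run on that subdirectory before anything reads from it.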
Running tar as a non-root user would prevent it from creating setuid root files and device nodes. However, it can still create symlinks, which can be used to access some secret data. For example, an archive may contain a symlink to /etc/ppp/chap-secrets, which could be used to leak contents of that file.
An alternative approach is to use some library to access contents of the archive without extracting it at all, e.g. libtar.
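As a rough illustration of reading an archive without extracting it, here is a sketch that uses Python's standard tarfile module in place of libtar (a substitution made for this example, not something proposed in this thread). The function name read_regular_members is hypothetical.

```python
import io
import tarfile


def read_regular_members(archive_bytes):
    """Return {member name: contents} for regular files only.

    Nothing is written to disk, so symlinks, device nodes and setuid
    bits stored in the archive never reach the filesystem.
    """
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in tar:
            if not member.isreg():
                continue  # skip symlinks, hard links, devices, FIFOs, dirs
            handle = tar.extractfile(member)
            if handle is not None:
                contents[member.name] = handle.read()
    return contents
```

Note that this sketch reads each member fully into memory, so it does nothing about very large members.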
After this, some denial of service attacks still remain. For example, an attacker can send a "gzip bomb", which will fill the filesystem when unpacked, or just a very large file.
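One possible mitigation for the "gzip bomb" case is to decompress in chunks and give up once a size limit is exceeded. The sketch below assumes the caller picks the limit; gunzip_bounded and ArchiveTooLarge are names invented for this example and are not part of Hoogle.

```python
import gzip
import io


class ArchiveTooLarge(Exception):
    """Raised when decompressed data exceeds the caller's limit."""


def gunzip_bounded(data, max_bytes):
    """Decompress gzip data, refusing to produce more than max_bytes."""
    out = bytearray()
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as stream:
        while True:
            chunk = stream.read(64 * 1024)  # decompress at most 64 KiB at a time
            if not chunk:
                break
            out.extend(chunk)
            if len(out) > max_bytes:
                raise ArchiveTooLarge("more than %d bytes after decompression" % max_bytes)
    return bytes(out)
```

Because the check happens after every chunk, at most one chunk beyond the limit is ever held in memory.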
While I appreciate @abacabadabacaba's concern about security, I feel it is a bit out of place here, in the context of this bug.
Feel free to clone it and make it a general bug about the safety of downloaded data, but I'd prefer if this bug remains purely about using hoogle without internet access.
I also feel it's a bit misplaced that we assign all of the above attacks on hoogle; normally hoogle is not used as root, it's the fault of the Debian packaging that it runs it so, so a lot of the described attacks above are no longer valid (only root can create device nodes, only files to which hoogle has access would be leaked - if at all, etc.).
Most of the attacks are still valid, they would just compromise the account hoogle is running as, which is still a vulnerability. Well, it is really a different bug than that hoogle requires Internet access, so I created issue #78 for it.
With the --no-download flag in, I think the only issue remaining on this bug is how to declare/discover needed files.
It can stay as it is now (looking at the source code), but a nicer way would be to have the data command show the missing files and their URLs either in the error message of missing files or via a separate command. Thoughts?
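To make the idea concrete, here is a small sketch of what "show the missing files and their URLs" could look like. It is written in Python purely for illustration and does not reflect how Hoogle is implemented; the function names, the shape of the `needed` mapping, and the no_download parameter are all assumptions.

```python
import os


def missing_files(needed, data_dir):
    """Return {relative path: URL} for every needed file absent from data_dir.

    `needed` maps a relative file name to the URL it would be fetched from.
    """
    return {
        rel: url
        for rel, url in needed.items()
        if not os.path.isfile(os.path.join(data_dir, rel))
    }


def check_inputs(needed, data_dir, no_download):
    """With no_download set, fail with a list of missing files and URLs.

    Otherwise return the missing entries so the caller can decide to fetch them.
    """
    missing = missing_files(needed, data_dir)
    if missing and no_download:
        lines = ["%s (from %s)" % (rel, url) for rel, url in sorted(missing.items())]
        raise FileNotFoundError(
            "missing input files, not downloading:\n" + "\n".join(lines)
        )
    return missing
```

A packaging script could call missing_files on its own to find out what to ship, without parsing an error message.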
Could hoogle download just download the necessary files? Then you could do that to build the package?
Yes, if you provide that it would be fine.
@iustin, @ndmitchell: What is the status of this? Who is blocked on whom?
Nobody is blocked, but I'm traveling right now and don't have access to my GPG keys to upload a new version. This issue remains open for explicit --download, but we're fine with the current code for the open bug.
Hi from the Debian Haskell Packaging group!
In Debian, we'd like to ship Hoogle in such a way that it doesn't use online resources: the package should ship with enough data such that it's possible to generate the local databases "offline" (from bootstrap + local packages).
This used to work via some local hacks (see http://anonscm.debian.org/cgi-bin/darcsweb.cgi?r=pkg-haskell/haskell-hoogle;a=headblob;f=/files_hoogle/update-hoogle) until recently, but (if I read the changes correctly) issue #47 changed the way it looks for local data.
Fixing our script is doable, but I'm worried that things might break again, and silently (like this time; we only found that newer hoogle downloads data from the internet by accident), so I'm wondering what are your thoughts about improving hoogle's bootstrap mode?
A couple of potential improvements that come to mind:
Thanks in advance for your feedback!