stevenbell / vhdlweb

Learn VHDL by example in a web browser (for Tufts ES 4 and EE 201)
GNU General Public License v3.0

"std.textio.all" security vulnerability #7

Closed nobodywasishere closed 3 years ago

nobodywasishere commented 4 years ago
...
use std.textio.all;

...

architecture synth of abc is

    file output_file : text open write_mode is "../../../test2/password";

begin
...

Running the above code in any problem will let you erase test2's password. From there you can log in to their account without any problems, and so can anyone else.

With additional code and effort, I believe someone could read another user's password directly to the console, delete the evidence (in the form of their submission), and log in without anyone knowing any better.

This relies on:

Also of note: the build will fail and the server will produce an error if there are any stray files in the first level of a problem folder, i.e. [username]/[problem name]/[anything that isn't a number] will cause it to crash.

Limiting the libraries GHDL has access to and sanitizing the input should be enough to mitigate this, unless another vulnerability is found. I'm going to work on adding code sanitization and making it so the password field cannot be empty. I'm not sure if you want to do anything about the others.

stevenbell commented 4 years ago

Yeah, std.textio is a massive security hole. If you can read and write arbitrary files on the server, basically anything is possible (from stealing classmates' code to turning the server into a spam relay).

Possible mitigation options:

  1. Scan code submissions and remove (and report) any attempt to load std.textio
  2. Remove textio from the GHDL installation
  3. Sandbox the application (e.g., with Docker) so that you can read and write files in the sandbox but can't access the real system.
  4. Use Linux permissions to prevent the application from reading/writing other files (basically make a user that can just execute GHDL and touch only its own submission files).

Doing (1) is almost certainly part of the solution. I'm not sure that blocking std.textio wholesale is the right thing, since string printing is a reasonable thing to do in a testbench. But opening files should definitely be blocked.
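A minimal sketch of what the (1) check could look like: allow `use std.textio` through (so testbenches can still print strings), but reject any file-open declaration. The regex, file paths, and the demo submission below are illustrative assumptions, not vhdlweb's actual code.

```shell
#!/bin/sh
# Sketch of check (1): permit std.textio, reject file-open declarations.
# The pattern and paths here are assumptions for demonstration only.
cat > /tmp/demo_submission.vhd <<'EOF'
use std.textio.all;
architecture synth of abc is
    file output_file : text open write_mode is "../../../test2/password";
begin
end architecture;
EOF

if grep -qiE 'file[[:space:]]+[a-z_]+[[:space:]]*:[[:space:]]*text[[:space:]]+open' /tmp/demo_submission.vhd; then
    echo "REJECTED: file I/O is not allowed in submissions"
fi
# prints: REJECTED: file I/O is not allowed in submissions
```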

(2) seems a bit heavy-handed. Being able to read a test vector file is really useful in a testbench (and some of the existing testbenches do this).

I think (3) is the "right way" to handle it, but I'm concerned about overhead. Right now the server response is in the millisecond range, and I wouldn't want it to take more than a second or two. I don't know how fast a Docker container can be spun up... I guess that's something to research.

Maybe (4) is a lightweight way to achieve sandboxing, or maybe there's another way. My basic thought is to create a dummy user, chown the submission files to be owned by the user, and then run. The dummy user will only be able to see normal system files and the submission files that have been specifically made available. chown requires root, though... I guess that's not better.

nobodywasishere commented 4 years ago

Going off of (1), we could put the text filtering in the Makefile via grep/sed, and vary it depending on what someone needs access to for that problem, e.g. for data/problems/abc/Makefile:

run:
    if grep -q "use std.textio" submission.vhd; then \
    echo "ERROR: Don't use std.textio"; \
    [insert command for reporting user]; \
    sed -i '/use std\.textio/d' submission.vhd; fi
    @$(GHDL) -a $(FLAGS) testbench.vhd submission.vhd
    @$(GHDL) -e $(FLAGS) abc_test
    @$(GHDL) -r $(FLAGS) abc_test --stop-time=1us

This should delete any lines containing use std.textio before they're run by GHDL. A similar approach could be applied where a problem needs file input/output: filter for instances of ../ and ~ so that submissions cannot escape their directory (this may not be perfect, since there may be other env vars that would get them out of where they are). One catch is that the code shown on the page isn't updated to reflect this change until the person refreshes. Another problem is that it relies on every problem's Makefile being set up correctly, and a mistake there could easily break it.
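A quick sanity check of the sed line-deletion (file path and contents are made up for the demo):

```shell
#!/bin/sh
# Demonstrate the filter: strip any "use std.textio" line before the
# file would reach GHDL. The path and contents are illustrative.
printf 'use std.textio.all;\nentity abc is end entity;\n' > /tmp/demo_sub.vhd
sed -i '/use std\.textio/d' /tmp/demo_sub.vhd
cat /tmp/demo_sub.vhd
# prints: entity abc is end entity;
```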

It is possible to print strings using assert [statement] report "[message]"; without needing std.textio.

For (2), maybe it's possible to block it by default and only include it when necessary? As in delete it from your standard GHDL installation and add it back using the -P option. (I actually did some tests and I don't think this is feasible without maintaining our own build of GHDL, and that'd be ugly)

For (3), what about a sandbox per user or problem instead of per attempt?

For (4), I think something like this would work. chown requires root, but chmod does not. Create a user named ghdlbot and dynamically give and remove permissions as they need them. Run all GHDL commands from this user using runuser.
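A dry-run sketch of what that (4) setup might look like. The user name ghdlbot comes from the thread, but the paths and flags are assumptions; the echo wrapper is used so this prints the commands instead of needing root to execute them.

```shell
#!/bin/sh
# Dry-run sketch of option (4): print the commands a root-owned setup
# script might run. The paths and flags below are assumptions.
run() { echo "+ $*"; }   # replace the echo with "$@" to actually execute
run useradd --system --shell /usr/sbin/nologin ghdlbot
run chmod -R o+rX /srv/vhdlweb/data/user1/prob1
run runuser -u ghdlbot -- make -C /srv/vhdlweb/data/user1/prob1 run
```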

A combination of (1) and (4) may be the simplest solution.

stevenbell commented 4 years ago

I spent some time today learning about Docker. I've used chroot a handful of times before, but as I thought about how to properly implement sandboxing with chroot, I think it'll be easier with the infrastructure Docker provides. On my laptop I could spin up a minimal image and run a command in about 1 second, so it should be reasonable to do in a server context.

Proper sandboxing would also help with other vulnerabilities -- things like writing huge files or creating forkbombs. The questions in my mind are:

  1. Does it help to create a Docker image per user/problem, or is it better to just throw it away after every run?
  2. How do we create a minimal image with GHDL? I think we can just create an image with Alpine and add Make and a GHDL binary. If we have to build all of GHDL inside the docker image, that will mean installing GNAT and other stuff which isn't needed for the task at hand.
  3. How fast does this actually run? If we're doing stuff in Docker... does it make sense to use Kubernetes or AWS or some other elastic compute tool? The server is sufficient right now, but in the future being able to scale dynamically would be nice.

nobodywasishere commented 4 years ago

> things like writing huge files or creating forkbombs.

Yeah that'd be important. I didn't even think about those.

> Does it help to create a Docker image per user/problem, or is it better to just throw it away after every run?

Personally, I don't see a problem with reusing the docker containers per user, but it comes down to how much space/resources they take up, and how fast they are to spin up.

> How do we create a minimal image with GHDL? I think we can just create an image with Alpine and add Make and a GHDL binary. If we have to build all of GHDL inside the docker image, that will mean installing GNAT and other stuff which isn't needed for the task at hand.

It may be even simpler than that. GHDL provides Docker containers that are all set up and ready to go. But I'm not familiar with how Docker works, so I'll have to do some research myself.

> How fast does this actually run? If we're doing stuff in Docker... does it make sense to use Kubernetes or AWS or some other elastic compute tool? The server is sufficient right now, but in the future being able to scale dynamically would be nice.

I guess we'll have to do some experiments with some large projects to see how it handles them vs outside of the docker on the same hardware. Switching to Docker may be the thing that makes this easier to scale.

A lot of the questions I'm having right now are just on how docker works so I'm going to look into that more. Things like how we would pass stderr back to the user, and how we would move user files/Makefiles/other problem files in/out of the image.

stevenbell commented 4 years ago

> GHDL provides Docker containers that are all set up and ready to go.

Whoa, that's awesome. I did a really quick test by pulling one of the containers and running it:

steven@laptop$ time sudo docker run --mount type=bind,source=$(pwd)/abc,destination=/home ghdl/ghdl:ubuntu20-mcode make --directory /home gold
make: Entering directory '/home'
TEST PASSED.
make: Leaving directory '/home'

real    0m1.256s

There are lots of things wrong with this test (like using /home as a mount point for the code files), but it does prove the concept. The time to run isn't bad -- it's not instantaneous but it's definitely still interactive. And I'm sure there are ways to speed it up.

nobodywasishere commented 3 years ago

https://github.com/nobodywasishere/vhdlweb/tree/docker-ghdl

I managed to get docker working, and it's available to test at https://vhdlweb.eowyn.net/. I cannot perceive any difference in speed between this and calling GHDL directly, except whenever the docker image updates. This runs the latest version of ghdl/ghdl:buster-gcc-8.3.0 with the following command, essentially:

docker run --rm -v [wkdir]:[wkdir] [original make command]

--rm makes Docker delete the container after each run. -v [wkdir]:[wkdir] bind-mounts the wkdir folder into the container from the local file system, so that directory is all the command has access to. From here it doesn't matter what std.textio.all can do, since the container can only see the mounted working directory and is thrown away afterwards.
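Since the sandbox also has to contain the forkbomb/huge-file cases mentioned earlier, the run command could grow resource limits. A sketch, where the limit values and the WKDIR path are assumptions (echoed rather than executed, so it works without docker installed):

```shell
#!/bin/sh
# Sketch: the same docker invocation with resource limits bolted on,
# to contain forkbombs and runaway file writes. The limit values and
# WKDIR path are assumptions; echo lets this run without docker.
WKDIR=/tmp/wkdir-demo
echo docker run --rm \
    --network none \
    --memory 256m \
    --pids-limit 64 \
    -v "$WKDIR":"$WKDIR" -w "$WKDIR" \
    ghdl/ghdl:buster-gcc-8.3.0 make run
```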

I also added a section to the config file specifying the docker container to use for running these commands.

To get running, simply install docker:

apt install docker.io

Then enable the systemd component:

systemctl start docker
systemctl enable docker

Then start vhdlweb.

stevenbell commented 3 years ago

Oh, this is cool! I'll take a look at the PR on Friday or over the weekend, and see if we can get this merged in.

I did notice a little bit of lag (build takes ~1 second) but this isn't a big deal. The only potential problem is if this increases server load and we try stuff in class (not this semester, but if 80 students try to build within 60 seconds, that could be an issue). I say we deploy it and just monitor this semester.

How hard will it be to build our own Docker layer (presumably based on the GHDL one) to include netlistsvg?
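Probably not too hard in principle; a hedged sketch of such a layer, where the package names and the npm install step are assumptions I haven't verified against the GHDL base image:

```dockerfile
# Hypothetical layer on top of the GHDL image, adding yosys and netlistsvg.
# Package availability in the base image's distro is an assumption.
FROM ghdl/ghdl:buster-gcc-8.3.0
RUN apt-get update && \
    apt-get install -y --no-install-recommends yosys npm && \
    npm install -g netlistsvg && \
    rm -rf /var/lib/apt/lists/*
```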

nobodywasishere commented 3 years ago

The lag could also be from the server not being that powerful (it's a single core that's also serving Synapse and Riot sites). I could try adding logging to count how many people are running builds at once, to monitor how much load it can take.

I don't think it would be that big of a problem; I'd just have to figure out how to do it! It would include ghdl, yosys, and netlistsvg (and their dependencies). How the SVG gets passed back to the end user, I'm not sure yet.

nobodywasishere commented 3 years ago

Fixed with #8, closing