mwri / erlang-efuse

Erlang FUSE (Filesystem in Userspace) interface.
MIT License

Building against macFUSE and JIT-enabled OTP 25 potentially #2

Open tucnak opened 2 years ago

tucnak commented 2 years ago

I've tried building this on macOS (macFUSE's -lfuse is on par with the Linux one) with OTP 25 (erts-13.0.2), but unfortunately ran into a bunch of issues. Namely, I've had to remove some include paths and linked libraries, i.e.

diff --git a/c_src/Makefile b/c_src/Makefile
index 1e65fe2..198ff7e 100644
--- a/c_src/Makefile
+++ b/c_src/Makefile
@@ -1,10 +1,10 @@
 all: efuse

 efuse.o: efuse.c efuse_defs.h
-       gcc -c -Wall -std=gnu99 -D_FILE_OFFSET_BITS=64 -I/usr/include/fuse -shared -g -Wall -fPIC -MMD  -I"/usr/lib/erlang/lib/erl_interface-3.10.1/include" -I"/usr/lib/erlang/erts-9.2/include" efuse.c -o efuse.o
+       gcc -c -Wall -std=gnu99 -D_FILE_OFFSET_BITS=64 -I/usr/include/fuse -shared -g -Wall -fPIC -MMD efuse.c -o efuse.o

 efuse: efuse.o
-       gcc efuse.o  -lerl_interface -lei -pthread -lnsl -lfuse -lrt -ldl -o efuse
+       gcc efuse.o -pthread -lfuse -ldl -o efuse
        cp efuse ../priv/

I had to do this because they weren't available. Correct me if I'm wrong, but libnsl is not available on Darwin; I've only found it for amd64... Either way, having stripped a bunch of libraries out of the resulting binary, I could successfully build efuse:

/o/b/efuse$ make
rebar3 compile
===> Fetching rebar3_hex v7.0.2
===> Fetching hex_core v0.8.4
===> Fetching verl v1.1.1
===> Analyzing applications...
===> Compiling hex_core
===> Compiling verl
===> Compiling rebar3_hex
===> Verifying dependencies...
gcc -c -Wall -std=gnu99 -D_FILE_OFFSET_BITS=64 -I/usr/include/fuse -shared -g -Wall -fPIC -MMD efuse.c -o efuse.o
clang: warning: argument unused during compilation: '-shared' [-Wunused-command-line-argument]
gcc efuse.o -pthread -lfuse -ldl -o efuse
cp efuse ../priv/
===> Analyzing applications...
===> Compiling efuse
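
One way to keep a single build working on both platforms would be to pick the link flags by operating system. The sketch below is only an illustration: `efuse_ldflags` is a hypothetical helper, the Darwin list is the one from the patch above, and the fallback list is the one from the original Makefile. Whether `-lerl_interface` and `-lei` are still needed on a given OTP release is not settled here.

```shell
#!/bin/sh
# Hypothetical helper (not part of efuse): print link flags for a given OS name.
# Darwin: flags from the patched Makefile above.
# Anything else: flags from the original Makefile (assumed to be Linux).
efuse_ldflags() {
    case "$1" in
        Darwin) echo "-pthread -lfuse -ldl" ;;
        *)      echo "-lerl_interface -lei -pthread -lnsl -lfuse -lrt -ldl" ;;
    esac
}

# Print the flags for the machine this runs on.
efuse_ldflags "$(uname -s)"
```

From a shell, the output could then be spliced into the link step, for example `gcc efuse.o $(sh ldflags.sh) -o efuse`, where `ldflags.sh` is an assumed file name for the script above.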

However improbable it may seem, only one of the tests failed... I don't know if that says more about my approach or about the quality of your tests, but either way, here it is:

/o/b/efuse$ make test
mkdir -p deps
rm -f deps/efuse
ln -s .. deps/efuse
rebar3 ct
===> Fetching coveralls v2.2.0
===> Fetching jsx v2.10.0
===> Analyzing applications...
===> Compiling jsx
===> Compiling coveralls
===> Verifying dependencies...
make[1]: Nothing to be done for `all'.
===> Analyzing applications...
===> Compiling efuse
test/efuse_SUITE.erl:4:2: Warning: export_all flag enabled - all functions will be exported

===> Running Common Test suites...
=INFO REPORT==== 1-Aug-2022::14:40:22.471780 ===
    application: efuse
    exited: stopped
    type: temporary

Failed 1 tests. Passed 8 tests.
Results written to "/opt/badt/efuse/_build/test/logs/index.html".
===> Failures occurred running tests: 1
make: *** [test] Error 1

The result is not comprehensible to me, yet I will provide it nonetheless.

=== Test case: efuse_SUITE:read_files/1

=== Config value:

    [{mount_dir,"/tmp/erlang_ct_mount_efuse_erlfs"},
     {erlfs_cbmod,efuse_erlfs},
     {file_reads,[{"apps/efuse/descr",
                   "Erlang FUSE (Filesystem in Userspace)\n"}]},
     {watchdog,<0.685.0>},
     {tc_logfile,"/opt/badt/efuse/_build/test/logs/ct_run.nonode@nohost.2022-08-01_14.40.21/lib.efuse.logs/run.2022-08-01_14.40.21/efuse_suite.read_files.html"},
     {tc_group_properties,[{name,efuse_erlfs}]},
     {tc_group_path,[]},
     {data_dir,"/opt/badt/efuse/_build/test/lib/efuse/test/efuse_SUITE_data/"},
     {priv_dir,"/opt/badt/efuse/_build/test/logs/ct_run.nonode@nohost.2022-08-01_14.40.21/lib.efuse.logs/run.2022-08-01_14.40.21/log_private/"}]

=== Current directory is "/opt/badt/efuse/_build/test/logs/ct_run.nonode@nohost.2022-08-01_14.40.21"

=== Started at 2022-08-01 14:40:21

*** CT Error Notification 2022-08-01 14:40:21.956 ***
efuse_SUITE:'-read_files/1-fun-0-' failed on line 74
Reason: {badmatch,"cat: /tmp/erlang_ct_mount_efuse_erlfs/apps/efuse/...}

=== Ended at 2022-08-01 14:40:21
=== Location: [{efuse_SUITE,'-read_files/1-fun-0-',74},
              {lists,foreach_1,1442},
              {efuse_SUITE,read_files,72},
              {test_server,ts_tc,1782},
              {test_server,run_test_case_eval1,1291},
              {test_server,run_test_case_eval,1223}]
=== === Reason: no match of right hand side value "cat: /tmp/erlang_ct_mount_efuse_erlfs/apps/efuse/descr: No such file or directory\n"
  in function  efuse_SUITE:'-read_files/1-fun-0-'/2 (/opt/badt/efuse/test/efuse_SUITE.erl, line 74)
  in call from lists:foreach_1/2 (lists.erl, line 1442)
  in call from efuse_SUITE:read_files/1 (/opt/badt/efuse/test/efuse_SUITE.erl, line 72)
  in call from test_server:ts_tc/3 (test_server.erl, line 1782)
  in call from test_server:run_test_case_eval1/6 (test_server.erl, line 1291)
  in call from test_server:run_test_case_eval/9 (test_server.erl, line 1223)

I really wonder if it would ever be possible to use FUSE/macFUSE in Elixir.

The reason I'm going to such lengths, and why I believe it's worth your effort to help me out with this, is that I'm currently working on a pretty interesting project: a FUSE-enabled supervision tree monitoring tool, akin to what observer does, augmented with state/scope debugging. Meaning: you would be able to run cat-grep/ripgrep-xargs pipelines on actively running Elixir programs and extract runtime information that way. You can think of it as if you had IO.inspect everywhere, without actually having to put it there. Zero overhead unless grep begins to open() and read() various processes for information.

supmon /mountpoint
cd /mountpoint
rg -l uuid | xargs rg pattern
# will make a list of processes involved with `uuid` and
# search for a `pattern` among these processes!
#
# nothing like this would normally be possible in Elixir, but with FUSE, very much so!

tucnak commented 2 years ago

My attempt to run a local userfs test against the so-called "patched" version of efuse culminated in spectacular failure; that is, none of the tests actually passed:

  1) test getattr FS implementation represented to OS (noent error) (UserfsTest)
     test/userfs_test.exs:187
     Expected truthy, got false
     code: assert called(TestFs.userfs_getattr(:mock_state, "/e"))
     stacktrace:
       test/userfs_test.exs:194: (test)

WARNING: Deleting data for module 'Elixir.TestFs' imported from
["/opt/badt/userfs/Elixir.TestFs.17779.coverdata",
 "/opt/badt/userfs/Elixir.TestFs_meck_original.17779.coverdata"]

  2) test mount FS init is called (UserfsTest)
     test/userfs_test.exs:30
     ** (exit) exited in: GenServer.call(#PID<0.1086.0>, :stop, 5000)
         ** (EXIT) time out
     stacktrace:
       (elixir 1.13.4) lib/gen_server.ex:1030: GenServer.call/3
       (userfs 1.0.4) lib/userfs.ex:57: Userfs.umount/1
       test/userfs_test.exs:13: anonymous fn/0 in UserfsTest.__ex_unit_setup_0/1
       (ex_unit 1.13.4) lib/ex_unit/on_exit_handler.ex:143: ExUnit.OnExitHandler.exec_callback/1
       (ex_unit 1.13.4) lib/ex_unit/on_exit_handler.ex:129: ExUnit.OnExitHandler.on_exit_runner_loop/0

  3) test umount returns error for not mounted FS (UserfsTest)
     test/userfs_test.exs:63
     ** (ExUnit.TimeoutError) test timed out after 60000ms. You can change the timeout:

       1. per test by setting "@tag timeout: x" (accepts :infinity)
       2. per test module by setting "@moduletag timeout: x" (accepts :infinity)
       3. globally via "ExUnit.start(timeout: x)" configuration
       4. by running "mix test --timeout x" which sets timeout
       5. or by running "mix test --trace" which sets timeout to infinity
          (useful when using IEx.pry/0)

     where "x" is the timeout given as integer in milliseconds (defaults to 60_000).

     stacktrace:
       (elixir 1.13.4) lib/system.ex:1065: System.do_port/3
       (elixir 1.13.4) lib/system.ex:1055: System.do_cmd/3
       test/userfs_test.exs:46: UserfsTest.__ex_unit_setup_1/1
       test/userfs_test.exs:1: UserfsTest.__ex_unit__/2
       (ex_unit 1.13.4) lib/ex_unit/runner.ex:493: ExUnit.Runner.exec_test_setup/2
       (ex_unit 1.13.4) lib/ex_unit/runner.ex:452: anonymous fn/2 in ExUnit.Runner.spawn_test_monitor/4
       (stdlib 4.0.1) timer.erl:235: :timer.tc/1
       (ex_unit 1.13.4) lib/ex_unit/runner.ex:451: anonymous fn/4 in ExUnit.Runner.spawn_test_monitor/4

WARNING: Deleting data for module 'Elixir.TestFs' imported from
["/opt/badt/userfs/Elixir.TestFs.17779.coverdata",
 "/opt/badt/userfs/Elixir.TestFs_meck_original.17779.coverdata"]

  4) test list returns a list of mounted filesystems (UserfsTest)
     test/userfs_test.exs:80
     ** (exit) exited in: GenServer.call(#PID<0.1086.0>, :status, 5000)
         ** (EXIT) time out
     code: assert Userfs.list |> Enum.map(fn({pid,_}) -> pid end) |> Enum.member?(pid1)
     stacktrace:
       (elixir 1.13.4) lib/gen_server.ex:1030: GenServer.call/3
       (userfs 1.0.4) lib/userfs/server.ex:47: Userfs.Server.status/1
       (userfs 1.0.4) lib/userfs.ex:87: anonymous fn/1 in Userfs.list/0
       (elixir 1.13.4) lib/enum.ex:1593: Enum."-map/2-lists^map/1-0-"/2
       (userfs 1.0.4) lib/userfs.ex:83: Userfs.list/0
       test/userfs_test.exs:87: (test)

And here the humiliation was a bit too much to continue...

mwri commented 2 years ago

The tests no longer pass for me, patched or not (as noted in https://github.com/mwri/erlang-efuse/issues/1).

I think your change to c_src/Makefile is reasonable. I've not used it for a great number of years now, and clearly it needs some TLC. It looks like it still works though: I just built and ran it with your changes, and everything still works as far as I can see.

mwri commented 2 years ago

Had a quick look. It seems that what no longer works is efuse:umount. If you run umount from the shell, it's all good, but the umount call in the lib doesn't work. That breaks the unit tests when they rmdir the mount point to clean up, because it is still "busy".

tucnak commented 2 years ago

Oh, interesting. Are you going to commit my change, and if not, would you be willing to accept a pull request for it?

Now, umount is important, could you perhaps provide some pointers as to why it fails?

I'm trying to use your other library, userfs, which has efuse as a dependency, to do a funny thing that seems to me an interesting avenue of research. Namely, I wish to model supervision trees at run time, along with their states, as a filesystem hierarchy, so that they can be manipulated with Unix shell tools such as cat and grep. The idea is to be able to run grep like you normally would over source code, but in this case over program state:

# search for processes referencing "uuid" and among these processes _only_ find a certain pattern
grep -rl uuid . | xargs grep pattern

In this model, folder hierarchy would represent a supervision tree, and different files would correspond to various debugging outlets, K=V files for the most part. Thus I would be able to debug my live state without either starting the debugger, or leaving Vim!
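
To make the idea concrete without any FUSE mount, the layout can be imitated with plain directories. This is a rough sketch only: the directory names, the `state` file name and the K=V keys are all invented for illustration, and `demo_tree` and `search_state` are hypothetical helpers, not part of efuse or userfs.

```shell
#!/bin/sh
# Build a throwaway directory that mimics the proposed layout.
# Every name here (app_sup, worker_1, state, ...) is made up for illustration.
demo_tree() {
    demo_root=$(mktemp -d)
    mkdir -p "$demo_root/app_sup/worker_1" "$demo_root/app_sup/worker_2"
    printf 'uuid=abc\nstatus=running\n' > "$demo_root/app_sup/worker_1/state"
    printf 'status=idle\n' > "$demo_root/app_sup/worker_2/state"
    echo "$demo_root"
}

# search_state ROOT FILTER PATTERN
# List the files under ROOT that mention FILTER, then search only those for PATTERN.
# -H makes grep print the file name even when only one file is searched.
search_state() {
    grep -rl -- "$2" "$1" | xargs grep -H -- "$3"
}
```

Running `root=$(demo_tree); search_state "$root" uuid status` reports the status line of `worker_1` only, since `worker_2` never mentions `uuid`. With a real mount, the same pipeline would trigger open() and read() calls on the filesystem callbacks instead of reading static files.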

What do you think? Would userfs be up to the task?

mwri commented 2 years ago

Of course, I'd be happy to accept PRs, no problem.

I don't know why umount fails. When you call userfs / efuse umount, it hits the shell to run umount, so it should be no different from running umount manually from the shell, really... I know this is not nice; generally I'd do anything not to hit the shell like this, so I assume there was no other option available to me.

Since it works fine manually, I kinda feel there really can't be a lot wrong... Maybe there's now some sort of race condition in the sequence of events, though: running umount, the gen server process, the C interface process shutdown, etc.
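
If it is a race of that kind, one cheap way to probe it from the shell would be to retry the cleanup step for a short while instead of failing on the first "busy". The sketch below is an assumption about how that could be tried, not something efuse does; `retry` is a hypothetical helper.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Run COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt failed.
retry() {
    retry_left=$1
    retry_delay=$2
    shift 2
    while [ "$retry_left" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        retry_left=$((retry_left - 1))
        if [ "$retry_left" -gt 0 ]; then
            sleep "$retry_delay"
        fi
    done
    return 1
}
```

For example, `umount "$dir" && retry 10 1 rmdir "$dir"` (with `$dir` standing in for the mount point) would give the unmount up to ten seconds to settle. If rmdir then succeeds on a later attempt, that points towards a timing problem rather than a broken umount.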

Your purpose sounds great, it's super useful to have process state available via the filesystem sometimes isn't it? :)

tucnak commented 2 years ago

Interesting: I no longer get umount failures, but now the following two tests fail:

=== Location: [{efuse_SUITE,'-read_files/1-fun-0-',74},
              {lists,foreach_1,1442},
              {efuse_SUITE,read_files,72},
              {test_server,ts_tc,1782},
              {test_server,run_test_case_eval1,1291},
              {test_server,run_test_case_eval,1223}]
=== === Reason: no match of right hand side value "cat: /tmp/erlang_ct_mount_efuse_erlfs/apps/efuse/descr: No such file or directory\n"
  in function  efuse_SUITE:'-read_files/1-fun-0-'/2 (/opt/badt/efuse/test/efuse_SUITE.erl, line 74)
  in call from lists:foreach_1/2 (lists.erl, line 1442)
  in call from efuse_SUITE:read_files/1 (/opt/badt/efuse/test/efuse_SUITE.erl, line 72)
  in call from test_server:ts_tc/3 (test_server.erl, line 1782)
  in call from test_server:run_test_case_eval1/6 (test_server.erl, line 1291)
  in call from test_server:run_test_case_eval/9 (test_server.erl, line 1223)

=== Location: [{efuse_SUITE,'-read_files/1-fun-0-',74},
              {lists,foreach_1,1442},
              {efuse_SUITE,read_files,72},
              {test_server,ts_tc,1782},
              {test_server,run_test_case_eval1,1291},
              {test_server,run_test_case_eval,1223}]
=== === Reason: no match of right hand side value "cat: /tmp/erlang_ct_mount_efuse_examplefs/file1: No such file or directory\n"
  in function  efuse_SUITE:'-read_files/1-fun-0-'/2 (/opt/badt/efuse/test/efuse_SUITE.erl, line 74)
  in call from lists:foreach_1/2 (lists.erl, line 1442)
  in call from efuse_SUITE:read_files/1 (/opt/badt/efuse/test/efuse_SUITE.erl, line 72)
  in call from test_server:ts_tc/3 (test_server.erl, line 1782)
  in call from test_server:run_test_case_eval1/6 (test_server.erl, line 1291)
  in call from test_server:run_test_case_eval/9 (test_server.erl, line 1223)

Now, these are the most important calls... am I doing something wrong perhaps?

tucnak commented 2 years ago

> Your purpose sounds great, it's super useful to have process state available via the filesystem sometimes isn't it? :)

Yeah, that's the thing: process state over the filesystem. I feel like we've gone in the wrong direction with some of the profiler tools that are so common these days. And now, having to learn Elixir/Erlang of all things at a new shop, I have to fall back on what I'm used to: the Unix tools, Vim, et cetera. Bonus points if I could add quickfix capability for run-time failures, akin to how it interacts with the output of make/gcc for compile-time failures. I could use this as an opportunity to learn Elixir properly, so the sooner I can get over this hurdle, the efuse Erlang binding that is preventing me from playing with it, the better.

P.S. Sorry for bothering you with this. Being a FOSS maintainer myself, I know how annoying it can be to address these issues years on, when you have long since moved on. It's just that I don't have anyone to ask for advice when it comes to Erlang, so if you could point me in the direction of a mailing list or forum where a schmuck like me can reach out for help, I would really, really appreciate it.

Fuse is an important piece of software, I feel like it's to everyone's benefit.

mwri commented 1 year ago

The no match of right hand side value "cat: /tmp/erlang_ct_mount_efuse_erlfs/apps/efuse/descr: No such file or directory\n" error is straightforward to interpret. Assuming line 74 is the same as in my code, it is ExpectContent = os:cmd("cat "++MountDir++"/"++Filename). You can inspect ExpectContent to see what it is, I guess, but whatever it is, the return value of os:cmd("cat "++MountDir++"/"++Filename) is expected to match it, and the error means that the right hand side was instead "cat: /tmp/erlang_ct_mount_efuse_erlfs/apps/efuse/descr: No such file or directory\n".

It's probably clear enough that whatever ExpectContent is, the result of the unix cat command was that error from the shell, so the /tmp/erlang_ct_mount_efuse_erlfs/apps/efuse/descr file didn't exist.
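
The same failure mode can be reproduced outside Erlang. The sketch below is a loose shell analogue of that match, with `expect_content` as a hypothetical helper: when the file is missing, the captured "content" is cat's error message, so the comparison fails.

```shell
#!/bin/sh
# expect_content FILE EXPECTED
# Read FILE with cat, capturing stdout and stderr together (os:cmd also returns
# both), and compare the result with EXPECTED. Note that $(...) drops trailing
# newlines, unlike os:cmd.
expect_content() {
    # "|| true": keep going even if cat fails, so the mismatch can be reported.
    actual=$(cat "$1" 2>&1) || true
    if [ "$actual" = "$2" ]; then
        return 0
    fi
    echo "no match: got \"$actual\""
    return 1
}
```

Calling it on a path below a mount point that is not actually mounted prints a "no match" line containing the `No such file or directory` text, which mirrors what the badmatch reports.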

I imagine it must be because the mount failed or crashed fairly catastrophically, so it's not mounted. The mount above does say "Ok" though, but it could have mounted and then crashed, I guess. I'd expect there to be something useful in the ct logs, to be honest...
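
A quick way to tell "never mounted" apart from "mounted, then crashed" would be to look at the mount table right before the failing cat. This is a minimal sketch, assuming the usual `<source> on <dir> ...` line format of `mount`; both function names are hypothetical.

```shell
#!/bin/sh
# mount_table_has DIR
# Succeeds if the mount table read from stdin has an entry mounted on DIR.
# Relies on the "<source> on <dir> ..." line format that mount prints.
mount_table_has() {
    grep -F -q -- " on $1 "
}

# is_mounted DIR
# Succeeds if DIR currently appears in the output of mount.
is_mounted() {
    mount | mount_table_has "$1"
}
```

For example, `is_mounted /tmp/erlang_ct_mount_efuse_erlfs || echo "not mounted"`. On macOS, /tmp is a symlink to /private/tmp, so the mount table may list the resolved path and the check may need that form instead.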