mxmlnkn / ratarmount

Access large archives as a filesystem efficiently, e.g., TAR, RAR, ZIP, GZ, BZ2, XZ, ZSTD archives
MIT License

Dependency fusepy #101

Closed RubenKelevra closed 2 months ago

RubenKelevra commented 1 year ago

fusepy is currently a dependency of ratarmount. I was wondering if a switch to e.g. regular fuse-python is possible or planned?

fusepy has not been touched for years now, and there are still plenty of issues raised in https://github.com/fusepy/fusepy/issues/134

mxmlnkn commented 1 year ago

Yes, FUSE 3 support in particular would be desirable. But the API changes so much that porting ratarmount with all its complexity seemed difficult, especially because the API is no longer file-path based, I think. I haven't taken a really hard look at it yet.

I am aware that fusepy is practically dead, but unfortunately the two or so forks also died out fast. And the last time I looked, fuse-python didn't look very alive either. Furthermore, it does not support FUSE 3. So I do not see the benefit of porting from one dead API to another.

Do you have any FUSE 3 binding recommendations? I would probably have looked into pyfuse3, but I think it has the aforementioned problems that make a port hard.

pyfuse3 still has this disclaimer:

pyfuse3 is no longer actively developed and just receiving community-contributed maintenance to keep it alive for some time.

Also regarding pyfuse3:

It's a bit difficult because:

  • seems there are issues with pyfuse3 on some BSD
  • on macOS, (AFAIK), there is still no fuse3 support

And another reason against pyfuse3: it has no wheels. Trying to install it yields an error if libfuse3-dev is not installed, which makes it hard to put into the requirements. fusepy, in contrast, installs fine even when no FUSE shared library is installed on the system. That is preferable because I can show my own, much more readable error message in that case. Furthermore, the shared libraries are much more likely to be installed already than the development package.
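
To illustrate the "own readable error message" point, here is a minimal sketch of a deferred library check. It is not ratarmount's actual code: the function name `find_fuse_library` and the wording of the message are made up for this example, and only `ctypes.util.find_library` is a real standard-library call.

```python
import ctypes.util


def find_fuse_library(find=ctypes.util.find_library):
    """Return the name or path of the FUSE shared library or raise a readable error.

    'find' is injectable (an assumption of this sketch) so that the lookup
    can be replaced in tests without touching the system.
    """
    path = find("fuse")
    if path is None:
        # The package itself installed fine; only mounting is unavailable.
        raise RuntimeError(
            "No FUSE shared library was found, so mounting is not possible. "
            "Please install libfuse with your system's package manager."
        )
    return path


if __name__ == "__main__":
    try:
        print("Found FUSE library:", find_fuse_library())
    except RuntimeError as error:
        print("Error:", error)
```

Because the check runs only when mounting is requested, `pip install` never fails on a system without FUSE, which is the behavior described above for fusepy.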

The situation with FUSE on Python is really desperate I think ...

mxmlnkn commented 1 year ago

I took another look at pyfuse3. It is indeed difficult to port ratarmount to it because it is a fork of llfuse and, as its previous name suggests, wraps only the low-level FUSE interface, which works with inodes instead of paths. I think this is hard to get to work with ratarmount, especially for features such as merging filesystems or mounting recursively.

I did a small survey on other projects on PyPI:

| Project | How it uses FUSE |
|---|---|
| kaitai_fs | fusepy |
| gitfs | fusepy 3.01 |
| datalad | depends on fsspec[fuse], which requires fusepy |
| YubiKey PIV FileSystem | fusepy |
| ninfs | fusepy. Does not list it in dependencies because of missing Windows support of fusepy. Instead, fuse.py has been copied into this repository. |
| GitFS | Contains hardcopy of fuse.py with fusepy copyright header |
| DedupSQLfs | llfuse |
| borg | Works with both llfuse and pyfuse3? |
| swh.fuse | pyfuse3 |
| tgmount | pyfuse3 |
| tgcloud | Uses yum python-fuse, so fuse-python |
| flumes-fuse | fuse-python 1.0.4 |
| s3monkey | Has the FUSE tag but because Heroku does not allow it, it has a Python interface, so basically like ratarmountcore. |
| fatx | Manually interfaces with the FUSE C headers without a Python abstraction |
| aliyundrive-fuse and pikpak-fuse | Rust package that is distributed as a Python module. Looks nice and interfaces directly with the FUSE C API and needs no Python bindings in between! Not entirely sure where the entry point for the Python package is, only main for a command line interface? |
| fox-it/dissect | fusepy3, fork of fusepy with libfuse3 support. Last update 9 months ago. |

These are not handpicked, so it looks to me like every existing way to interface with FUSE is used roughly equally in this quite small sample, with fusepy used a little more than the others. Borg has a quite reasonable approach of supporting both llfuse and pyfuse3, but it still has to use the low-level API, which I'm hesitant to use.

I see two possibilities:

  1. Use pyfuse3 and implement an inode-to-path cache like high-level FUSE does internally. Of course, this would mean reimplementing performance-critical facilities, which already exist in C FUSE, in Python. This seems like a waste and has many pitfalls regarding performance. For example, I cannot just keep growing the "cache"; I have to prune it at times so as not to duplicate paths for the millions of files that ratarmount should support. But I'm not quite clear about what the lifetime should be. I guess I could look at how the FUSE high-level API implements this, but again, this seems redundant.
  2. Directly interface with FUSE. This is what aliyundrive-fuse and fatx do. To avoid moving away from a pure Python implementation, I would have to do this in another backend, maybe something like rfuse or rafuse, so basically yet another Python FUSE binding. I might even do it in Rust while I am at it. The cargo Python tooling for aliyundrive-fuse looks promisingly lean. Rewriting ratarmount wholly in Rust might be an option, but I already tried with C++ and performance wasn't much better, if not worse. Furthermore, ratarmount already is ~7.5k lines of Python, of which ~5k are code, not counting the tests. It feels hard to port all that while keeping bugs and regressions low.
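
For option 1, one possible answer to the lifetime question is to mirror the low-level API's `lookup`/`forget` pairing: count how often an inode was handed to the kernel and drop the entry once the kernel has forgotten it as many times. The following is only a sketch under that assumption; the class `InodePathTable` and its method names are hypothetical and not part of pyfuse3 or ratarmount.

```python
class InodePathTable:
    """Hypothetical inode <-> path table pruned by lookup counts."""

    ROOT_INODE = 1  # FUSE uses inode 1 for the mount root.

    def __init__(self):
        self._path_by_inode = {self.ROOT_INODE: "/"}
        self._inode_by_path = {"/": self.ROOT_INODE}
        self._lookup_count = {}
        self._next_inode = self.ROOT_INODE + 1

    def lookup(self, parent_inode, name):
        """Return the inode for 'name' inside 'parent_inode' and count the lookup."""
        parent = self._path_by_inode[parent_inode]
        path = parent.rstrip("/") + "/" + name
        inode = self._inode_by_path.get(path)
        if inode is None:
            inode = self._next_inode
            self._next_inode += 1
            self._path_by_inode[inode] = path
            self._inode_by_path[path] = inode
        self._lookup_count[inode] = self._lookup_count.get(inode, 0) + 1
        return inode

    def path(self, inode):
        """Translate an inode back to the path that a path-based backend expects."""
        return self._path_by_inode[inode]

    def forget(self, inode, nlookup):
        """Drop 'nlookup' references and prune the entry when none remain."""
        remaining = self._lookup_count.get(inode, 0) - nlookup
        if remaining > 0:
            self._lookup_count[inode] = remaining
            return
        self._lookup_count.pop(inode, None)
        if inode != self.ROOT_INODE:  # The root entry is never pruned.
            path = self._path_by_inode.pop(inode, None)
            if path is not None:
                del self._inode_by_path[path]


if __name__ == "__main__":
    table = InodePathTable()
    folder = table.lookup(InodePathTable.ROOT_INODE, "folder")
    file_inode = table.lookup(folder, "file.txt")
    print(table.path(file_inode))
```

With this scheme the table holds only entries the kernel still references, so it does not grow to cover every file in the archive, but every path-based operation pays for an extra dictionary lookup in Python.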

mxmlnkn commented 2 months ago

I have decided to simply bundle fuse.py into ratarmount / fork it. It's not rocket science and the fuse.py file has a very manageable size with less than 2k lines of code. I have added simple FUSE 3 support and all ratarmount tests have run successfully with a FUSE 3 shared library.

Adding FUSE 3 support is not as hard as it sounds. Mostly, some deprecated functions that fusepy did not use were removed, some arguments were added to some callbacks, and some new methods were added, which do not have to be used. Some FUSE 3 minor version changes break the ABI just as badly as this major version break. FUSE 2 vs. 3 is not a change in the FUSE kernel-level API but only a change in the relatively lean libfuse wrapper. The default will still be to try to load FUSE 2 first because of its stability and to use libfuse3 only if FUSE 2 was not found.
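
The load order and the "arguments added to callbacks" point can be sketched roughly as follows. This is not the code of the fork: `pick_fuse_library` and `getattr_prototype` are names invented for this example, and the struct pointers are simplified to `c_void_p`. What it relies on is that libfuse 3 passes an additional `struct fuse_file_info *` to `getattr`, which libfuse 2 does not.

```python
import ctypes
import ctypes.util

# Linker names and their major versions, in order of preference:
# "fuse" is libfuse 2 and "fuse3" is libfuse 3.
CANDIDATES = (("fuse", 2), ("fuse3", 3))


def pick_fuse_library(find=ctypes.util.find_library):
    """Return (library, major version), preferring FUSE 2 over FUSE 3.

    Returns (None, None) if neither library is found. 'find' is injectable
    (an assumption of this sketch) to make the preference order testable.
    """
    for name, major in CANDIDATES:
        path = find(name)
        if path:
            return path, major
    return None, None


def getattr_prototype(major):
    """Return the ctypes callback type for getattr for the given major version."""
    if major >= 3:
        # int getattr(const char *path, struct stat *, struct fuse_file_info *)
        return ctypes.CFUNCTYPE(
            ctypes.c_int, ctypes.c_char_p, ctypes.c_void_p, ctypes.c_void_p
        )
    # int getattr(const char *path, struct stat *)
    return ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_char_p, ctypes.c_void_p)


if __name__ == "__main__":
    library, major = pick_fuse_library()
    print("Selected library:", library, "major version:", major)
```

Selecting the callback prototypes from the detected major version keeps the path-based Python interface the same for both library versions.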

With FUSE 2 prioritized, FUSE 3 used only as a fallback, and all of the extensive ratarmount tests running fine, I'd say that this can be merged for the next ratarmount non-bugfix version. It would still be nice to have more tests in the fusepy fork itself. Requiring such tests was the reason why the refuse fork fizzled out, though, so I wouldn't make it a requirement for further development. Alternatively, it might be nice to find fusepy-dependent projects and run their tests against the fusepy fork.

And as always, if some other fusepy user reads this: help with maintaining the fork, testing on other platforms, or simply a second pair of eyes for reviewing my changes is highly welcome.