I'm pretty new (2 days) to btrfs/WinBtrfs, so please bear with me. I've been using Linux and WSL, but only with ext4. I've read a lot here and on Reddit about similar problems and solutions (and tried some), but I could not solve the problems below.
I work on voice datasets for AI (many of them). Some background:
The datasets can be large (50-100GB compressed) and can contain a million small audio clips in a single directory.
There is also metadata, mainly text format.
I have many versions of each dataset, so the audio clips are duplicated (this is academic work comparing results over time).
I've been using Windows with NTFS, but the Windows 11 NTFS driver started to struggle with so many clips in a single directory. I cannot even untar a larger dataset (>50GB): the speed drops to kilobytes per second after about half of it has been decompressed (HDD).
After some research, I decided to use btrfs to solve these problems, as it has the most impressive feature set. My setup:
I wrote a "middleware" that untars the clips into a hierarchical directory structure (the file names contain numbers, so this was easy), which should also solve the duplication problem without relying on btrfs.
Use two btrfs drives: (1) a large HDD for the hierarchical clip structure, with no compression and COW off; (2) a smaller NVMe SSD for the textual data, with compression (and COW on, as compression requires). I formatted both with WinBtrfs.
Use WSL to run the Python code (through VS Code) to make full use of btrfs.
Mount both drives in WSL via fstab, while still being able to access the data from Windows so that I can use a GUI.
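For readers wondering what the "middleware" step might look like, here is a rough sketch. It is not the author's actual code: it assumes a hypothetical naming scheme where the first run of digits in a clip's file name is its number, buckets clips by the leading digits of that zero-padded number, and streams members out of the tar with the standard `tarfile` module so no single directory ever receives the whole archive.

```python
import os
import re
import shutil
import tarfile

# Hypothetical naming scheme: the first run of digits in a file name is the clip number.
_DIGITS = re.compile(r"\d+")


def bucketed_path(root: str, filename: str, width: int = 9,
                  levels: int = 2, per_level: int = 3) -> str:
    """Map a clip name to a nested path, e.g. clip_123456.wav -> root/000/123/clip_123456.wav.

    The clip number is zero-padded to `width` digits and its first
    `levels * per_level` digits become nested directory names. With the
    defaults, each directory has at most 1000 subdirectories and each leaf
    covers at most 1000 consecutive clip numbers (if numbers fit in `width`).
    """
    match = _DIGITS.search(filename)
    if match is None:
        # No number in the name: fall back to a catch-all directory.
        return os.path.join(root, "_unnumbered", filename)
    padded = match.group(0).zfill(width)
    parts = [padded[i * per_level:(i + 1) * per_level] for i in range(levels)]
    return os.path.join(root, *parts, filename)


def extract_bucketed(archive: str, root: str) -> int:
    """Stream regular files out of a (possibly compressed) tar into bucketed dirs."""
    count = 0
    # "r|*" reads the archive as a forward-only stream with transparent
    # decompression, so members are handled one at a time.
    with tarfile.open(archive, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Only the base name is kept, which also drops any "../" components.
            name = os.path.basename(member.name)
            dest = bucketed_path(root, name)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            src = tar.extractfile(member)
            if src is None:
                continue
            with src, open(dest, "wb") as out:
                shutil.copyfileobj(src, out)
            count += 1
    return count
```

Bucketing by fixed-width digit groups keeps directory sizes bounded without any lookup table; `width`, `levels` and `per_level` are illustrative defaults to tune against the real clip counts.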
Problems so far, thus my questions:
WSL cannot see the drives' UUIDs, so fstab and mount -a give errors. How do I get this to work?
If I mount them with wsl --mount -t btrfs instead, they disappear from Windows (the disk goes offline), which defeats the purpose. Is it possible to have both?
I tweaked the registry to enable compression on the NVMe drive, but I don't see any indication of compression. Should I install a full Linux system to check it? (I don't want to dual boot or use a full VM, because CPU cores and RAM are very important in this line of work.) My mistake: I was looking at the drive properties, but the compression info is in the folder properties. Note for others: I have to move files back and forth.
I tried to copy a 64GB text file from an NTFS SSD to that NVMe drive via Windows, and the speed drops below 10 MB/s, whereas the same file copies to other NTFS drives at full speed. What could be the reason?
When I ran the code on Windows, I hit the "drive becomes read-only" issue while extracting many files to the btrfs HDD. A restart resolved it, but it might happen again in the middle of an hours-long process. Will this also happen under WSL? How can I make sure it doesn't?
The Linux btrfs mount has an autodefrag option, which would be nice for the HDD, but I could not find an equivalent in WinBtrfs (if I'm not mistaken). Is it not implemented? How can I achieve the same result, given that my usual Windows defrag tools are of no use here?
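For the fstab/UUID question, one quick diagnostic is to check from inside WSL which `UUID=` entries in `/etc/fstab` have a matching symlink under `/dev/disk/by-uuid`. This is a hedged sketch: `mount` resolves `UUID=` sources through libblkid, which typically consults those udev-maintained symlinks, so missing entries there would be consistent with the errors described. Disks that have not been attached to WSL at all (for example with `wsl --mount <DiskPath> --bare`) will also show up as missing.

```python
import os
from typing import List, Tuple

BY_UUID = "/dev/disk/by-uuid"


def fstab_uuid_entries(fstab_text: str) -> List[Tuple[str, str, str]]:
    """Return (uuid, mount_point, fs_type) for each UUID= entry in fstab text."""
    entries = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3 or not fields[0].startswith("UUID="):
            continue  # only UUID= sources matter here
        uuid = fields[0][len("UUID="):].strip('"')
        entries.append((uuid, fields[1], fields[2]))
    return entries


def report(fstab_path: str = "/etc/fstab", by_uuid: str = BY_UUID) -> None:
    """Print whether each UUID= entry has a matching entry under `by_uuid`."""
    with open(fstab_path, encoding="utf-8") as fh:
        entries = fstab_uuid_entries(fh.read())
    for uuid, mount_point, fs_type in entries:
        status = "found" if os.path.exists(os.path.join(by_uuid, uuid)) else "MISSING"
        print(f"{status:8} {fs_type:6} {mount_point}  UUID={uuid}")
```

Calling `report()` inside the WSL distribution prints one line per `UUID=` entry; if every entry is `MISSING`, the problem is device discovery rather than the fstab syntax itself.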
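For the slow copy to the NVMe drive, logging throughput over time can show whether the rate collapses at a specific point (for example, once a write cache fills) or is slow from the start. This is a minimal sketch using a plain chunked copy, which is not what Explorer does internally; the 64 MiB chunk size and 1 GiB reporting interval are arbitrary assumptions.

```python
import os
import time
from typing import List, Tuple


def timed_copy(src: str, dst: str, chunk_size: int = 64 * 1024 * 1024,
               report_every: int = 1024 ** 3) -> List[Tuple[int, float]]:
    """Copy src to dst in chunks, recording (bytes_copied, MB/s) per interval.

    Each rate covers only the bytes written since the previous report, so a
    sudden slowdown shows up as a drop between consecutive entries.
    """
    samples: List[Tuple[int, float]] = []
    copied = 0
    interval_bytes = 0
    interval_start = time.perf_counter()

    def record(fout) -> None:
        nonlocal interval_bytes, interval_start
        fout.flush()
        os.fsync(fout.fileno())  # time the write to disk, not just to the OS cache
        elapsed = time.perf_counter() - interval_start
        rate = interval_bytes / elapsed / 1e6 if elapsed > 0 else float("inf")  # decimal MB/s
        samples.append((copied, rate))
        print(f"{copied / 1e9:8.2f} GB  {rate:8.1f} MB/s")
        interval_bytes = 0
        interval_start = time.perf_counter()

    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            interval_bytes += len(chunk)
            if interval_bytes >= report_every:
                record(fout)
        if interval_bytes:
            record(fout)  # report the final partial interval
    return samples
```

Running it once against the btrfs NVMe drive and once against an NTFS drive (e.g. `timed_copy(r"D:\big.txt", r"E:\big.txt")`, with hypothetical drive letters) gives two comparable rate curves.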
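For the read-only question, one defensive pattern for an hours-long job under WSL is to check before each batch whether the target filesystem has been switched to read-only and, if so, stop cleanly and record progress so the run can resume after a remount. This sketch assumes Linux (inside WSL), where `os.statvfs` exposes the `ST_RDONLY` flag and failed writes raise `EROFS`. It does not prevent the underlying problem, only limits the damage; the checkpoint file and batch structure are hypothetical.

```python
import errno
import json
import os
from typing import Callable, Iterable, List


def is_read_only(path: str) -> bool:
    """True if the filesystem containing `path` is currently read-only (Linux/WSL)."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def run_batches(target: str, batches: Iterable[List[str]],
                process: Callable[[List[str]], None],
                checkpoint: str) -> int:
    """Process batches in order, stopping cleanly if `target` turns read-only.

    Returns the number of batches completed so far. The JSON checkpoint
    should live on a *different* filesystem than `target`, so it can still
    be written after `target` goes read-only.
    """
    done = 0
    if os.path.exists(checkpoint):
        with open(checkpoint, encoding="utf-8") as fh:
            done = json.load(fh)["done"]
    for index, batch in enumerate(batches):
        if index < done:
            continue  # already processed in an earlier run
        if is_read_only(target):
            print(f"{target} is read-only; stopping before batch {index}")
            break
        try:
            process(batch)
        except OSError as exc:
            if exc.errno == errno.EROFS:
                print(f"write failed with EROFS in batch {index}; stopping")
                break
            raise
        done = index + 1
        with open(checkpoint, "w", encoding="utf-8") as fh:
            json.dump({"done": done}, fh)
    return done
```

Because an interrupted batch is redone on resume, `process` should be safe to repeat (for example, by overwriting output files rather than appending to them).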
Either I have a big misconception, or I'm doing something wrong. Is using WSL really a solution in this case?
Answers and directions are much appreciated.
Edit(s):