martymac / fpart

Sort files and pack them into partitions
https://www.fpart.org/
BSD 2-Clause "Simplified" License

Add multi-thread support while crawling the file system #32

Closed: BinaryDevotee closed this issue 3 years ago

BinaryDevotee commented 3 years ago

Is it possible to add multi-threading support to fpart, similar to the -n option of fpsync?

A bit of context: I am dealing with a file system that holds a large number of small files (7M+ files, 1 TB in total), and partitioning them with fpart takes about 63 minutes with the parameters I specified. By contrast, when I run:

find /path/to/filesystem/ -mindepth 2 -maxdepth 2 -type d | parallel -j8 find {} -type f | split -dl 100000 - list

it takes 36 minutes to complete. Of course, the resulting list is not sorted, but that does not matter for my use case. Ideally, I would like to achieve the same result with fpart by running more jobs in parallel.
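For readers who want to experiment with the idea outside the shell, here is a minimal Python sketch of the same approach: collect the directories two levels down, crawl each one in a worker, and write the paths out in fixed-size lists. It is not fpart code and says nothing about fpart internals; the function names (`subdirs_at_depth`, `list_files`, `crawl_parallel`, `write_chunks`) are made up for illustration, and threads are only assumed to help because the time is spent waiting on file system calls.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def subdirs_at_depth(top, depth):
    """Return the directories exactly `depth` levels below `top`."""
    level = [top]
    for _ in range(depth):
        below = []
        for directory in level:
            with os.scandir(directory) as entries:
                below.extend(
                    e.path for e in entries if e.is_dir(follow_symlinks=False)
                )
        level = below
    return level


def list_files(root):
    """Return every regular file below `root`, without following symlinks."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                found.append(path)
    return found


def crawl_parallel(top, depth=2, jobs=8):
    """Crawl each directory found at `depth` in its own worker thread."""
    roots = subdirs_at_depth(top, depth)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # pool.map returns results in the order of `roots`, not completion order.
        for files in pool.map(list_files, roots):
            yield from files


def write_chunks(paths, lines_per_list=100000, prefix="list"):
    """Write paths to prefix00, prefix01, ... with a fixed number of lines each."""
    iterator = iter(paths)
    names = []
    while True:
        chunk = list(islice(iterator, lines_per_list))
        if not chunk:
            break
        name = f"{prefix}{len(names):02d}"
        with open(name, "w", encoding="utf-8", errors="surrogateescape") as out:
            out.writelines(path + "\n" for path in chunk)
        names.append(name)
    return names
```

A call such as `write_chunks(crawl_parallel("/path/to/filesystem"))` would then produce the list files. Like the one-liner above, this skips any file that sits less than two directory levels below the starting point, and the lists are not sorted.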

Can this be achieved?

martymac commented 3 years ago

Hello Athila,

This is something I originally wanted to achieve, but fpart in its current version is tightly bound to fts(3) and depends heavily on its features and characteristics (mostly depth-first traversal, directory entry sorting, ...). Switching to parallel file system crawling would probably not preserve those properties and would require a big rewrite. It is no longer in scope right now.
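To make the two properties mentioned above concrete, here is a small Python sketch of a single-threaded walk that is depth-first and visits each directory's entries in sorted order. It only illustrates the kind of ordering guarantee being discussed; it is not how fpart or fts(3) is implemented, and `sorted_depth_first` is a hypothetical name.

```python
import os


def sorted_depth_first(top):
    """Yield regular files under `top`, depth-first, entries sorted by name."""
    with os.scandir(top) as entries:
        ordered = sorted(entries, key=lambda entry: entry.name)
    for entry in ordered:
        if entry.is_dir(follow_symlinks=False):
            # Finish the whole subtree before moving on to the next sibling.
            yield from sorted_depth_first(entry.path)
        elif entry.is_file(follow_symlinks=False):
            yield entry.path
```

With a single walker, the output order depends only on the tree contents, so two runs over the same tree give the same sequence. Once several workers crawl subtrees concurrently and emit results as they finish, that ordering is no longer guaranteed without extra coordination.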

Best regards,

Ganael.

BinaryDevotee commented 3 years ago

Hi, Ganael,

Makes sense. If you ever want to tackle this, I will be more than happy to test. :)

For now, thank you for your reply.

Regards

martymac commented 3 years ago

You're welcome :)

Cheers,

Ganael.