
bisync: use `trie` internally to reduce footprint #5686

Status: Open · ivandeex opened this issue 3 years ago

ivandeex commented 3 years ago

Synopsis

TODO: see 👍 and ⏬.

Prior discussions

Mentioned at https://github.com/rclone/rclone/pull/5587#issuecomment-917416354 and...

https://github.com/cjnaz/rclonesync-V2/issues/59

One rclonesync user had about 2M files and ran out of memory. I optimized rclonesync to get it down to two in-memory file listings at any time...

https://github.com/rclone/rclone/pull/5164#issuecomment-843481228 (ivandeex)

Listings keep many path strings that share long prefixes: [movies/]alpha, [movies/]bravo, [movies/][zeta/]hello, [movies/][zeta/]world. This makes them a good candidate for compression with a trie (segmented by path component, as bracketed above, or by character).

In short, I want to build something like a modified dghubble/trie (not that one exactly, but something similar; searching GitHub didn't turn up anything that satisfies all my requirements) with three fast operations: add a path, map path -> `int32`, and map `int32` -> path (delete and modify operations are not needed). I'd fill it up while a prior listing is parsed or a new one is generated. The delta engine and queue operations would then pass the trie around as a shared per-session object and use `int32` IDs instead of file name strings; see the sketch below.
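To make the shape of this concrete, here is a minimal Go sketch of such an interning trie. It is not rclone code: `pathTrie` and every other name in it are hypothetical, and it segments by path component as described above.

```go
package main

import (
	"fmt"
	"strings"
)

// pathTrie (hypothetical) interns slash-separated paths as int32 IDs.
// It supports only the three operations named above: add a path,
// path -> int32, and int32 -> path. Delete and modify are omitted.
type pathTrie struct {
	nodes []trieNode        // node 0 is the root; IDs index this slice
	ids   map[nodeKey]int32 // (parent, segment) -> node ID
}

type trieNode struct {
	parent  int32  // parent node ID; -1 for the root
	segment string // one path component, e.g. "movies"
}

type nodeKey struct {
	parent  int32
	segment string
}

func newPathTrie() *pathTrie {
	return &pathTrie{
		nodes: []trieNode{{parent: -1}},
		ids:   map[nodeKey]int32{},
	}
}

// Add interns path, reusing shared prefixes, and returns its ID.
func (t *pathTrie) Add(path string) int32 {
	cur := int32(0)
	for _, seg := range strings.Split(path, "/") {
		key := nodeKey{cur, seg}
		id, ok := t.ids[key]
		if !ok {
			id = int32(len(t.nodes))
			t.nodes = append(t.nodes, trieNode{parent: cur, segment: seg})
			t.ids[key] = id
		}
		cur = id
	}
	return cur
}

// ID maps path -> int32; ok is false if the path was never added.
func (t *pathTrie) ID(path string) (id int32, ok bool) {
	for _, seg := range strings.Split(path, "/") {
		if id, ok = t.ids[nodeKey{id, seg}]; !ok {
			return 0, false
		}
	}
	return id, true
}

// Path maps int32 -> path by following parent links up to the root.
func (t *pathTrie) Path(id int32) string {
	var segs []string
	for id > 0 {
		segs = append(segs, t.nodes[id].segment)
		id = t.nodes[id].parent
	}
	for i, j := 0, len(segs)-1; i < j; i, j = i+1, j-1 {
		segs[i], segs[j] = segs[j], segs[i] // reverse into root-first order
	}
	return strings.Join(segs, "/")
}

func main() {
	t := newPathTrie()
	hello := t.Add("movies/zeta/hello")
	world := t.Add("movies/zeta/world") // "movies" and "zeta" stored once
	fmt.Println(hello, world)           // 3 4
	fmt.Println(t.Path(hello))          // movies/zeta/hello
	if id, ok := t.ID("movies/zeta/world"); ok {
		fmt.Println(id == world) // true
	}
}
```

Because each node stores only a parent link and one segment, shared prefixes like movies/zeta are stored exactly once, and callers can hold 4-byte IDs instead of full path strings.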

This is another postponed item. I'd rather start with thousands of files, then work up to zillions.

Do I understand correctly that rclone deals with the file system recursively, directory by directory, rather than as a whole tree?

That depends on backend features: `--fast-list` enables whole-tree listing, at least on Google Drive. bisync just uses the internal walk API, leaving such optimizations to the lower level.
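For illustration, a hedged sketch of how listing code could feed every walked path into such a trie via rclone's fs/walk package. `listIntoTrie` is hypothetical and builds on the `pathTrie` sketch above; the `walk.Walk` signature is assumed to match rclone's fs/walk package at the time of writing.

```go
import (
	"context"

	"github.com/rclone/rclone/fs"
	"github.com/rclone/rclone/fs/walk"
)

// listIntoTrie (hypothetical, reusing the pathTrie sketch above)
// interns every path returned by a walk of remote f. walk.Walk visits
// the tree directory by directory; with --fast-list, backends that
// support ListR can fetch the whole tree in fewer API calls, but that
// decision stays at the lower level, as noted above.
func listIntoTrie(ctx context.Context, f fs.Fs, t *pathTrie) error {
	// "" = walk from the root, false = apply filters, -1 = no depth limit
	return walk.Walk(ctx, f, "", false, -1,
		func(dir string, entries fs.DirEntries, err error) error {
			if err != nil {
				return err
			}
			for _, entry := range entries {
				t.Add(entry.Remote()) // keep a 4-byte ID, drop the string
			}
			return nil
		})
}
```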


ivandeex commented 3 years ago

The trie structure should support optional case-insensitive operation.
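A sketch of one way that could look, assuming the `pathTrie` above: fold each segment before using it as a map key, while the node keeps the original spelling so the `int32` -> path direction stays exact. `foldKey` is hypothetical.

```go
import "strings"

// foldKey (hypothetical) normalizes a path segment for lookups when
// the trie runs in case-insensitive mode; trieNode.segment would still
// store the original spelling so Path(id) reproduces it byte for byte.
// strings.ToLower only approximates case folding: full Unicode case
// folding, as case-insensitive file systems use, differs for some
// characters, so a real implementation may need more than this.
func foldKey(segment string, caseInsensitive bool) string {
	if caseInsensitive {
		return strings.ToLower(segment)
	}
	return segment
}
```

Add and ID would then build their map keys as `nodeKey{cur, foldKey(seg, insensitive)}`, so movies/Zeta and movies/zeta intern to the same ID.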