jtapolczai / FileSync

Syncrhonising file trees
Apache License 2.0
1 stars 0 forks source link

Multi-way join #17

Closed jtapolczai closed 8 years ago

jtapolczai commented 8 years ago

There should be a way to perform multi-joins. If we take the join operator to be ° and directory trees to be D1,..., this means that

D1 ° D2

should be extended to

D1 ° ... ° Dn

Ideally, this should be a monoidal structure (but one would have to check for this).

jtapolczai commented 8 years ago

Let's denote inner joins by °, outer joins by <°>, left joins by <° and right joins by °>.

jtapolczai commented 8 years ago

If we are only interested in filenames:

jtapolczai commented 8 years ago

How to do this in practice: If we have D1 x ... x Dn with x element of {°,<°,°>,<°>}, then we first perform D1 x D2 on the file system. Selecting either result directory, we then perform D2 x D3 and so forth.

This will guarantee that all directories will be identical in the end w.r.t. filenames and directory structure. The issue is the handling of identically-named files...

jtapolczai commented 8 years ago

Alternative approach:

D1 x ... x Dn ≡ join' S (map pathRoot [D1,...,Dn]) (TreeDiff' [D1,...,D2]) -- for a user-supplied S

TreeDiff' :: [Tree a] -> Tree (a, [Int])

Instead of a tree annotated with LeftOnly/RightOnly, the annotations will express in which trees the given node is present.

join' :: JoinStrategy' a -> [PathRoot] -> Tree (a, Int)

JoinStrategy' = [(a, Int)] -> Vector n [PathRoot] -> IO (Vector n [FileAction'])
FileAction' = Delete | CopyFrom Int

The join strategy will get the list of existing elements (indexed by their position in the list of trees) and the n-long vector of path roots. It will then deliver another n-long vector of actions to take.

jtapolczai commented 8 years ago

We have the issue of inconsistent action sets like [Delete 1, CopyFrom 1]. However, by imposing a preorder CopyFrom * < Delete and performing the copying actions first, we can solve this.

However, the problem with sets like CopyFrom 1, CopyFrom 2 (and larger cycles) still remains. We'd effectively have to do graph analysis on the actions and this can't be done statically either.

jtapolczai commented 8 years ago

A simpler solution would be to only allow binary joins and let the user specify a number of locations in advance. Only two are joined at a time.

jtapolczai commented 8 years ago

This simply is too difficult to implement right.