Implementation
Current implementation (pyftpsync <= 3)
--match patterns are applied to file names only and use fnmatch syntax.
--exclude patterns are applied after --match and work on file and directory names.
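The rules of the current approach can be sketched as follows. This is a minimal illustration, not pyftpsync's actual code: the helper name `old_style_filter` and its parameters are made up for this example, and `fnmatchcase` is used so the result does not depend on the platform's case handling.

```python
from fnmatch import fnmatchcase


def old_style_filter(name, is_dir, match=None, exclude=None):
    """Return True if the entry should be synchronized.

    Hypothetical helper that mimics the rules described above:
    `name` is a bare entry name (no path), `match` and `exclude`
    are lists of fnmatch-style patterns.
    """
    # --match is applied to file names only; directories always pass.
    if match and not is_dir:
        if not any(fnmatchcase(name, pat) for pat in match):
            return False
    # --exclude is applied afterwards, to files and directories alike.
    if exclude and any(fnmatchcase(name, pat) for pat in exclude):
        return False
    return True
```

With `match=["*.txt"]`, a directory such as `folder3` passes the filter regardless of its name, and only `exclude=["folder3"]` would discard it.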
New implementation (pyftpsync, 'glob' branch)
--match uses the glob syntax from wcmatch.globmatch
Glob generally matches entry names, i.e. directories or files.
The special ** pattern matches zero or more directories, a ! prefix negates a pattern, etc.
--exclude patterns are still available (applied after --match)
Open Questions
Using the pattern --match "*.txt" in the old approach would match:
file1.txt
folder3/            <= directories not affected by '--match'
    files_3_1.txt
Using the same glob pattern --match "*.txt" in the new approach would match only one entry:
file1.txt
Using a hierarchical glob pattern like --match "**/*.txt" would match two entries:
file1.txt
folder3/            <= not matched!
    files_3_1.txt
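The difference between the three cases can be reproduced with a small sketch. It deliberately does not use wcmatch. Instead, `glob_match` is a simplified, stdlib-only approximation of globstar matching written for this example, supporting only `*`, `?` and a `**` path segment (no negation, character classes or brace expansion). The two paths are the entries listed above.

```python
import re
from fnmatch import fnmatchcase


def glob_to_regex(pattern):
    """Translate a simplified glob into a regex over '/'-separated paths."""
    segments = pattern.split("/")
    parts = []
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == "**":
            # A '**' segment stands for zero or more whole directories.
            parts.append(".*" if last else "(?:[^/]+/)*")
            continue
        out = ""
        for ch in seg:
            if ch == "*":
                out += "[^/]*"  # '*' never crosses a directory boundary
            elif ch == "?":
                out += "[^/]"
            else:
                out += re.escape(ch)
        parts.append(out if last else out + "/")
    return re.compile("".join(parts))


def glob_match(path, pattern):
    """Return True if the whole relative path matches the pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None


files = ["file1.txt", "folder3/files_3_1.txt"]

# Old approach: fnmatch against the bare file name, at any depth.
old = [p for p in files if fnmatchcase(p.rsplit("/", 1)[-1], "*.txt")]
# New approach: the glob is matched against the relative path.
new_flat = [p for p in files if glob_match(p, "*.txt")]
new_deep = [p for p in files if glob_match(p, "**/*.txt")]

print(old)       # ['file1.txt', 'folder3/files_3_1.txt']
print(new_flat)  # ['file1.txt']
print(new_deep)  # ['file1.txt', 'folder3/files_3_1.txt']
```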
Question 1
When combined with the --delete-unmatched option, what should happen?
One could assume that folder4/ gets deleted, but folder3/ remains, because it contains content.
However, this requires traversing folders depth-first in order to decide whether they have any remaining content.
The previous approach never discarded directories with a --match pattern.
Directories were discarded using --exclude instead, for example --exclude folder4.
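The depth-first decision mentioned above could look roughly like this. It is a sketch under assumptions, not a proposal for the actual implementation: the tree is modeled as a nested dict (directories are dicts, files are `None`), `prune_unmatched` and `keep_file` are hypothetical names, and the content of folder4 is invented for the example.

```python
def prune_unmatched(tree, keep_file, prefix=""):
    """Return a copy of `tree` without unmatched files and empty directories."""
    kept = {}
    for name, child in tree.items():
        path = prefix + name
        if isinstance(child, dict):
            # Depth-first: a directory is judged only after its children.
            sub = prune_unmatched(child, keep_file, path + "/")
            if sub:
                kept[name] = sub
            # An empty result means the directory has no remaining content.
        elif keep_file(path):
            kept[name] = None
    return kept


# Assumed example tree; 'file4.md' inside folder4 is made up.
tree = {
    "file1.txt": None,
    "folder3": {"files_3_1.txt": None},
    "folder4": {"file4.md": None},
}

remaining = prune_unmatched(tree, lambda p: p.endswith(".txt"))
print(remaining)  # {'file1.txt': None, 'folder3': {'files_3_1.txt': None}}
```

The cost is visible in the recursion: whether folder4/ can be deleted is known only after all of its descendants have been visited.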
Question 2
The common use case 'only text files' must be passed as --match "**/*.txt".
If we decide to traverse all folders (matching or not) in order to apply the pattern to descendants, we would still have to prepend **/ to the patterns. This seems redundant, especially compared to the current approach, where --match *.txt works on file names at all levels.
Conclusion
I am currently leaning towards dropping this new feature, since it seems more confusing than the existing approach.