Open Moelf opened 3 years ago
IMHO it's an anti-pattern to do
Yikes, I use regexes all the time when using FileTrees. Is it possible to get an ELI5 version of this issue?
When I see filter!
I think of a mutating version of filter
, but the second example looks like some kind of negative filter (i.e !
means not
instead of mutates
). Thats something I have wanted too (everything which does not match a regex) so perhaps I'm just projecting here :)
anti-pattern because getindex(array, RegEx)
does not make sense, filter(predicate, array)
is the universal syntax for this semantics in Julia
Ah, now I understand.
I guess it is just to enable the convenient bracket syntax, similar to what e.g. DataFrames and Dicts do. Not saying you don't have a point, but FileTrees are not arrays so maybe they don't need to follow array conventions.
Couldn't agree more with the original statement that this is not the intuitive way. Persoanlly, I would even like to have a filter option at tree creation. I.e., allow something like
taxi_dir = FileTree("taxi-data", f::Function = x -> true) # by default no filtering
or a keyword filter = x -> true
that only puts files that survive the filter in the tree.
In fact, I didn't even understand that the way to create a tree of "only .x files" was by making the full tree and then doing tree[glob"*/*.x"]
. I only understood this after I found this issue.
I have also missed the predicate when creating the tree. Apart from the filtering, one can also return nothing when loading:
ft = load(FileTree("taxi-data")) do f
endswith(name(f), ".x") && return nothing
# code to load .x files
end
Unless original author has any objections, I'll accept a PR for the FileTree
predicate (make it first argument to support do-syntax).
I'm also somewhat sympathetic to the getindex
issue even though I like the current way just for the sake of convenience. Maybe this is whataboutism, but DataFrames
support the same convention:
julia> df = DataFrame(a = 1:3, aa = 2:4, b = 11:13)
3×3 DataFrame
Row │ a aa b
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 1 2 11
2 │ 2 3 12
3 │ 3 4 13
julia> df[:, r"a"]
3×2 DataFrame
Row │ a aa
│ Int64 Int64
─────┼──────────────
1 │ 1 2
2 │ 2 3
3 │ 3 4
If the main gripe is the negation, maybe something like tree[Not(r"bla")]
or tree[!, r"bla"])
, but it does not look very appealing. Negative regexes are also possible to construct afaik (at least for simple match patterns), but it is a bit messy.
Afaik, filter
already works on FileTree
s, but it gives you the whole node to allow for filtering on values. Not sure we would like to limit it so you only see the names/paths.
julia> t = maketree("root" => [(name="a", value=2), (name="b", value=3)])
root/
├─ a (Int64)
└─ b (Int64)
julia> filter(Returns(true), t)
root/
├─ a (Int64)
└─ b (Int64)
julia> filter(contains("root") ∘ path , t; dirs=false)
root/
├─ a (Int64)
└─ b (Int64)
julia> filter(contains("a") ∘ path , t; dirs=false)
root/
└─ a (Int64)
julia> filter(!contains("a") ∘ path , t; dirs=false)
root/
└─ b (Int64)
Something seems off with the docstring though. It seems to merge the docstrings for map
and filter
. Furthermore it seems to be incorrect as it has the same semantics as filter
for other collections (true
means keep). I'll fix that when time permits.
help?> filter
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
map(f, tree::FileTree; walk=FileTrees.postwalk, dirs=true)
apply f to every node in the tree. To only visit File nodes, pass dirs=false.
walk can be either FileTrees.postwalk or FileTrees.postwalk.
filter(f, tree::FileTree; walk=FileTrees.postwalk, dirs=true)
remove every node x from tree where f(x) is true. f(x) must return a boolean value.
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
In fact, I didn't even understand that the way to create a tree of "only .x files" was by making the full tree and then doing tree[glob"/.x"]. I only understood this after I found this issue.
We should probably have FileTree(glob"*/*.x")
work.
I do not consider the API to be "finished". Would love to accept any PRs with improvements! I tend to avoid having too many options if the same effect can be achieved through existing features.
We should probably have FileTree(glob"/.x") work.
This is a great idea!
I suppose the equivalent for regex won't make sense though, so maybe we also want the function version?
IMHO it's an anti-pattern to do:
and currently, you will need to do something close to:
we should probably allow
filter!(pred, tree)
to work directly?(if not not to directly apply predicate on
tree.children
is a debatable question)