haltcase / glob

Pure Nim library for matching file paths against Unix style glob patterns.
https://glob.bolingen.me
MIT License

feat: rework api to use enum flags, add filter procs #19

Closed haltcase closed 6 years ago

haltcase commented 6 years ago

Closes #8 (ref), closes #10 (ref), closes #11, closes #13 (ref), closes #17 (ref), closes #12 (ref)

API changes

Options (like includeDirs or includeHidden) are no longer provided as individual boolean parameters. They are now part of a set[GlobOption], the default for which is exported as defaultGlobOptions. You can alter this like you would any other set, using set operations such as union (+) and difference (-):

import glob
import sequtils

echo toSeq(walkGlob("src", options = defaultGlobOptions + {Hidden, FollowLinks} - {Files}))

# or replace the default options completely
echo toSeq(walkGlob("src", options = {Hidden}))
# @[... only hidden files under src/ ...]
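The set arithmetic above is ordinary Nim set behavior, so it can be tried without the library installed. The sketch below uses a local enum named Opt as a stand-in for GlobOption and a made-up demoDefaults set. Both are assumptions for illustration only; in particular, the real contents of defaultGlobOptions are not stated in this description.

```nim
# Hypothetical stand-in for the library's enum, so this runs without glob.
type
  Opt {.pure.} = enum
    Absolute, IgnoreCase, NoExpandDirs, FollowLinks,
    Hidden, Files, Directories, FileLinks, DirLinks

# Assumed default set, for illustration only (not the library's actual default).
const demoDefaults = {Opt.Files, Opt.FileLinks, Opt.DirLinks}

# `+` is set union and `-` is set difference; they apply left to right.
let custom = demoDefaults + {Opt.Hidden, Opt.FollowLinks} - {Opt.Files}

doAssert Opt.Hidden in custom
doAssert Opt.FollowLinks in custom
doAssert Opt.Files notin custom

# `*` is set intersection: what `custom` still shares with the defaults.
doAssert custom * demoDefaults == {Opt.FileLinks, Opt.DirLinks}
echo custom
```

Because the options are a plain set, membership checks such as `Opt.Files notin custom` work the same way inside and outside the library.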

listGlob removal

The seq-returning proc listGlob is also removed to encourage the use of the iterators walkGlob or walkGlobKinds. Getting a seq from these is easy with sequtils.toSeq from Nim's stdlib, so listGlob was mostly unnecessary, and removing it reduces the number of signatures we need to keep in sync.

New features

Option flags

The new GlobOption enum is defined as:

type
  GlobOption* {.pure.} = enum
    ## Flags that control the behavior or results of the file system iterators. See
    ## `defaultGlobOptions <#defaultGlobOptions>`_ for some usage & examples.
    Absolute, IgnoreCase, NoExpandDirs, FollowLinks,  ## iterator behavior
    Hidden, Files, Directories, FileLinks, DirLinks   ## to yield or not to yield

Several of these are equivalent to parameters in the current API, but a few of them are new:

| current | flag | meaning |
| --- | --- | --- |
| relative = false | GlobOption.Absolute | yield paths as absolute rather than relative to root |
| expandDirs = false | GlobOption.NoExpandDirs | if pattern is a directory don't treat it as <dir>/**/* |
| includeHidden = true | GlobOption.Hidden | yield hidden files or directories |
| includeDirs = true | GlobOption.Directories | yield directories |
| N/A | GlobOption.IgnoreCase | matching will ignore case differences |
| N/A | GlobOption.Files | yield files |
| N/A | GlobOption.DirLinks | yield links to directories |
| N/A | GlobOption.FileLinks | yield links to files |
| N/A | GlobOption.FollowLinks | recurse into directories through links |

So this new options set allows for more granular things like only returning links:

for path in walkGlob(".", options = {FileLinks, DirLinks}):
  # only links!
  discard
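One way to picture how the yield-related flags relate to path kinds is a lookup from Nim's os.PathComponent to the flag that enables it. This is a sketch under assumptions, not the library's implementation: Opt is a local stand-in for the four yield flags, and wanted is a hypothetical helper name.

```nim
import os

type
  Opt {.pure.} = enum   # hypothetical mirror of the yield-related flags
    Files, Directories, FileLinks, DirLinks

proc wanted(kind: PathComponent, options: set[Opt]): bool =
  ## Assumed mapping from a path kind to the flag that allows yielding it.
  case kind
  of pcFile: Opt.Files in options
  of pcDir: Opt.Directories in options
  of pcLinkToFile: Opt.FileLinks in options
  of pcLinkToDir: Opt.DirLinks in options

# Same flag combination as the links-only example above.
let linksOnly = {Opt.FileLinks, Opt.DirLinks}

doAssert wanted(pcLinkToFile, linksOnly)
doAssert wanted(pcLinkToDir, linksOnly)
doAssert not wanted(pcFile, linksOnly)
doAssert not wanted(pcDir, linksOnly)
```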

More advanced filtering

When the options flags aren't enough, you can do a lot more with the optional filterYield or filterDescend procs (subsuming #10) to decide what gets yielded and what directories are traversed.

type
  FilterDescend* = (path: string) -> bool
  FilterYield* = (path: string, kind: PathComponent) -> bool

Both of these receive a path as their first parameter (relative or absolute, depending on whether Absolute is in options).

filterDescend is called whenever the iterator comes across a directory that is about to be recursed into. If you return false, the iterator will abandon that recursion. Any path received by a filterDescend proc is guaranteed to be a directory or a link to a directory.

filterYield is called whenever an item (file, directory, or link) is about to be yielded. If you return false, the iterator will not yield the item.

These procs are called after options flags are checked, so if Files notin options then no paths pointing to files will be passed to filterYield.
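To make the division of labor between the two filters concrete, here is a small sketch that walks an in-memory listing instead of the file system. Everything in it is an assumption for illustration: the proc types are written in plain proc syntax rather than the `->` form shown above, and walk, listing, skipCache, and nimFilesOnly are hypothetical names. It also leaves out the option-flag check that the real iterators perform first, and it assumes a pruned directory is still offered to filterYield.

```nim
import os, strutils

type
  # Local equivalents of the signatures shown above, in plain proc syntax.
  FilterDescend = proc (path: string): bool
  FilterYield = proc (path: string, kind: PathComponent): bool
  Entry = tuple[path: string, kind: PathComponent]

proc walk(entries: seq[Entry], filterDescend: FilterDescend,
          filterYield: FilterYield): seq[string] =
  ## Visits entries in order; children of a rejected directory are skipped.
  result = @[]
  var pruned: seq[string] = @[]
  for e in entries:
    var skip = false
    for p in pruned:
      if e.path.startsWith(p & "/"):
        skip = true
    if skip:
      continue
    # Only directories (or links to them) are passed to filterDescend.
    if e.kind in {pcDir, pcLinkToDir} and not filterDescend(e.path):
      pruned.add e.path
    if filterYield(e.path, e.kind):
      result.add e.path

proc skipCache(path: string): bool =
  not path.endsWith("nimcache")

proc nimFilesOnly(path: string, kind: PathComponent): bool =
  kind == pcFile and path.endsWith(".nim")

let listing: seq[Entry] = @[
  ("src", pcDir),
  ("src/glob.nim", pcFile),
  ("src/nimcache", pcDir),
  ("src/nimcache/glob.nim", pcFile),
  ("README.md", pcFile)
]

let found = walk(listing, skipCache, nimFilesOnly)
doAssert found == @["src/glob.nim"]
```

The file under src/nimcache would pass nimFilesOnly on its own, but it is never reached because skipCache stops the walk from entering that directory.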

timotheecour commented 6 years ago

super cool! wow, lots of updates, thanks!

haltcase commented 6 years ago

> you should use ospaths.FileSystemCaseSensitive

I did use that for part of it, maybe it can be used more but I'd wait for that bug to get fixed.

> what's the rationale for `2f8d1a2 feat: flip ExpandDirs option to NoExpandDirs`?

While putting together examples, I noticed that having fewer defaults makes it simpler to replace them completely. I know it's pretty easy to just modify the defaults, but this seems like a good balance. Generally I'd agree with you and wouldn't use a negated option, but the ergonomics of this felt better.

haltcase commented 6 years ago

> nit: IgnoreCase => CaseInsensitive ? to match existing terminology in makeCaseInsensitive

That's an internal proc, shouldn't need to worry much about consistency between it and a public flag.

haltcase commented 6 years ago

> wouldn't it make more sense to add a flag CaseSensitive (defaulting to true) in https://github.com/citycide/glob/blob/master/src/glob/regexer.nim instead of transforming pattern in makeCaseInsensitive ?

That's what I did for the parser:

https://github.com/citycide/glob/blob/b5dc6c68a4757e86f369f39b1020b4056f5c5a44/src/glob/regexer.nim#L74

makeCaseInsensitive is there because there's no way (that I know of) to list files case-insensitively on case-sensitive systems. So we turn each character in the string into a character class of its lower- and uppercase forms, then pass that as a pattern to Nim's os.walkPattern to gather up all the matches.
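As a rough illustration of that transformation, the sketch below expands each ASCII letter into a two-character class and passes everything else through unchanged. toCaseInsensitive is a hypothetical name and this is not the library's makeCaseInsensitive: it is ASCII-only and makes no attempt to handle letters that already sit inside a character class.

```nim
import strutils

proc toCaseInsensitive(pattern: string): string =
  ## Hypothetical helper: "a" becomes "[aA]"; non-letters are kept as-is.
  result = ""
  for c in pattern:
    if c.isAlphaAscii:
      result.add '['
      result.add c.toLowerAscii
      result.add c.toUpperAscii
      result.add ']'
    else:
      result.add c

doAssert toCaseInsensitive("src/*.nim") == "[sS][rR][cC]/*.[nN][iI][mM]"
doAssert toCaseInsensitive("A1") == "[aA]1"
```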

> does FollowLinks take care of avoiding infinite recursion? wasn't clear from top-level msg

Yep, see here:

https://github.com/citycide/glob/blob/b5dc6c68a4757e86f369f39b1020b4056f5c5a44/src/glob.nim#L448-L456

> is UTF8 supported ? (eg in case of case sensitivity + other); whatever answer is, would be nice to specify in docs

Not currently. We can leave it as a follow-up and note that in the docs for now. It might work outside case sensitivity though.

timotheecour commented 6 years ago

@citycide please see https://github.com/citycide/glob/pull/20; it seems much simpler than https://github.com/citycide/glob/commit/7115d8d39d420beaf4bf4c3f167105183576c305 for handling case insensitivity

haltcase commented 6 years ago

I'm going to merge & release 0.8.0 soon unless anyone has something that should get into this first, but we can always follow up afterward.