karrick / godirwalk

Fast directory traversal for Golang
BSD 2-Clause "Simplified" License

Walk files of folder before recursing into sub-folders #67

Open shlomi-dr opened 3 years ago

shlomi-dr commented 3 years ago

Is there a way with this library to first invoke the callback on all direct files under the current folder before recursing into the sub-folders?

Example:

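package main

import (
    "fmt"
    "os"

    "github.com/karrick/godirwalk"
)

func main() {
    // Print every path that godirwalk visits under /tmp/test.
    err := godirwalk.Walk("/tmp/test", &godirwalk.Options{
        Callback: func(osPathname string, directoryEntry *godirwalk.Dirent) error {
            fmt.Println(osPathname)
            return nil
        },
    })
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}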

currently results in:

/tmp/test
/tmp/test/afile
/tmp/test/dir1
/tmp/test/dir1/file2
/tmp/test/file

and the ask is to return:

/tmp/test
/tmp/test/afile
/tmp/test/file
/tmp/test/dir1
/tmp/test/dir1/file2

This way the results are still deterministic: the files of the current folder are visited in sorted order, and the sub-folders are then recursed into, also in sorted order.

Thanks!

gmolau commented 2 years ago

This is asking for lexicographic BFS as opposed to the standard lexicographic DFS. The problem is that this library seems to offer only BFS with unspecified sorting, by setting Options.Unsorted to true. I find this a rather confusing API. @karrick, am I missing something here? Would it be possible to offer a true lexicographic BFS like the OP mentioned?

GwynethLlewelyn commented 1 year ago

@shlomi-dr I'm assuming that in your scenario, you actually do much more than just getting the filename, right?

Because if that's all you need, your best option is just to collect all the filenames in a slice of whatever type you prefer, make sure that it complies with sort.Interface (like godirwalk's Dirents type does!), and sort it at the end, once godirwalk finishes its job, using whatever ordering you like most :)

Granted, if there is a lot of processing you need to do on each node, then this very simplistic approach will not work; you'd need something more sophisticated in terms of directory traversal. There are quite a few algorithms for that, of course...

shlomi-dr commented 1 year ago

Hi @GwynethLlewelyn, thanks for replying! However, since I needed this 2 years ago, I really don't remember the use-case, and probably just wrote a simple function to do so, not using this library.

That said, I don't actually think it matters what my specific scenario was, what anyone's use-case may be, or how much processing is done in each iteration. It's simply irrelevant :) This library is called "godirwalk", which implies that it specializes in traversing directories. It feels natural that such a library would support at least the most trivial methods of walking a directory structure. DFS and BFS are the most standard ways of traversing a tree, any tree, for any purpose. I don't think that first traversing a tree, then sorting it "however I'd like" (while losing the traversal context), and then doing the processing is an appropriate approach.

In any event, thanks for the reply! Hope this library improves :)

GwynethLlewelyn commented 1 year ago

Ah, sorry, you're right, I tend to forget to look at the dates lol

In any case, I just wanted to add that I recently stumbled upon another use-case for something similar to what you originally asked for (not quite the same as DFS vs BFS) and got stuck trying to find an elegant solution for it. It'll be posted shortly as a new issue :-)