richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0
223 stars 30 forks source link

filepath.WalkDir vs. filepath.Walk #214

Closed ross-spencer closed 1 year ago

ross-spencer commented 1 year ago

When I was investigating https://github.com/richardlehane/siegfried/pull/212 I noticed a change in Golang 1.16 that might point to some potential performance improvements.

There is a discussion here: https://github.com/golang/go/issues/16399

With a summary here: https://github.com/golang/go/issues/16399#issuecomment-776327385

The Go devs describe filepath.Walk() (used in longpath and longpath_windows) as less efficient than the more recently implemented filepath.WalkDir().

Unfortunately I haven't time immediately to look into how to setup benchmarking, or understand whether the type of improvement offered by this is small fry compared to what's going on in the identification engine. I'm also not sure if SF can use filepath.WalkDir() so I'd want to clarify that first.

Soonest opportunity I might get to look at this is a few months, but raising it just in case anyone is interested in looking at it sooner.

prettybits commented 1 year ago

I had a quick look at this and I believe that this is not currently something that can be changed, as the implemented WalkFunc makes use of os.FileInfo members like ModTime(), Size() and Mode(), which fs.DirEntry provided by filepath.WalkDir() doesn't provide. In short, the stat calls that could be saved this way seem to be needed anyway.

ross-spencer commented 1 year ago

That makes sense to me Bernhard. I was anticipating that may be the case. Happy to close this. Thanks for investigating!