karrick / godirwalk

Fast directory traversal for Golang
BSD 2-Clause "Simplified" License
702 stars 70 forks source link

scandir_windows.go should use faster code #30

Open karrick opened 5 years ago

karrick commented 5 years ago

HELP WANTED

If anyone has the knowledge and the cycles to figure out how to more quickly read a file system directory in Windows, I'd love to hear from you, whether it's a PR, or a link to some document I can read to learn more.

karrick commented 4 years ago

After merging #38, it's not readdir_windows.go that would need faster code, but rather scandir_windows.go.

Right now, scandir_windows.go reads a single directory entry from an open file system handle.

There may even be advantages from reading more than a single entry, and if there are, then reading a block of entries into a structure field slice, and enumerating them on each call to Scan might perform better. In such a case, when the structure field has no more items, the Scan code would attempt to read more entries, and eventually run out of entries.

niallnsec commented 4 years ago

First of all thank you for this project, the behaviour of filepath.Walk on Windows has been driving me up the wall.

I think you are spot on when you say

There may even be advantages from reading more than a single entry

You can potentially reduce the enumeration to a single system call in many cases, but it depends on how low level you are happy to go.

You could use the API NtQueryDirectoryFile which is exposed in ntdll.dll. You can use this API to enumerate a full directory in one go, even allowing for filtering with wildcards.

I am currently using this API as part of a project to enumerate the named pipes on Windows (since named pipes are exposed just like a normal filesystem with directories and files) and it works very well.

I should note, that although this API is only officially documented for Kernel mode (see ZwQueryDirectoryFile) the Nt version is just a User mode wrapper and has been available in Windows since XP. I have tested the use of this function on every version of Windows from XP to 10 (build 1909) and have found no issues whatsoever. As it has stayed the same for more than two decades now I would not expect Microsoft to make any breaking changes to it in the foreseeable future.

The prototype for the function I am using looks like this:

import (
    "syscall"
    "unsafe"

    "golang.org/x/sys/windows"
)

type (
    FILE_DIRECTORY_INFORMATION struct {
        NextEntryOffset uint32
        FileIndex       uint32
        CreationTime    uint64
        LastAccessTime  uint64
        LastWriteTime   uint64
        ChangeTime      uint64
        EndOfFile       uint64
        AllocationSize  uint64
        FileAttributes  uint32
        FileNameLength  uint32
        FileName        [1]uint16
    }

    FILE_NAMES_INFORMATION struct {
        NextEntryOffset uint32
        FileIndex       uint32
        FileNameLength  uint32
        FileName        [1]uint16
    }

    UNICODE_STRING struct {
        Length    uint16
        MaxLangth uint16
        Buffer    uintptr
    }

    IO_STATUS_BLOCK struct {
        StatusPointer uintptr
        Information   uintptr
    }
)

var (
    modntdll = windows.NewLazySystemDLL("ntdll.dll")
    procNtQueryDirectoryFile = modntdll.NewProc("NtQueryDirectoryFile")
)

const (
    STATUS_NO_MORE_FILES = syscall.Errno(0x80000006)

    FileDirectoryInformation = uint32(1)
    FileNamesInformation = uint32(12)
)

func NtQueryDirectoryFile(hFile syscall.Handle, hEvent syscall.Handle, apcRoutine uintptr,
    apcContext uintptr, ioStatusBlock *IO_STATUS_BLOCK, finfo uintptr, length uint32,
    fiClass uint32, singleEntry uint32, fname *UNICODE_STRING, restart uint32) error {
    r1, _, _ := syscall.Syscall12(
        procNtQueryDirectoryFile.Addr(),
        11,
        uintptr(hFile),
        uintptr(hEvent),
        apcRoutine,
        apcContext,
        uintptr(unsafe.Pointer(ioStatusBlock)),
        finfo,
        uintptr(length),
        uintptr(fiClass),
        uintptr(singleEntry),
        uintptr(unsafe.Pointer(fname)),
        uintptr(restart),
        0,
    )
    if r1 != 0 {
        return syscall.Errno(r1)
    }
    return nil
}

Note: You can remove the dependency on x/sys/windows by using syscall.NewLazyDll. The only reason to use windows.NewLazySystemDLL is because it restricts the paths the OS will load the DLL from, ensuring that the system copy is used and not a malicious DLL with the same name placed in the OS search path. This advantage is probably irrelevant to ntdll.dll since it is automatically loaded into every process and so there is no risk of loading an incorrect version here.

You can call it in a loop until you get an error code of STATUS_NO_MORE_FILES. If you provide it with a relatively large buffer then you have a good chance of getting all of the information in one go.

I think the loop would look something like this (pulled out from a test project):

var status IO_STATUS_BLOCK
buf := make([]byte, 1024*64)
for {
    err = NtQueryDirectoryFile(
        hFile,
        0,
        0,
        0,
        &status,
        uintptr(unsafe.Pointer(&buf[0])),
        1024*64,
        FileDirectoryInformation,
        0,
        nil,
        0,
    )
    if err != nil {
        if err.(syscall.Errno) == STATUS_NO_MORE_FILES {
            break
        } else {
            panic(err)
        }
    }
    if status.Information == 0 {
        panic("no data")
    }
    var offset uint32
    fdi := (*FILE_DIRECTORY_INFORMATION)(unsafe.Pointer(&buf[0]))
    for { //Walk the returned buffer
        var name []uint16
        var sname string
        var file FileInformation
        if fdi.FileNameLength > 0 {
            //Convert the variable length field FileName to a slice so it can be passed to UTF16ToString
            h := (*reflect.SliceHeader)(unsafe.Pointer(&name))
            h.Data = uintptr(unsafe.Pointer(&fdi.FileName[0]))
            h.Len = int(fdi.FileNameLength) / 2
            h.Cap = h.Len
            sname = syscall.UTF16ToString(name)
        }
        file.Name = sname
        file.Size = int32(fdi.AllocationSize)

        output = append(output, &file)

        offset += fdi.NextEntryOffset
        if fdi.NextEntryOffset == 0 || offset > uint32(status.Information) {
            break
        }
        fdi = (*FILE_DIRECTORY_INFORMATION)(unsafe.Pointer(&buf[offset]))
    }
}

_Note: This example is using FileDirectoryInformation as the information type and so we recieve a buffer of FILE_DIRECTORY_INFORMATION structures. However, if the file name is the ONLY piece of information required the call to NtQueryDirectoryFile could be changed to use FileNamesInformation which returns an array of FILE_NAMESINFORMATION structures and should be slightly quicker.

The output variable in this example would be a storage buffer for Scan to read the information from.

Scan could be implemented such that it reads from the storage buffer unless it is empty, in which case it makes a single call to NtQueryDirectoryFile. If the call returns STATUS_NO_MORE_FILES then Scan would return false as there are no more items to be read, otherwise it would repopulate the storage buffer with the returned data and allow you to keep iterating.

In theory this should be the fastest possible way to enumerate a directory on Windows as it bypasses the overhead added by both the Go standard library and the FindFirstFile/FindNextFile APIs by going directly to the root data source in the Nt API.

If this is not going too low level I'd be happy to try putting together a PR using this approach, although time is a little tight right now so I probably won't be able to do it right away.

karrick commented 4 years ago

@niallnsec, thanks for your feedback. I greatly appreciate the efforts you put into the above. I just did a first run improvement on how ReadDirents and ReadDirnames run on Windows, by having them invoke the syscall with a -1 limit argument, which has them return all the os.FileInfo instances. This should be a bit more efficient, but I still need to give your above code examples a thorough read through, and see about adding them.

I appreciate your help, and have not stopped working on this issue.

kapsh commented 2 years ago

@karrick hi, was this implemented yet? We noticed that telegraf filecount input doesn't perform well on Windows with large nested files count (millions specifically). I believe that godirwalk 1.16.1 works under the hood there, so speeding this up would benefit many it's users. Thanks.