spring-projects / spring-integration

Spring Integration provides an extension of the Spring programming model to support the well-known Enterprise Integration Patterns (EIP)
http://projects.spring.io/spring-integration/
Apache License 2.0

Files.inboundAdapter watchService - ignore subdirectories #3557

Closed szilardk closed 1 year ago

szilardk commented 3 years ago

Expected Behavior

Files.inboundAdapter(new File(dir))
        .useWatchService(true)

Avoid listing the files from all subdirectories.

directory structure

/workDir
     /done
     /failed

I would like to read all the files from /workDir and move them into done/ or failed/ after processing. In my scenario it is not useful to scan the subdirectories; it just takes time when the subdirectories contain a lot of files. I had a look at WatchServiceDirectoryScanner.walkDirectory, where Files.walkFileTree is used. That method takes an "int maxDepth" argument. Would it make sense to expose this?
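To illustrate what limiting the depth would mean for the /workDir layout above, here is a minimal sketch that uses only the JDK's Files.walkFileTree overload with a maxDepth argument. It is not the WatchServiceDirectoryScanner code; the class name MaxDepthWalk and the helper listFiles are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Spring Integration).
public class MaxDepthWalk {

    // Collects regular files under root, descending at most maxDepth levels.
    // With maxDepth = 1 only the direct children of root are visited.
    static List<Path> listFiles(Path root, int maxDepth) throws IOException {
        List<Path> files = new ArrayList<>();
        Files.walkFileTree(root, EnumSet.noneOf(FileVisitOption.class), maxDepth,
                new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                        // Directories sitting exactly at maxDepth are also passed to
                        // visitFile, so keep regular files only.
                        if (attrs.isRegularFile()) {
                            files.add(file);
                        }
                        return FileVisitResult.CONTINUE;
                    }
                });
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Build a small tree: workDir/a.txt and workDir/done/b.txt
        Path workDir = Files.createTempDirectory("workDir");
        Files.createFile(workDir.resolve("a.txt"));
        Path done = Files.createDirectory(workDir.resolve("done"));
        Files.createFile(done.resolve("b.txt"));

        System.out.println(listFiles(workDir, 1).size());                 // prints 1
        System.out.println(listFiles(workDir, Integer.MAX_VALUE).size()); // prints 2
    }
}
```

With a depth of 1 the contents of done/ and failed/ are never listed, which is the saving being asked for when those directories hold many files.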

Current Behavior

The current implementation looks for all files in all subdirectories.

Context

What I use now is:

Files.inboundAdapter(new File(dir))
        .useWatchService(true)
        .filter 

With the filter I can eliminate everything I do not need. It would be even better if the subdirectories were not scanned at all, since in my case there are 10K-100K files.

artembilan commented 3 years ago

Why not just use the plain polling behavior of Files.inboundAdapter() instead of abusing a WatchService, whose purpose is really to let us scan the whole file tree?

See docs for more info: https://docs.spring.io/spring-integration/docs/current/reference/html/file.html#watch-service-directory-scanner

szilardk commented 3 years ago

Thank you for the reply. I will try using the plain polling behavior or the WatchService with the filter. I will have to run some performance tests to see which one works better in my case. Are there any guidelines on when to use the WatchService and when not to?

artembilan commented 3 years ago

Well, one of them, of course, is about walking through the whole file tree. Another is reacting to events in the file system, such as updates to files or their removal.

There should not be any performance difference, since both approaches are handled by the SourcePollingChannelAdapter.

kodecharlie commented 2 years ago

As discussed, use-watch-service=true implies a full directory-tree scan. This, in fact, was the documented behavior in the spring.io references. But intuitively, it seems that watch-service would offer an option to regulate the recursion. Someone mentioned exposing a maxDepth property that's already natively supported in the watch-service logic. Well, that's one way, although limiting in its own way. Possibly a better solution is to inject some kind of filter into the watch-service that regulates which sub-directories are scanned. If the filter is *, then the implication is that all sub-directories are scanned; if the filter is empty, then none; if the filter is a regex, then only those subdirs that match are scanned.

I don't have a special use-case that warrants this behavior. But looking carefully at the docs and reading, in fact, the source code itself for FileReadingMessageSource, this just seems like reasonable behavior out of the box.
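The directory-filter idea can be sketched with plain JDK calls as follows. This is only an illustration under assumptions, not the FileReadingMessageSource implementation: the class FilteredWalk, the helper listFiles, and the "in" subdirectory are hypothetical, and a Predicate<Path> stands in for the "*", empty, and regex cases described above.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, for illustration only (not part of Spring Integration).
public class FilteredWalk {

    // Collects regular files under root, entering a subdirectory only when
    // dirFilter accepts it. The root itself is always entered.
    static List<Path> listFiles(Path root, Predicate<Path> dirFilter) throws IOException {
        List<Path> files = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                if (dir.equals(root) || dirFilter.test(dir)) {
                    return FileVisitResult.CONTINUE;
                }
                // Rejected directories are skipped without listing their entries.
                return FileVisitResult.SKIP_SUBTREE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    files.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Build workDir/a.txt, workDir/done/b.txt, workDir/in/c.txt
        Path workDir = Files.createTempDirectory("workDir");
        Files.createFile(workDir.resolve("a.txt"));
        Files.createFile(Files.createDirectory(workDir.resolve("done")).resolve("b.txt"));
        Files.createFile(Files.createDirectory(workDir.resolve("in")).resolve("c.txt"));

        System.out.println(listFiles(workDir, dir -> true).size());  // "*": prints 3
        System.out.println(listFiles(workDir, dir -> false).size()); // empty: prints 1
        // Name match: only the "in" subdirectory is scanned, prints 2
        System.out.println(listFiles(workDir,
                dir -> dir.getFileName().toString().equals("in")).size());
    }
}
```

A predicate covers the same ground as a depth limit for the flat case (reject everything) while also allowing selective recursion, which is why the two options are complementary rather than redundant.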

artembilan commented 2 years ago

I find your suggestions reasonable, so we will fix this in the upcoming 6.0 with two options: int maxDepth and Predicate<Path> watchDirFilter.