Closed fweber144 closed 4 years ago
I'm not sure what you mean by opening log files with a conventional text editor. Do you mean that grepping from an editor over all log files is faster than using `grepLogs()`?
To speed things up, there are two options:

1) Move the registry to a faster file system, such as a local SSD.
2) The implementation in `grepLogs()` is naive, but as good as it gets with base R. You can try external, highly optimized tools such as ripgrep (https://github.com/BurntSushi/ripgrep) to grep for strings from the command line.
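For illustration, here is a minimal shell sketch. The demo directory and log contents below are made up; on a real registry you would point `grep` (or ripgrep's `rg`, which works the same way and is typically much faster on large directory trees) at the registry's `logs/` directory:

```shell
# Create a tiny demo log directory (stand-in for <registry>/logs).
mkdir -p /tmp/demo_logs
printf 'all good\nWarning: non-convergence in job\n' > /tmp/demo_logs/job1.log
printf 'all good\n' > /tmp/demo_logs/job2.log

# Print matching lines prefixed with the file name; the ripgrep
# equivalent would be: rg "^Warning" /tmp/demo_logs/
grep -H "^Warning" /tmp/demo_logs/*.log
```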
Thanks a lot for your reply. Yes, I meant opening each log file in a text editor such as Notepad++ and then using the editor's search function to search for the string (more specifically, the regular expression) that I would like to find using `grepLogs()`.
Concerning your suggestions:
In case you missed it from the docs: you can restrict which files to grep by providing a set of job IDs, and you can open single log files with `showLog()`.
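As a sketch of that suggestion (`loadRegistry()`, `findDone()`, `grepLogs()`, and `showLog()` are documented batchtools functions; the registry path is a placeholder, and the exact shape of the result table may differ by version):

```r
library(batchtools)

# Load an existing registry (path is a placeholder).
reg <- loadRegistry("<path_to_registry>", writeable = FALSE)

# Grep only the logs of finished jobs instead of all logs.
ids <- findDone(reg = reg)
hits <- grepLogs(ids = ids, pattern = "^Warning", reg = reg)

# Inspect one matching log file in full.
showLog(id = hits$job.id[1], reg = reg)
```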
Yes, I was aware of that feature. Perhaps I should have given the reason why I want to use `grepLogs()`: after running all my jobs, I want to retrieve any warning messages. The only batchtools way I found to retrieve warnings was `grepLogs()`, so I really have to grep through all my log files, searching for the pattern `"^Warning"`. I found the following workaround using `data.table::fread()`:
```r
library(data.table)

# All log files of the registry (path is a placeholder).
log_files <- list.files(file.path("<path_to_registry>", "logs"), full.names = TRUE)

warn_any <- lapply(log_files, function(file_name_i) {
  # Let an external grep do the filtering and read only the matching
  # lines; suppressWarnings() silences fread() when grep finds no match.
  # shQuote() guards against spaces in file names.
  suppressWarnings({
    grep_warn <- fread(
      cmd = paste("grep \"^Warning\"", shQuote(file_name_i)),
      sep = NULL,
      header = FALSE,
      col.names = "warn_message"
    )
  })
  grep_warn[, file_name := file_name_i]
  return(grep_warn)
})
warn_any <- rbindlist(warn_any, fill = TRUE)
```
For me, this is a lot faster than `grepLogs(pattern = "^Warning")`. Similar speed to the `data.table::fread()` solution is obtained using `base::system()` in combination with grep (see this thread on SO):
```r
# On Windows, use the grep.exe shipped with Rtools.
rtools_path <- pkgbuild::rtools_path()
grep_path <- file.path(rtools_path, "grep.exe")

log_files <- list.files(file.path("<path_to_registry>", "logs"), full.names = TRUE)
warn_any <- lapply(log_files, function(file_name_i) {
  sys_command <- paste(grep_path, "\"^Warning\"", shQuote(file_name_i))
  # intern = TRUE captures grep's output as a character vector;
  # suppressWarnings() silences the warning about a non-zero exit
  # status when a file contains no match.
  suppressWarnings({
    system(sys_command, intern = TRUE)
  })
})
```
A downside of these two workarounds is that it's not so easy to get the job IDs corresponding to the retrieved warnings.
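One possible way to recover the job IDs, as a sketch under two assumptions that should be checked against your registry before relying on it: that batchtools names each log file after the job hash (i.e. `<job.hash>.log`), and that `getJobTable()` exposes a `job.hash` column. It builds on the `warn_any` table from the `fread()` workaround above:

```r
library(batchtools)
library(data.table)

reg <- loadRegistry("<path_to_registry>", writeable = FALSE)

# Assumption: log files are named "<job.hash>.log", so stripping the
# extension from the file name yields the hash.
warn_any[, job.hash := tools::file_path_sans_ext(basename(file_name))]

# Assumption: getJobTable() provides the hash-to-ID mapping.
job_tab <- getJobTable(reg = reg)[, .(job.id, job.hash)]
warn_any <- merge(warn_any, job_tab, by = "job.hash", all.x = TRUE)
```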
When using `batchtools::grepLogs()` to check the log files for a specific text pattern, it takes hours for me, and I eventually have to cancel because it takes too long. I have a registry with about 2.5 million jobs, so that's quite a lot, but if I check a log file manually using a conventional text editor, it takes just a few seconds. Is there a way `batchtools::grepLogs()` might be improved to run faster?
My sessionInfo():