Feature Request: Add ability to easily subset/filter log files

pharmaverse / logrx

Tools to facilitate logging in a clinical environment with the goal of making code easily traceable and reproducible.

https://pharmaverse.github.io/logrx/


Feature Request: Add ability to easily subset/filter log files #162

Open parmsam-pfizer opened 1 year ago

parmsam-pfizer commented 1 year ago

Feature Idea

The text log file seems to be formatted in a pretty standard way. With some enhancements, maybe users could parse this text file and grab the info they need (list of package versions for example). Or there could be a feature added to output it into an object that you can more easily subset/filter (like json or rds with a nested list).
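To make the second idea more concrete, here is a minimal base R sketch of what a nested-list representation could look like once saved as an .rds file and filtered. The list layout, field names, and package versions are all made up for illustration; this is an assumption about a possible shape, not something logrx writes today.

```r
# Hypothetical nested-list form of a log; the structure and values are
# illustrative only and are not produced by logrx.
log_obj <- list(
  metadata = list(file_name = "example.R", user = "someuser"),
  packages = data.frame(
    package = c("dplyr", "stringr", "admiral"),
    version = c("1.1.0", "1.5.0", "0.10.0"),
    stringsAsFactors = FALSE
  ),
  warnings = c("NAs introduced by coercion"),
  errors = character(0)
)

# Round-trip through an .rds file, as someone receiving the log would
rds_path <- tempfile(fileext = ".rds")
saveRDS(log_obj, rds_path)
log_in <- readRDS(rds_path)

# Grab only the info needed, e.g. the version of one package
pkg <- log_in$packages
stringr_version <- pkg$version[pkg$package == "stringr"]
stringr_version

# Or check quickly whether the run raised anything
has_issues <- length(log_in$warnings) > 0 || length(log_in$errors) > 0
has_issues
```

With a structure like this, subsetting is ordinary list and data frame indexing rather than text parsing.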

Relevant Input

No response

Relevant Output

No response

Reproducible Example/Pseudo Code

No response

kodesiba commented 1 year ago

There is the option to keep the environment object we use after execution, which can then be accessed in scripting, but more flexibility is definitely worth having. You are correct: the log format is pretty predictable and could be parsed. It is something we'd talked about but never implemented.

nicholas-masel commented 1 year ago

I saw an upcoming presentation for PHUSE US Connect, SS04: Post-mortem Logs in R, that is parsing logrx/timber.

I can reach out to my colleague as well to help in getting requirements and see if she has some code to contribute as a starting point.

bms63 commented 1 year ago

I forgot about this presentation. Thanks for the reminder.

tkakinyi commented 1 year ago

Hi, I do have some code that I am writing for PHUSE that may be a good starting point for this enhancement. At the moment, it can parse based on strings entered by the developer; I am hoping to get it to the point where the strings are entered by the user. One challenge is that, unlike SAS logs, which have "warning" or "error" in the line carrying the respective message, logrx logs are organized in sections with section headers. Nonetheless, they can still be parsed. I am still playing around with the code and am happy to share it for ideas. PS: Does anyone have some "dirty" logs I can use for development?
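As a rough illustration of string-based parsing on section-organized logs, the sketch below searches each section of an already-split log for user-supplied patterns and reports which section every hit came from. The function name `find_in_sections()` and the sample sections are hypothetical and are not taken from the PHUSE code; patterns are treated as regular expressions.

```r
# Hypothetical helper: `sections` is a named list of character vectors,
# one element per log section; `patterns` are user-entered search strings.
find_in_sections <- function(sections, patterns, ignore_case = TRUE) {
  regex <- paste(patterns, collapse = "|")
  hits <- lapply(names(sections), function(nm) {
    matched <- grep(regex, sections[[nm]], ignore.case = ignore_case, value = TRUE)
    if (length(matched) == 0) return(NULL)
    data.frame(section = nm, line = matched, stringsAsFactors = FALSE)
  })
  # Drop sections without a match before stacking the results
  hits <- Filter(Negate(is.null), hits)
  if (length(hits) == 0) {
    return(data.frame(section = character(0), line = character(0)))
  }
  do.call(rbind, hits)
}

# Made-up sections for demonstration only
demo_sections <- list(
  "Errors and Warnings" = c("Errors:", "", "Warnings:", "NAs introduced by coercion"),
  "Messages, Output, and Result" = c("Messages:", "Joining with `by = join_by(id)`")
)

res <- find_in_sections(demo_sections, c("coercion", "joining"))
res
```

Returning the section name alongside each matching line stands in for the "warning" or "error" prefix that a SAS log line would carry.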

bms63 commented 1 year ago

Hey @tkakinyi We don't have any dirty logs. For our unit tests, we just have them temporarily created and then removed.

If you make some scripts and logs with some "dirtiness" perhaps we can store them in a dev folder for reproducibility on this repo?

parmsam-pfizer commented 1 year ago

Here's some code to split the log by section headers (on a file named example-logrx.log). It might be worth adding a dash sequence, similar to what appears under the Session Information output, for subsections (under Errors and Warnings and under Messages, Output, and Result, for example). That would make it easier to nest them.

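library(stringr)

# Read the log; each section header sits between two lines of 80 dashes
log_txt <- readLines("example-logrx.log")
divider <- strrep("-", 80)
sect_headers <- c()
sect_status <- FALSE  # TRUE while between the two divider lines of a header
sect_info <- list()
for (i in log_txt) {
  if (i == divider) {
    sect_status <- !sect_status
  } else if (sect_status) {
    sect_headers <- c(sect_headers, i)
    # Start an empty entry so a section without content still gets a slot
    sect_info[[length(sect_headers)]] <- character(0)
  } else if (length(sect_headers) > 0) {
    # Append the line to the current section; lines before the first header are skipped
    cur_pos <- length(sect_headers)
    sect_info[[cur_pos]] <- c(sect_info[[cur_pos]], i)
  }
}
# Strip the leading/trailing "-" and the padding around each header
sect_headers <- str_remove_all(sect_headers, "-?\\s{3,}-?")
names(sect_info) <- sect_headers
sect_info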
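The same split can also be written without an explicit loop. The sketch below is a base R variant run on a made-up miniature log (the log lines are invented for illustration and are much shorter than real logrx output), followed by an example of subsetting the result by section name.

```r
# Made-up miniature log; real logrx output has more sections and content
divider <- strrep("-", 80)
log_txt <- c(
  divider, "-   logrx Metadata   -", divider,
  "This log was generated using logrx",
  divider, "-   Session Information   -", divider,
  "stringr   1.5.0",
  "dplyr     1.1.0"
)

is_div <- log_txt == divider
# Dividers come in pairs: a header line follows an odd-numbered divider
is_header <- (cumsum(is_div) %% 2 == 1) & !is_div
# A running header count assigns every line to a section
sect_id <- cumsum(is_header)
keep <- !is_div & !is_header & sect_id > 0
sect_info <- split(
  log_txt[keep],
  factor(sect_id[keep], levels = seq_len(sum(is_header)))
)
names(sect_info) <- trimws(gsub("^-|-$", "", log_txt[is_header]))

# Subset by section name, then filter lines within that section
session <- sect_info[["Session Information"]]
stringr_line <- grep("^stringr", session, value = TRUE)
stringr_line
```

Using a factor with one level per header keeps empty sections in the result, so the names always line up with the list elements.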

bms63 commented 1 year ago

So is this going to lead us to a Post-Mortem Logs Vignette? :)

tkakinyi commented 1 year ago

Hi all, check out what is, in my opinion, a good starting point at https://github.com/tkakinyi/phuse2023/tree/main. I have also included 3 "dirty" logs from logrx: rloud was created twice so I could have files of different sizes, and admiral_Adae is from admiral; I just messed with the file some to generate messages. Running these with logrx should give log files with some errors, warnings, and messages. For errors, though, only one can be generated because, as far as I understand, R stops execution when it encounters an error. To test the SAS functionality I used internal code, and these are more ubiquitous, so I did not include them. It is a pretty large function, so it can possibly be "chunked" out.

Known issues for further development:

The add_r_sxtn can only be used when parsing an individual {logrx} file, as in example 5.
The source code is currently one large function, which may be costly in system run time.
The argument select_file can only accommodate one file at a time.

I did not include them in your repo as I do not know its setup. So far this is just an in-script function, developed in v 4.2.2, to be sourced. [edit] Any immediate feedback is welcome.