nanxstats / r-base-shortcuts

⚡ Base R shortcuts: A collection of lesser-known but powerful idioms and coding patterns for writing concise and fast R code
https://nanx.me/blog/post/r-base-shortcuts/

R-Insight: Run-Length Encoding (RLE) #3

Closed brichard1638 closed 1 year ago

brichard1638 commented 1 year ago

The rle function found in base R returns only run lengths and values, which limits its use for analysis. A much better option is the subSeq function in the doBy package, which captures a fuller set of data points for each run, including the following:

RLE results are captured in a data frame. A dot plot has been added to facilitate the visualization of an RLE. In this example, binary values are examined:

```r
library(broman)
library(doBy)

set.seed(7854)
y = sample(x = 0:1, size = 200, replace = TRUE)

# Returns a comprehensive RLE analysis in a data frame
yrle = subSeq(y)

dotplot(group = yrle$value, y = yrle$slength, main = "RLE Binary Dot Plot",
        xlab = "Value", ylab = "Run Length", jiggle = "fixed", bg = "red")
```

To get a table summary of the RLE analysis, apply the following code:

```r
table(yrle$value, yrle$slength)
#      1  2  3  4  6
#   0 34 12  7  4  0
#   1 31 12  8  5  1
```

Two facts are quickly discernible from the RLE analysis:

  1. Record 102 of the yrle data frame (positions 174-179) holds the longest run: six consecutive 1s. The corresponding dot plot supports this finding. If one were looking for an outlier pattern, this is it.
  2. Across all runs in vector y, there are no runs of exactly five consecutive values for either 0 or 1.
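
For comparison, a similar run table can be sketched in base R alone, starting from rle(). This is a minimal sketch, not doBy's implementation: the column names (first, last, slength, value) are chosen to mirror the ones used above and are an assumption, as are the helper variable names. Whether the numbers match those quoted above depends on the R version's sampling algorithm, so no output is shown.

```r
# Base R sketch: build a subSeq()-style run table from rle().
# Column names mirror those used in the thread; they are an assumption,
# not a claim about doBy's exact output.
set.seed(7854)
y <- sample(x = 0:1, size = 200, replace = TRUE)

r <- rle(y)                      # run lengths and the value of each run
last <- cumsum(r$lengths)        # index in y where each run ends
first <- last - r$lengths + 1L   # index in y where each run starts

runs <- data.frame(
  first = first,
  last = last,
  slength = r$lengths,
  value = r$values
)

# Cross-tabulate value against run length
run_table <- table(runs$value, runs$slength)

# Longest run(s), with their start and end positions in y
longest <- runs[runs$slength == max(runs$slength), ]

# Run lengths between 1 and the maximum that never occur
missing_lengths <- setdiff(seq_len(max(runs$slength)), runs$slength)

print(run_table)
print(longest)
print(missing_lengths)
```

Here `longest` answers the first question (where the longest run sits), and `missing_lengths` answers the second (which run lengths never appear). `inverse.rle(r)` rebuilds y from the encoding, which is a quick way to check that nothing was lost.
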
nanxstats commented 1 year ago

Thanks for the suggestion. doBy is a classic package that I have used a few times before. However, I want to keep the focus exclusively on base R, so detailing other packages might not fit the context too well. I appreciate the thoughts, though!