ramnathv / rblocks

A fun and visual way to learn data structures and control flow in R.

Print values inside blocks #1

Open karthik opened 10 years ago

karthik commented 10 years ago

Since the aim is to use this as a teaching tool, it would be great to see what values the blocks contain.

library(rblocks)
b_list <- make_block(list(x = 1:2, y = LETTERS[1:4], z = c(T, F)))
b_list

image

Especially when working with lists.

ramnathv commented 10 years ago

Great suggestion @karthik. It should be easy to print values. The key issue I am struggling with is how to save both color and data within the object. I thought of using attributes, but they don't get passed around. When a user makes an assignment of an element to a particular color, the data gets replaced. In this case, maybe I could just display the colors for those cells, and data + colors for the rest.

It would be nice if I could attach a fill attribute to every element of the object, so that both data and fills are preserved. Any such solution will have to ensure that these attributes get passed along when basic manipulations are carried out on the data.

kevinushey commented 10 years ago

I think color doesn't have to be a property of the object -- it's more just a value that can be mapped to from the type or class of the vector / element.

You could have a variable (maybe it lives in an environment somewhere) that maps different types to different colors, so you don't have to worry about carrying color information around.

More generally, theming from e.g. lattice or ggplot2 could serve as motivation. Perhaps something like make_block(..., theme=...). There could be some package-level default theme that users could customize as they need.
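A minimal sketch of the type-to-colour lookup described above, assuming a plain environment is used as the package-level theme. The names `default_theme` and `fill_for` are hypothetical and not part of rblocks; the two hex colours for numeric and logical are the ones that appear later in this thread, the others are placeholders.

```r
# Hypothetical package-level theme: maps an object's mode to a fill colour.
default_theme <- new.env()
assign("numeric",   "#a6cee3", envir = default_theme)
assign("logical",   "#1f78b4", envir = default_theme)
assign("character", "#b2df8a", envir = default_theme)  # placeholder colour

# Look up the fill for a value; unmapped modes fall back to grey.
fill_for <- function(x, theme = default_theme) {
  key <- mode(x)
  if (exists(key, envir = theme, inherits = FALSE)) {
    get(key, envir = theme)
  } else {
    "grey80"
  }
}

# One fill per list element, derived from its type rather than stored on it.
fills <- vapply(
  list(x = 1:2, y = LETTERS[1:4], z = c(TRUE, FALSE)),
  fill_for,
  character(1)
)
```

With this design nothing has to travel with the data: the colour is recomputed from the element whenever the block is drawn, and a user could swap in a different environment as a custom theme.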

ramnathv commented 10 years ago

Thanks for your comments. One reason colour needs to travel with the data is that users can see the effect of an assignment or indexing. For example, d[1] = "red" should show the first column of a data frame coloured red. Default colours can be based on type/class. Maybe a middle ground is to override default colours with actual colours in cells which have them.

kevinushey commented 10 years ago

Hmm. This sounds tricky. Do you mean that, e.g. if we have something like

df <- data.frame(x=1, y=2, z=3)
b <- make_block(df)
b[1] <- "red"

That color "red" should track over different kinds of subsetting / re-arrangement on df itself, e.g.

df <- df[3:1] ## now the third column should be red
df <- df[1, ] ## subsetting should preserve colour
df$x <- 10 ## still red?

and so on?

Or are you thinking of subsetting operations on b itself -- that is, each block contains its own copy, or reference, to df? That is likely more doable, but would be a lot of work to implement each operator we might want.

kevinushey commented 10 years ago

Here's an idea:

A rblock class could contain the following members:

  1. data, which represents whatever data / R object is passed in.
  2. colour, an R object with the same structure as data, but containing either NULL (should get a default) or a string (user-specified colour)
  3. theme, maybe representing components of a block-specific theme, with unset values filled in by some default theme.

Operators like <-, [<-, [[<- and so on could be defined such that we

  1. Pass the call through to modify the data element as is,
  2. Use the supplied arguments to make sure colour still matches data if subsetting / reordering was done.

Essentially, we hijack the operators so we can modify colour as needed, and then dispatch as normal to the underlying R object.

Not so sure on how the color setting API could work here -- something like set_color(b, 1, "red"). But nested lists would be trickier.

Actually, maybe something like color(b)[[1]] <- "red", where color(b) returns a reference to the color object inside would work. That way regular R semantics for the operators would apply.
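The idea above could be sketched in S3 roughly as follows. This is only an illustration under stated assumptions: `rblock`, `color` and `color<-` are hypothetical names, the theme member is left out, and NA is used instead of NULL for "no user colour" because assigning NULL into a list element removes it.

```r
# Hypothetical constructor: the data plus a parallel colour structure.
rblock <- function(data) {
  structure(
    list(data = data, colour = lapply(data, function(el) NA_character_)),
    class = "rblock"
  )
}

# Accessor and replacement function, so color(b)[[1]] <- "red" works
# with ordinary R replacement semantics.
color <- function(b) b$colour
`color<-` <- function(b, value) {
  b$colour <- value
  b
}

# Subsetting is applied to data and colour together to keep them in sync.
`[.rblock` <- function(x, i) {
  x$data <- x$data[i]
  x$colour <- x$colour[i]
  x
}

b <- rblock(data.frame(x = 1, y = 2, z = 3))
color(b)[[1]] <- "red"
b2 <- b[3:1]  # reorder the columns; the red column is now third
```

Only single-index `[` is handled here; `[[`, `$` and two-index subsetting would each need their own method, which is the extra work mentioned above.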

ramnathv commented 10 years ago

You are right. The ideal situation would be where we can preserve both the data and the view. Currently, the data is the view! So, when make_block is called on an object, it replaces the actual data with colors. The easiest way to see this is by running the following

df <- data.frame(x = 1, y = "A", z = TRUE)
b <- make_block(df)
print_raw(b)
##        x       y       z
## 1 #a6cee3 #a6cee3 #1f78b4

In this case, I am using the mode of the object to map to the color. Due to R's love for factors, the column y has a numeric mode which is why you see it share the same color as x.
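The factor behaviour can be checked directly. The sketch below builds the factor explicitly, because since R 4.0.0 `data.frame()` no longer converts strings to factors by default, so the `df` above would show a character column on a current R version.

```r
y <- factor("A")

mode(y)    # "numeric": a factor is stored as integer codes plus levels
class(y)   # "factor"
mode("A")  # "character": a plain string keeps its own mode
```

Mapping colours from `class()` rather than `mode()` would be one way to tell factors and numbers apart.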

The color as data approach is simple and has the advantage of being robust to the idiosyncratic nature of some subsetting operations. It is also easier to explain to students (mostly newcomers to R), since it has a what-you-assign-is-what-you-have feel.

Currently, I am not convinced about the use case for a more complicated approach, since the basic idea is to manipulate a data structure consisting of colors. However, if I can see some concrete use cases where carrying along the data helps, that would be useful.

One approach I have seen that makes sense to me is based on some notes by @sarahsupp. In short, you create a block object based on the data, and then keep the two representations separately. This allows doing some cool things with looping and more complex operations, while still keeping things clean.

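data = read.csv("data/inflammation-01.csv", header = FALSE)
heat_map = make_block(data)
# height, width and avg were undefined in the original snippet; these definitions are assumed
height = nrow(data)
width = ncol(data)
avg = mean(as.matrix(data))
for (x in seq_len(height)) {
  for (y in seq_len(width)) {
    if (data[x, y] < avg) {
      heat_map[x, y] = "red"
    } else if (data[x, y] == avg) {
      heat_map[x, y] = "green"
    } else {
      heat_map[x, y] = "blue"
    }
  }
}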

Another approach I have been toying with is to use reference classes, where the block object consists of data and attributes like fill. Here is a minimal implementation

BlockGrid = setRefClass('BlockGrid', fields = c('data', 'fill'), methods = list(
  initialize = function(data){
    data <<- data
    fill <<- make_block(data)
  },
  show = function(){
    display(.self$fill)
  }
))

Now, you can store the data in the object and manipulate fill independently, since it is fill that controls the view. So you have

df <- data.frame(x = 1:3, y = LETTERS[1:3], z = c(TRUE, FALSE, TRUE))
x = BlockGrid(df)
x$fill[1] = 'red'

rplot05

While this is cool, I feel that this might be overkill for beginners, and the approach taken by @sarahsupp is clean and transparent.

I would welcome any thoughts you have.

ramnathv commented 10 years ago

@kevinushey I think our posts crossed, but it seems like both of us zeroed in on a similar idea. I tried defining [, [[, and $ functions for the R5 (reference) class, so that it inherits the method from the class of the data object, but ran into some issues. I like your idea, and maybe some non-standard eval might come into play here, since we can intercept the assignment call and do some voodoo with it, updating both data and color and keeping them in sync.

Most of my OO experience is with R5, but I would be happy to choose a more appropriate tool if there is an advantage.

kevinushey commented 10 years ago

I agree that the API for interacting with an object should be identical to that of 'base' R objects. Assigning to a block object should 'feel' like assigning to whatever object it wraps -- if it wraps a data.frame, then the operators as dispatched on a data.frame should apply.

If you wanted to keep things super simple, you could just use S3 (I think). Methods defined as e.g. <-.block, [<-.block should work, although there might be some hiccups in figuring out how to dispatch back on whatever R object the block wraps over.

The main bonus in carrying along the data is allowing display of the actual data within a block, e.g. either just printing the value within the block, or on mouse-over, or something to that effect. I think looking forward it's worth the extra effort of having both data and colour.

ramnathv commented 10 years ago

I concur that it certainly makes sense to explore this further, especially since it might lead to applications that are currently not clear in my mind. Moreover, with the d3js backend, I can think of some really cool stuff we could do, including animated display of loop operations, mouseover, clicks etc.

I agree that S3 would be the easiest solution. But, I am not sure how we can ensure that data attributes are carried forward with different operations. For example, I think apply drops attributes, so not sure how we can handle such a thing with S3.
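The concern about attributes is easy to reproduce in base R. In this small sketch the attribute name `fill` is only an example; subsetting keeps just names, dim and dimnames, and `apply` builds a fresh result, so a custom attribute is lost in both cases.

```r
# A vector and a matrix carrying a custom "fill" attribute.
x <- structure(1:3, fill = c("red", "green", "blue"))
m <- structure(matrix(1:4, 2), fill = "red")

sub <- x[2:3]                             # subsetting
res <- apply(m, 2, function(col) col * 2) # apply over columns

attr(x, "fill")    # still present on the original
attr(sub, "fill")  # NULL: dropped by `[`
attr(res, "fill")  # NULL: dropped by apply
```

Any S3 design that stores fills as an attribute would therefore need its own methods for each operation that should keep them.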

R5 has the advantage of being able to do traditional OO. But my attempts at defining [[ and [ functions for the BlockGrid class have resulted in the error that an object of type 'S4' is not subsettable.
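Reference classes are built on S4, which is why the error mentions S4. One possible way around it, sketched here under assumptions, is to register `[` and `[<-` as S4 methods with `setMethod` instead of defining them as fields or ordinary functions. `BlockGrid2` is a hypothetical stand-in that stores a plain character vector of fills, so the example does not depend on `make_block`.

```r
library(methods)

# Hypothetical stand-in for BlockGrid: data plus a character vector of fills.
BlockGrid2 <- setRefClass(
  "BlockGrid2",
  fields = list(data = "ANY", fill = "character")
)

# Reading with [ returns the fills for the requested cells.
setMethod("[", "BlockGrid2", function(x, i, j, ..., drop = TRUE) {
  x$fill[i]
})

# Assigning with [<- updates the fills and leaves the data untouched.
setMethod("[<-", "BlockGrid2", function(x, i, j, ..., value) {
  x$fill[i] <- value
  x
})

g <- BlockGrid2$new(data = 1:3, fill = rep("lightgreen", 3))
g[1] <- "red"
first_two <- g[1:2]
```

Because reference objects are mutable, the assignment changes `g` in place; whether that is desirable for a teaching tool is a separate design question.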

sarahsupp commented 10 years ago

Thanks for cc'ing me on the conversation. I look forward to seeing how rblocks develops! I can't take too much credit for the code on teaching data frame, loop and function concepts using r blocks. There's already a great set of lesson plans that Software Carpentry uses to teach these concepts using ipythonblocks. I'm working on translating the novice lessons from python to R and adding new information where needed. So your new code is very timely for me!

On Monday I will teach a novice group of students programming concepts in R, and I'll let you know how it goes. I'm trying a new strategy of show, then explain, so that students can get 'hooked' on programming being fun and accessible before getting bogged down in too many R-specific details or too complex of problems. I hope to get to the rblocks example near the end of the lesson.

ramnathv commented 10 years ago

Thanks @sarahsupp. I think a package of this kind will hugely benefit from a feedback loop where instructors try out a version in class, and we use it to enhance features. The more heads thinking about this, the better :)

ramnathv commented 10 years ago

I did some simple experiments on printing values. The idea was to print values when display is called directly on a non-block object. This is purely to display the data structure and nothing more. For example, consider the following list.

x = list(x = 1:4, y = LETTERS[1:2], z = c(TRUE))
display(x, show_values = TRUE)

rplot07

Now, if we use make_block to create an object, it becomes a block object with data being replaced by colors. So from that point on, it does not make sense to use show_values, since it will only show the color values.

ramnathv commented 10 years ago

I experimented a bit with an S3-based approach to retaining values and colors. It is very preliminary. It makes use of the fills argument that I added to display (to be pushed soon), which allows bypassing the default fill mechanism. I defined a block2 class (so that it doesn't conflict with the existing block class).

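# Pass the assignment through to the method of the underlying class
`[<-.block2` <- function(x, ...){
  NextMethod()
}

# Wrap an object: keep the data and store a same-shaped fill structure as an attribute
as.block2 <- function(x){
  block = x
  block[] = 'lightgreen'
  class(x) = c('block2', class(x))
  attr(x, 'block') = block
  return(x)
}

# Print the values, coloured by the stored fills
print.block2 = function(x, ...){
  display(x, show_values = TRUE, fills = attr(x, 'block'))
}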

So, now we can do the following to print the matrix with default colors. I was thinking that inside the [<-.block2 method, I could intercept the arguments passed in order to modify the block attribute. So for example, if one does h3[1] = 2, I want to assign attr(h3, 'block')[1] = 'red'. I experimented with a few ways to do it, but nothing worked. @kevinushey, any thoughts on how I could make this work within S3? I am aware that attributes are not very portable, but this looks like a promising direction, at least for basic stuff.

h2 = matrix(0, 5, 5)
h3 = as.block2(h2)
h3
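One way the interception asked about above might work, as a rough sketch rather than a tested design: apply the same index arguments to the stored fills, then let `NextMethod()` update the data, and re-attach the fills afterwards. It assumes every assigned cell should simply turn "red", as in the h3[1] = 2 example, and it leaves out printing so that it runs without rblocks.

```r
# Same wrapper as above, without the display step.
as.block2 <- function(x) {
  block <- x
  block[] <- "lightgreen"
  attr(x, "block") <- block
  class(x) <- c("block2", class(x))
  x
}

`[<-.block2` <- function(x, ..., value) {
  fills <- attr(x, "block")
  fills[...] <- "red"        # mark the assigned cells (assumed colour)
  x <- NextMethod()          # default method updates the data itself
  attr(x, "block") <- fills  # re-attach the updated fills
  x
}

h3 <- as.block2(matrix(0, 5, 5))
h3[1] <- 2
```

After the assignment, `attr(h3, "block")[1]` is "red" while the data holds 2 in the same cell. Reading with `h3[1]` still drops the attribute, so a matching `[.block2` method would be needed before fills survive subsetting.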