[Question] Split data with breaks

trinker / qdap

Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis

http://cran.us.r-project.org/web/packages/qdap/index.html

[Question] Split data with breaks #204

Open JDMorris opened 9 years ago

JDMorris commented 9 years ago

I would like to assign values to intervals. I try, without success:

    library(qdap)
    dataCont = runif(100, min=100, max=200)
    classes = seq(from = 100, to = 200, by = 25)
    dist_tab(dataCont, breaks = classes)

What did I do wrong?

trinker commented 9 years ago

dist_tab does not currently work this way. Its breaks argument takes a single number giving the number of intervals, not a vector of break points.

cut says:

either a numeric vector of two or more unique cut points or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut.
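To make the two behaviors in that quote concrete, here is a minimal sketch using only base R's cut. The seed and the variable names by_count and by_points are arbitrary choices for illustration; the data and break points mirror the ones in the question.

```r
# Sketch of the two `breaks` behaviors of base R's cut().
set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- runif(100, min = 100, max = 200)

# 1) A single number: cut() divides the range of x into that many
#    equal-width intervals, so the end points depend on the data.
by_count <- cut(x, breaks = 4)

# 2) A vector of cut points: the intervals are exactly the ones supplied.
by_points <- cut(x, breaks = seq(from = 100, to = 200, by = 25))

levels(by_count)   # four data-dependent intervals
levels(by_points)  # "(100,125]" "(125,150]" "(150,175]" "(175,200]"
table(by_points)   # counts per supplied interval
```

Only the first form is what dist_tab passes through to cut at the moment.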

I used the second behavior and restricted it to that because dist_tab works on multiple columns. Supporting cut points would require the user to supply a list of vectors, one per column. So in short, you did nothing wrong; the function just doesn't work like that.

I'm going to leave this issue open for now as I hadn't thought of one wanting to supply unequal intervals before.
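For illustration only, the "list of vectors" idea could look roughly like the sketch below in base R. This is not how dist_tab is implemented; the data frame df, the list breaks_list, and the result tabs are hypothetical names made up for this example.

```r
# Hypothetical sketch: one vector of cut points per column, applied with Map().
# This is NOT qdap's dist_tab(); it only illustrates the list-of-vectors idea.
df <- data.frame(
  a = c(1, 4, 6, 9),
  b = c(10, 35, 60, 95)
)

# One set of (possibly unequal) cut points per column, matched by name.
breaks_list <- list(
  a = c(0, 5, 10),
  b = c(0, 25, 50, 100)
)

# Map() pairs each column with its own break vector and tabulates the bins.
tabs <- Map(
  function(col, brk) table(cut(col, breaks = brk)),
  df[names(breaks_list)],
  breaks_list
)
tabs
```

Matching the list to columns by name is one possible design; a positional match would also work but is easier to get wrong when columns are reordered.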

JDMorris commented 9 years ago

Thank you for your reply! But even with equidistant intervals, if we have, for example, data = c(102.4, 103.7, 104.8, ...), some people may prefer to have (100, 110], (110, 120], ... as intervals instead of (102, 112], (112, 122], ...

Jean Daniel Morris

trinker commented 9 years ago

When I said equal intervals I was referring to how cut works in its second method:

a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut.

It takes a single value, say 3 for example, and splits the range of the data into three equal-width intervals. dist_tab is a wrapper for cut but only takes a single integer value.

I can see why folks may prefer the way you describe. I'll leave this issue open and consider the change. At the moment my focus is on writing, so it would be a few months out if this behavior were added. In the meantime you can get what you want from cut and split. I would think the dplyr package would make light work of this.
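A minimal sketch of that cut and split workaround, using base R only (no dplyr), might look like this. The data values and the names bins and counts are made up for illustration; the round cut points follow the (100, 110], (110, 120], ... intervals asked for earlier in the thread.

```r
# Workaround sketch: bin with cut(), then tabulate or split in base R.
dataCont <- c(102.4, 103.7, 104.8, 111.2, 118.9, 125.0)  # illustrative values
classes  <- seq(from = 100, to = 130, by = 10)           # round cut points

# Explicit cut points give (100,110], (110,120], (120,130].
bins <- cut(dataCont, breaks = classes)

# Frequency table per interval, with a percentage column added by hand.
counts <- as.data.frame(table(interval = bins))
counts$percent <- 100 * counts$Freq / sum(counts$Freq)
counts

# The raw values grouped by interval.
split(dataCont, bins)
```

Note that cut uses right-closed intervals by default, so a value exactly equal to the lowest cut point would become NA unless include.lowest = TRUE is set.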