harrelfe / Hmisc

Harrell Miscellaneous

Cut2 odd behaviour #95

Open pabloacera opened 5 years ago

pabloacera commented 5 years ago

I'm using R version 3.4.4 (2018-03-15) with Hmisc 4.1-1. I want to use cut2 to bin a vector of numbers (x) based on how often each value occurs. I also want to use a threshold (m) that defines the minimum number of elements in each bin. Example:

    library(Hmisc)
    x = c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6)
    print(x)
    cut2(x, m=4)

    x = 1 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6
     [1] [1,3) [1,3) [1,3) [1,3) [1,3) [1,3) [1,3) 3 3 3 [4,6) [4,6)
    [13] [4,6) [4,6) [4,6) [4,6) 6 6 6
    Levels: [1,3) 3 [4,6) 6

I have set 4 as the desired minimum number of elements in each bin. The bins that cut2 gives are: [1,3) 3 [4,6) 6

My question is: why does cut2 leave 3 as its own bin when it has only 3 observations, and likewise leave 6 as its own bin with only 3 observations? Wouldn't it make more sense to have [1,3) [3,5) [5,6], since every bin would then have at least 4 observations? I am a bit confused by this; any input is appreciated. Thanks for your time.
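The counts for the proposed bins can be checked without cut2. The sketch below uses only base R's cut() and table(); the break points 1, 3, 5, 6 are taken from the proposal above, and the name `proposed` is just illustrative.

```r
# Same vector as in the example above.
x <- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6)

# Left-closed intervals; include.lowest = TRUE closes the last interval on the
# right, so the maximum value (6) falls into the final bin.
proposed <- cut(x, breaks = c(1, 3, 5, 6), right = FALSE, include.lowest = TRUE)

# Observations per bin.
counts <- table(proposed)
print(counts)
```

Each of the three bins ends up with at least 4 observations (7, 6 and 6), which is the behaviour asked for with m=4.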

harrelfe commented 5 years ago

Feel free to use GitHub to create an edited version of cut2, run all the tests, and I'll strongly consider adding it to Hmisc with credit to you as a co-author.

Frank

Frank E Harrell Jr
Professor, School of Medicine
Department of Biostatistics, Vanderbilt University
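One possible starting point for such an edit is sketched below. It is not how cut2 works internally and it is not part of Hmisc: `min_size_bins` is a hypothetical helper that greedily groups the sorted distinct values until each bin holds at least m observations, then bins with base R's cut(). It assumes x is numeric with at least two distinct values and no missing values.

```r
# Hypothetical helper (not part of Hmisc): greedy grouping of distinct values
# so that every resulting bin contains at least m observations.
min_size_bins <- function(x, m) {
  counts <- table(x)                   # observations per distinct value
  vals <- as.numeric(names(counts))    # distinct values in ascending order
  stopifnot(length(vals) >= 2)         # assumption: at least two distinct values

  starts <- numeric(0)                 # lower edge of each bin
  running <- 0                         # observations in the bin being built
  open <- FALSE                        # is a bin currently being filled?
  for (i in seq_along(vals)) {
    if (!open) {
      starts <- c(starts, vals[i])     # start a new bin at this value
      open <- TRUE
    }
    running <- running + counts[[i]]
    if (running >= m) {                # bin is large enough, close it
      running <- 0
      open <- FALSE
    }
  }
  # A trailing bin that never reached m is merged into the previous bin.
  if (open && length(starts) > 1) starts <- starts[-length(starts)]

  # unique() avoids a duplicated break when the last bin starts at max(x).
  breaks <- unique(c(starts, max(vals)))
  cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)
}

x <- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6)
binned <- min_size_bins(x, m = 4)
print(table(binned))
```

For the vector from the issue this gives three bins with 4, 6 and 9 observations. That differs from the [1,3) [3,5) [5,6] grouping proposed in the issue, but every bin meets the minimum of 4; a real change to cut2 would also need to handle the g and cuts arguments, missing values, and the existing tests.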

On Mon, Sep 24, 2018 at 7:25 AM pabloacera notifications@github.com wrote:
