harrelfe / Hmisc

Harrell Miscellaneous

collapsing to unique 'x' values & stat_plsmo() Lowess smoothing #163

Open bri2020 opened 1 year ago

bri2020 commented 1 year ago

Dear all, Dear Frank

For example, I have this data (data.csv):

x,y
2,1934
16.3636363636364,1618
5.27272727272727,1701
69.5454545454545,3409
59.7272727272727,3334
71,3471
69.5454545454545,3409
69.5454545454545,3409
59.3636363636364,3264
46.7272727272727,2966
46.7272727272727,2966
46.2727272727273,2915
46.7272727272727,3047
46.7272727272727,3048
46.7272727272727,2966
55.2727272727273,3021
51.4545454545455,3377
51,3283
50,2969
46.7272727272727,2966

and this command:

library(ggplot2)
data <- read.csv("data.csv")  # the x,y data listed above
ggplot(data = data, mapping = aes(x = x, y = y)) +
  Hmisc::stat_plsmo()

I get this warning: "Warning: collapsing to unique 'x' values" and I can see why, as I have repeated values in x.

However, I am not sure whether I should just ignore this warning, because with a bigger data set I get a second warning on top of it: "Warning message: In regularize.values(x, y, ties, missing(ties), na.rm = na.rm) : collapsing to unique 'x' values"

I would be happy if someone could help "translate" what R is trying to warn me about. (Googling and forum searches have not helped so far).

Many thanks,

Britta

data.csv
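For readers following the thread, the warning text can be reproduced with base R alone. `regularize.values()` is an internal helper in the stats package that `approx()` calls; when `x` contains ties and the `ties` argument was not supplied, recent versions of R warn that the `y` values at tied `x` are being collapsed (averaged, by default). The sketch below uses made-up numbers and a hypothetical helper, `warning_text()`; it does not establish which call inside `stat_plsmo()` raises the warning, only what the message means.

```r
# Made-up data with one tied x value (x = 2 occurs twice).
x <- c(1, 2, 2, 3)
y <- c(10, 20, 30, 40)

# Hypothetical helper: return the warning message raised by `expr`,
# or NA if it ran without a warning.
warning_text <- function(expr) {
  tryCatch({ expr; NA_character_ }, warning = function(w) conditionMessage(w))
}

# `ties` not supplied: approx() averages y over tied x and says so.
msg_default <- warning_text(approx(x, y, xout = 2))

# `ties` supplied explicitly: same averaging, but no warning.
msg_explicit <- warning_text(approx(x, y, xout = 2, ties = mean))

# The interpolated value at the tied x is the mean of the tied y values.
res <- approx(x, y, xout = 2, ties = mean)$y
```

Read this way, the message is informational: the tied observations are averaged before interpolation, and nothing is dropped silently beyond that.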

KatChampion commented 1 year ago

I am having the same issue, except that I have stripped out all the duplicates and still get the warning.

I am using wtd.quantile inside a mutate. I started out with one data set and it works just fine, with no warnings. I then tried a second data set that is structurally the same but has different values, and I started to get the "collapsing to unique 'x' values" warning.

I tried stripping out all zero values and duplicates in the second data set and got the same warning. Mind you, my first data set, the one it works on just fine, also has duplicates and zeros, and it doesn't give me warnings. The only differences between these two data sets are the ranges of values and the weights used (wts_int), but I don't understand why that should matter.

Here is my data and code if you would like to reproduce the issue. This is the first data set, which works just fine: data_1_reduced.csv. And here is the data that produces the warning, both the regular data set and the version with duplicate values stripped out: data_2_reduced.csv, data_2_reduced_no_dups.csv

And this is the code I am running to try to produce the weighted quantiles:

library(dplyr)  # provides %>%, group_by(), mutate(), ungroup()

xlim_max = 0.99
xlim_min = 0.01

processed_data_1 = data_1_reduced %>% 
  group_by(scn) %>% 
  mutate(wts = wt_raw/sum(wt_raw),
         wts_int = wts*10^14, 
         quantile_mean = sum(wts*value),
         xlim_max = Hmisc:::wtd.quantile(value, weights = wts_int, probs = c(xlim_max)),
         xlim_min = Hmisc:::wtd.quantile(value, weights = wts_int, probs = c(xlim_min))) %>% 
  ungroup()

processed_data_2_no_dups = data_2_reduced_no_dups %>% 
  group_by(scn) %>% 
  mutate(wts = wt_raw/sum(wt_raw),
         wts_int = wts*10^14, 
         quantile_mean = sum(wts*value),
         xlim_max = Hmisc:::wtd.quantile(value, weights = wts_int, probs = c(xlim_max)),
         xlim_min = Hmisc:::wtd.quantile(value, weights = wts_int, probs = c(xlim_min))) %>% 
  ungroup()

processed_data_2 = data_2_reduced %>% 
  group_by(scn) %>% 
  mutate(wts = wt_raw/sum(wt_raw),
         wts_int = wts*10^14,
         quantile_mean = sum(wts*value),
         xlim_max = Hmisc:::wtd.quantile(value, weights = wts_int, probs = c(xlim_max)),
         xlim_min = Hmisc:::wtd.quantile(value, weights = wts_int, probs = c(xlim_min))) %>% 
  ungroup()

(Note: the reason I multiply the weights by 10^14 is that someone on my team who previously used this function told me it doesn't handle weights smaller than zero well. If that's not true, please let me know. I don't think it should have a material impact on the issue at hand.)
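One possible reading of why removing duplicate values does not help is sketched below. It rests on an assumption not confirmed in this thread: that `wtd.quantile()` interpolates over the cumulative weights with `approx()`. If so, the 'x' being collapsed is the cumulative weight, not `value`. When a group's weights sum to about 10^14, any weight much smaller than the double-precision spacing at that magnitude (roughly 0.016) leaves the running total unchanged, so the cumulative weights tie even though every value is distinct. The numbers are hypothetical and are not taken from the attached files.

```r
# Hypothetical scaled weights; they sum to about 1e14, like wts * 10^14.
value   <- c(1.5, 2.5, 3.5, 4.5)        # all distinct
wts_int <- c(5e13, 5e13, 0.001, 0.002)  # last two are tiny relative to the total

# Running totals: the tiny weights are absorbed by floating-point rounding.
cum_wts <- cumsum(wts_int)

has_dup_values  <- anyDuplicated(value) > 0
has_dup_cum_wts <- anyDuplicated(cum_wts) > 0

# Interpolating over the cumulative weights (assumed, not confirmed, to mirror
# what wtd.quantile() does) then triggers the same warning.
msg <- tryCatch({
  approx(cum_wts, value, xout = 1e14, method = "constant", f = 1, rule = 2)
  NA_character_
}, warning = function(w) conditionMessage(w))
```

Because this rounding is relative, multiplying every weight by 10^14 would not by itself create or remove such ties; a wide spread between the largest and smallest weights within a group would. Running `anyDuplicated(cumsum(wts_int))` per `scn` group on both data sets is one way to check whether this applies.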