jasp-stats / jasp-issues

This repository is solely meant for reporting of bugs, feature requests and other issues in JASP.

Compute column as mean of other columns #516

Closed DocVXL closed 6 months ago

DocVXL commented 5 years ago

I appreciate the ability to compute columns in JASP that was recently added. However, there are still some serious shortcomings particularly with some of the mathematical functions provided on the right side of the screen. One, in particular, is mean(y).

If I were to calculate a new column using a mean, I would almost certainly want to compute the mean across columns. For example, I might have a ten-item measure of self-esteem where I would want to compute a column that is the mean of the responses to these ten items. This seems like a pretty basic and common approach to calculating a column.

Unfortunately, the mean(y) function only allows you to put in one variable. I cannot calculate the mean across columns, but rather only the mean of one column. The new column will have only a single entry on the first row showing the mean of that variable. I don't really see the utility of this calculation since I could look at the descriptive statistics to get the mean of that variable. That mean also would not apply directly to the first row (or case) in the data file, so its location there seems out of place. The same is true of the sum function, which again would be most useful to calculate across existing columns.

Of course, there are workarounds. The ten items could be added together and then divided by ten (though I will point out that the drag and drop method is not very smooth for that either). R code can be used, but that would lose the advantage of the drag and drop method, which minimizes concern with typos (in variable names, for example).
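For readers following along, the difference between a column-wise and a row-wise mean can be sketched in plain R. This is only an illustration run outside JASP; the data frame `items` and its column names are made up for the example.

```r
# Hypothetical data: three respondents answering three self-esteem items.
items <- data.frame(
  item1 = c(4, 2, 5),
  item2 = c(3, 2, 4),
  item3 = c(5, 1, 3)
)

# Column-wise: mean() on one column collapses it to a single number.
col_mean <- mean(items$item1)

# Row-wise: one mean per respondent, which is what a scale score needs.
row_mean <- rowMeans(items)

# The manual workaround: add the items and divide by their count.
manual <- (items$item1 + items$item2 + items$item3) / 3

print(col_mean)
print(row_mean)
print(manual)
```

The row-wise result and the manual workaround agree, but the manual version has to be retyped whenever the number of items changes.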

Again, I really appreciate the development that has gone into JASP and the inclusion of the compute column feature. I just hope that it is improved in future releases.

NMethner commented 1 year ago

Hi!

I totally agree with DocVXL. Is there any news by now?

jgreeneb commented 1 year ago

It appears the only option to do this (I am looking for this feature also) is to try it in R. The mean function works very well in SPSS for computing columns.

JorisGoosen commented 1 year ago

Sorry no news yet, we were focussing on the new data editing stuff and some other new features.

But yeah, I agree it should be a bit easier to do rowwise stuff in the compute column...

Would just the mean be enough though? I've also had people request the sum, and perhaps there are more of these functions.

Because I was at first thinking of a general solution like row(col1, col2, col3) but how that converts to R for each given superfunction (so things like mean(row(col1, col2, col3))) is not clear to me.

However adding functions like rowMean(col1, ...) and rowSum(col1, ...) should be fairly simple and relatively easy to implement. @sophieberkhout which other row functions would you recommend?
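One way the general `row(col1, col2, col3)` idea could map onto R is to bind the columns into a matrix and apply the summary function over its rows. The helper `row_apply` below is hypothetical and is not part of JASP; it is only a sketch of that translation in base R.

```r
# Hypothetical helper, not part of JASP: apply any summary function row by row.
row_apply <- function(fun, ...) {
  cols <- cbind(...)     # bind the columns into a matrix, one row per case
  apply(cols, 1, fun)    # margin 1 means "over rows"
}

col1 <- c(1, 4, 7)
col2 <- c(2, 5, 8)
col3 <- c(3, 6, 9)

row_mean   <- row_apply(mean, col1, col2, col3)
row_sum    <- row_apply(sum, col1, col2, col3)
row_median <- row_apply(median, col1, col2, col3)
row_sd     <- row_apply(sd, col1, col2, col3)

print(row_mean)
print(row_sum)
```

Dedicated functions such as `rowMean(col1, ...)` avoid the nesting question entirely, at the cost of one function per statistic.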

sophieberkhout commented 1 year ago

This would be great! I think the mean and sum would be the most popular. SPSS also offers SD, variance, median, min, max, and coefficient of variation (sd / mean), so some users might also like to have these.

I believe it is also worth discussing missing data handling. If one value in a row is missing, should we calculate a mean of the leftover values? rowMeans in R gives an NA by default but can give a mean of the leftover values with na.rm = TRUE. I think the former is definitely preferred, but maybe there are situations where someone might need the latter.
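The two behaviours of base R's `rowMeans` mentioned here can be seen in a small example. The data are invented for illustration.

```r
# Hypothetical item responses; the second respondent skipped item2.
items <- data.frame(
  item1 = c(4, 2),
  item2 = c(3, NA),
  item3 = c(5, 1)
)

# Default: any missing value in a row makes that row's mean NA.
strict <- rowMeans(items)

# na.rm = TRUE: average only the values that are present in each row.
lenient <- rowMeans(items, na.rm = TRUE)

print(strict)
print(lenient)
```

With `na.rm = TRUE` the denominator differs per row, so a mean based on two answered items looks the same as one based on three. That is the main argument for keeping NA as the default.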

JorisGoosen commented 1 year ago

I could add a row() and rowNaRm() or some better name than that

richlv commented 11 months ago

To make sure I understand the current functionality, currently there is no way to compute mean across several columns using the mean() function in JASP, right?

tomtomme commented 11 months ago

Yes. You need to manually add up all the variables and divide by n. If missing values are present, you need the R code I give in #1732.

JorisGoosen commented 11 months ago

image

(Sadly enough adding this to the drag&drop constructor is a bit too painful... But this should work nicely)

JorisGoosen commented 11 months ago

I've added:

    "rowMean",          "rowMeanNaRm",
    "rowSum",           "rowSumNaRm",
    "rowSD",            "rowSDNaRm",
    "rowVariance",      "rowVarianceNaRm",
    "rowCovariance",
    "rowCorrelation",
    "rowMedian",        "rowMedianNaRm",
    "rowMin",           "rowMinNaRm",
    "rowMax",           "rowMaxNaRm"

richlv commented 9 months ago

Did this go in 0.18.2? Cannot spot it in https://jasp-stats.org/release-notes/ .

tomtomme commented 9 months ago

Yes, I already used it.

boutinb commented 9 months ago

I've just added this feature in the release notes.

richlv commented 9 months ago

> I've just added this feature in the release notes.

Thank you so much.

Getting a bit offtopic, but I poked the JASP account on Mastodon about some minor issues in the release notes :) https://mastodon.social/@richlv/111693702409669246

boutinb commented 9 months ago

Good catch! I've just fixed the HTML.

tomtomme commented 7 months ago

@JorisGoosen

So I just tested the new rowStuff more. Some seem broken:

"rowCovariance" "rowCorrelation"

Tested in 0.19 beta gives me

Error: Error in rowCovariance(FaceType, Attractiveness): could not find function "rowCovariance"

tryCatch(suppressWarnings({
    returnVal <- eval(parse(text = .rCode))
}), error = function(e) {
    .setRError(paste0(toString(e), "\n", paste0(sys.calls(), collapse = "\n")))
})
tryCatchList(expr, classes, parentenv, handlers)
tryCatchOne(expr, names, parentenv, handlers[[1]])
value[[3]](cond)

Same for rowCorrelation. And I would not even know what those functions would accomplish. Correlating over rows would only make sense for wide-format data, and only if you can select the two rows you want to correlate. But the help file does not document how you would select those two rows.

Same error message in 0.18.3 and for rowCorrelation.

tomtomme commented 7 months ago

And also rowSum is broken with nominal and ordinal vars. The results just do not add up. Works with metric scale however.

JorisGoosen commented 7 months ago

@tomtomme ah yeah, I noticed... I'll fix it together with a whole bunch more fixes and updates to the data and columns.

I've been talking today with @JohnnyDoorn about a good interface and I think we've got an idea. In our internal issue tracker we have something for it: https://github.com/jasp-stats/INTERNAL-jasp/issues/2481 That one is for implementing a special "row computed columns" computed-column type. So that would sit next to the "R-Code" and "Drag&Drop" ones we already have.

There would be a simple column selector like in the analyses, and then the required R code would be generated, saving a lot of typing.

I'm open to any PRs for the documentation of the functions. But those two in particular were originally put in by me because I just copied a list of functions I already had for columns. Those two didn't work, so they got removed from R and the interface but got stuck in the docs.

> And also rowSum is broken with nominal and ordinal vars. The results just do not add up.

Is this perhaps because instead of the "values" it uses the factor-level?

tomtomme commented 7 months ago

> Is this perhaps because instead of the "values" it uses the factor-level?

Yes. First I thought it uses the rank of the data. But actually it uses the position of the value, ignoring the value itself. Re-sorting the values does not change the result. Really strange. Using the data rank would have made sense, kind of. But using the initial position does not. So rowSum should rather throw an error with nominal and ordinal variables.
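For context, base R shows a similar effect when a factor is converted to numbers directly: the result is the internal level code, not the displayed value. The example below only illustrates that general R behaviour with made-up data; the thread does not confirm that this is exactly what happens inside JASP.

```r
# Hypothetical ordinal column whose labels are the numbers 10, 20 and 30.
responses <- factor(c("30", "10", "20", "10"), levels = c("10", "20", "30"))

# as.numeric() on a factor returns the internal level codes (1, 2, 3, ...).
codes <- as.numeric(responses)

# Converting to character first recovers the displayed values.
values <- as.numeric(as.character(responses))

print(sum(codes))    # 7
print(sum(values))   # 70
```

A sum built from level codes depends on the order of the levels rather than on the values themselves, which would be consistent with totals that "do not add up".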

JorisGoosen commented 7 months ago

> > Is this perhaps because instead of the "values" it uses the factor-level?

> Yes. First I thought it uses the rank of the data. But actually it uses the position of the value, ignoring the value itself. Re-sorting the values does not change the result. Really strange. Using the data rank would have made sense, kind of. But using the initial position does not. So rowSum should rather throw an error with nominal and ordinal variables.

Is this with 0.18.3 or the emptyValuesRefactor-nightly?

tomtomme commented 7 months ago

Same for both. My flatpak is probably behind, though, with commit https://github.com/jasp-stats/jasp-desktop/commit/f5f90baafeb87752846230cf5c87bae2d132dfda

March 4 build

JohnnyDoorn commented 7 months ago

To add to Tom - I just tried with the emptyValuesRefactor-nightly, and got a crashed engine when trying the rowSum (and some complete crashes when trying to access compute columns):

image

with facFive as ordinal and as scale

JorisGoosen commented 6 months ago

This now works again in the emptyValuesRefactor from the https://github.com/jasp-stats/jasp-desktop/pull/5367 PR

JorisGoosen commented 6 months ago

I'm also tweaking some stuff with computed columns and their types.

JUllenboom commented 5 months ago

Hi! Is there a bug with the rowMean function? I'm getting only rowSum values, regardless of whether I use the rowMeans function or the rowSum function.

tomtomme commented 5 months ago

@JUllenboom works for me on 0.18.3 and 0.19 beta (flatpak linux, cannot test on win or mac)

image

Do you have more details? A screenshot or a file?