Feature request: data weighting

JamesChrHarman commented 7 years ago

Environment (for enhancement requests)

* Enhancement: A function whereby a single variable in a dataset can be set as a "weight", giving more weight to cases with a higher value and less emphasis to cases with a lower value.
* Purpose: It would be great to include this function, as many surveys suffer from skewed non-response by certain demographic groups and require weighting to be fully representative of their target populations.
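For readers less familiar with survey weighting, here is a minimal sketch of the idea. It assumes simple cell (post-stratification) weights, where each respondent's weight is their group's population share divided by that group's share in the sample. The group labels, shares and the `poststrat_weights` helper are hypothetical. Real survey weights are usually supplied by the data provider.

```python
from collections import Counter

def poststrat_weights(groups, population_shares):
    """Hypothetical helper: weight = population share / sample share of the respondent's group."""
    n = len(groups)
    counts = Counter(groups)
    return [population_shares[g] / (counts[g] / n) for g in groups]

def weighted_mean(values, weights):
    """Each case contributes in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Made-up sample: women over-represented (70 of 100) relative to a 50/50 population.
groups = ["F"] * 70 + ["M"] * 30
agree = [1] * 42 + [0] * 28 + [1] * 6 + [0] * 24  # 60% of women, 20% of men agree

weights = poststrat_weights(groups, {"F": 0.5, "M": 0.5})
print(sum(agree) / len(agree))        # unweighted: 0.48
print(weighted_mean(agree, weights))  # weighted: ~0.40
```

The unweighted estimate is pulled towards the over-represented group. With the weights applied, both groups count as half of the population again.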

AlexanderLyNL commented 7 years ago

Is this a data-wide weighting, or does it depend on the analysis?

JamesChrHarman commented 7 years ago

Probably just a simple data-wide weighting function like the one in SPSS.

AlexanderLyNL commented 7 years ago

I'm not sure what you have in mind exactly, but I think you can fix it by double-clicking the data set, which allows you to edit it.

https://youtu.be/1dT-iAU9Zuc

Does this help as a workaround?

JamesChrHarman commented 7 years ago

I know about data synchronisation and I've seen the video, but weighting isn't about modifying the data. It's about using one particular variable to make some cases influence the results of an analysis more than others. This is to correct for biases which over-represent certain demographic groups in the sample. It's a pretty routine operation to apply a case weight and SPSS has a straightforward function for it:

https://www.ibm.com/support/knowledgecenter/en/SSLVMB_24.0.0/spss/base/idh_weig.html

JASP is great but I can't use it to analyse survey data unless they can be weighted to be representative.

EJWagenmakers commented 7 years ago

This is an important request, and not the first one of its kind. We missed this because our expertise is not in survey data. We ought to add this feature, and it should not be too difficult. Let's give this some priority.

JamesChrHarman commented 7 years ago

Excellent, thank you!

ghost commented 6 years ago

I don't know if it's kosher to ask for an update, but since it's been quite a while, I'll risk it! I can't thank you enough for taking on this worthwhile project. I've been checking in for years, hoping for a day when I can switch over.

The issue is that, as someone who works largely with weighted data, this is the one feature that prevents my organization from switching from SPSS to JASP. We can work around missing analyses until they're available, but we can't work around the lack of weighting: all our analyses require it.

So I know you must have a million features to work on, but any chance you have a sense of when this might be implemented? Thank you for your efforts!

HorwoodP commented 6 years ago

I would also love to have data weighting within JASP. All my stats work involves weighted survey data, so it would be very useful to have this functionality.

lithos commented 5 years ago

Hi, this would be really useful for the binomial test when you have summary data (i.e., categories and counts), as is already possible for contingency tables. Otherwise you have to expand the data to one row per participant.
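Here is a minimal sketch, using only the Python standard library, of why a count column can act as a weight for the binomial test. The exact test depends only on the number of successes and the total. Summing the counts therefore gives the same result as expanding the data to one row per participant. The two-sided p-value sums the probabilities of all outcomes no more likely than the observed one, which is the convention R's `binom.test` uses. The data are made up.

```python
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: sum P(X = x) over all outcomes that are
    no more likely than the observed k (small relative tolerance for ties)."""
    observed = binom_pmf(k, n, p)
    return min(1.0, sum(binom_pmf(x, n, p) for x in range(n + 1)
                        if binom_pmf(x, n, p) <= observed * (1 + 1e-7)))

# Summary data: one row per category, with a count acting as a frequency weight.
summary = {"yes": 14, "no": 6}
k, n = summary["yes"], sum(summary.values())

# The same data expanded to one row per participant.
expanded = ["yes"] * 14 + ["no"] * 6
k_exp, n_exp = expanded.count("yes"), len(expanded)

p_summary = binom_test_two_sided(k, n)
p_expanded = binom_test_two_sided(k_exp, n_exp)
```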

Cheers, keep up the awesome work

EJWagenmakers commented 4 years ago

This issue recently came up again (sorry, we dropped the ball on this). Just to clarify: we do offer this functionality for linear regression, through the "WLS Weights" function, right? (see screenshot). So the request is to implement this for other analyses as well? I'll also check out the SPSS video. Thanks.

EJWagenmakers commented 4 years ago

Hmm, that SPSS HTML page is not very precise about the methodology that is used. Is there an R package? A quick Google search suggests the "survey" package (demo: https://cran.r-project.org/web/packages/pricesensitivitymeter/vignettes/using-weighted-data.html) and a book: Lumley, T. (2010). Complex Surveys: A Guide to Analysis Using R.

jrennstich commented 3 years ago

This is pretty standard for any kind of survey data, so I too would strongly appreciate such a feature in JASP, especially since I use it for teaching purposes!

One R package is https://github.com/andrie/surveydata; in its example, the weights are defined as a variable: https://github.com/andrie/surveydata#defining-a-surveydata-object

One of the most widely used R packages for survey data analysis is https://r-survey.r-forge.r-project.org/survey/. Regarding weights, see e.g.:

* https://r-survey.r-forge.r-project.org/survey/example-svrepdesign1.html
* https://r-survey.r-forge.r-project.org/survey/example-svrepdesign.html

Nice summaries of the issue are here:

* https://anthonybmasters.medium.com/survey-weights-in-r-a2346273e2cf
* https://www.r-bloggers.com/2014/04/social-science-goes-r-weighted-survey-data/
* https://rstudio-pubs-static.s3.amazonaws.com/268281_cc370bbbbbfb437b8650b22d208734d1.html

A step-by-step guide on how to do this in R is available here: https://bookdown.org/jespasareig/Book_How_to_weight_a_survey/

Hope this helps! I really enjoyed the workshops two years ago. I'm afraid this is a must-have feature for those of us in the social sciences who deal with survey data, so it would be fantastic to have it implemented in JASP!

jrennstich commented 3 years ago


There's an explanation of how it's done in SPSS here:

https://www.ibm.com/support/pages/differences-between-using-variable-weight-variable-spss-and-using-it-wls-or-regression-weight-regwgt-regression

In SPSS, the WEIGHT command is used as a case replication weight. If you have a weight of 2 for a case, that tells SPSS to treat that physical entry in the data file as representing two identical cases. When you use WEIGHT, unless you normalize your weights to have a mean of 1, the N you get reported from a statistical procedure will not match the number of physical cases, as it will be the sum of the weights.

The REGWGT or WLS weight in the REGRESSION procedure is a weight that is generally used to correct for unequal variability or precision in observations, with weights inversely proportional to the relative variability of the data points. The effects on the basic regression analysis of using a regression or WLS weight in REGRESSION are identical to those of using the same variable as a WEIGHT variable, except that the N is not altered.
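Here is a minimal sketch, in standard-library Python, of the replication semantics in the quoted passage. With integer weights, the weighted mean and variance match those of the expanded data, and the reported N is the sum of the weights. Normalizing the weights to a mean of 1 brings N back to the number of rows. The mean stays the same, but the variance changes because its n - 1 denominator changes. The helper names are illustrative only and are not SPSS or JASP functions.

```python
import statistics

def freq_weighted_summary(values, weights):
    """Descriptives under replication-weight semantics: a row with weight w
    counts as w identical cases, so the reported N is the sum of the weights."""
    n = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / n
    ss = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    return n, mean, ss / (n - 1)  # sample variance with n - 1 = sum(weights) - 1

def normalize_to_mean_one(weights):
    """Rescale weights so they average 1, i.e. sum to the number of rows."""
    scale = len(weights) / sum(weights)
    return [w * scale for w in weights]

values = [2.0, 4.0, 7.0]
weights = [1, 3, 2]
expanded = [2.0, 4.0, 4.0, 4.0, 7.0, 7.0]  # the same data, one row per case

n, mean, var = freq_weighted_summary(values, weights)
# n == 6; mean and var equal statistics.mean / statistics.variance of `expanded`

n_norm, mean_norm, var_norm = freq_weighted_summary(values, normalize_to_mean_one(weights))
# n_norm == 3 (number of physical rows); the mean is unchanged, the variance is not
```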

There is also an article that handles the math here:

http://www.asasrms.org/Proceedings/y2013/files/308377_80748.pdf

and a note from a practitioner's perspective:

https://blogs.worldbank.org/impactevaluations/tools-of-the-trade-when-to-use-those-sample-weights

boutinb commented 3 years ago

We could add this for 0.15, but I'm not sure about the amount of work. @vandenman Could you have a look?

vandenman commented 3 years ago

So there are requests for multiple analyses in this thread. I think that implementing all of these will take too long for 0.15 (at least, if I'm the only one implementing them). For linear regression, however, it should be possible to implement this for 0.15. In fact, I think all we need to do is adjust the degrees of freedom when a weights variable is passed (e.g., see this blog post). I'm not entirely sure what should happen for non-integer weights, but I'll look into that when I get there. We should probably add an option that specifies whether the weights influence the degrees of freedom.
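Below is a rough plain-Python sketch of the proposed option for a simple regression with one predictor. The weighted coefficients are the same either way. A hypothetical `weights_are_frequencies` flag only decides whether the residual degrees of freedom come from the number of rows (WLS-style weights) or from the sum of the weights (SPSS WEIGHT-style). With integer weights, the frequency version reproduces ordinary least squares on the expanded data. The right degrees of freedom for non-integer weights is the open question mentioned above, and this sketch does not settle it.

```python
import math

def weighted_simple_regression(x, y, w, weights_are_frequencies):
    """Weighted least squares fit of y = a + b * x.

    The coefficients do not depend on the flag; only the residual degrees of
    freedom (and hence the standard error of b) do. Returns (a, b, df, se_b).
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    # Frequency weights: N = sum of weights (non-integer weights give non-integer df).
    # WLS/precision weights: N = number of rows, so N is not altered.
    n = sw if weights_are_frequencies else len(x)
    df = n - 2
    se_b = math.sqrt(rss / df / sxx)
    return a, b, df, se_b

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.5, 2.0, 4.5]
w = [2, 1, 3, 1]

wls = weighted_simple_regression(x, y, w, weights_are_frequencies=False)   # df = 2
freq = weighted_simple_regression(x, y, w, weights_are_frequencies=True)   # df = 5

# Ordinary least squares on the expanded data (one row per replicated case).
x_e = [1.0, 1.0, 2.0, 3.0, 3.0, 3.0, 4.0]
y_e = [1.0, 1.0, 2.5, 2.0, 2.0, 2.0, 4.5]
ols_expanded = weighted_simple_regression(x_e, y_e, [1] * 7, weights_are_frequencies=False)
```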

Implementing this for all analyses will probably take longer than 0.15. The survey package is indeed a good start. For descriptives, I suspect that it will be a bit of work to implement (and possibly derive) weighted estimators (e.g., for skewness, kurtosis). Perhaps somebody else could take a look at that.
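For the descriptives, the simplest candidates are plug-in (moment) estimators that put the weights into each central moment, sketched below. These are the uncorrected g1 and g2 versions. The small-sample bias corrections that many packages report are where the derivation work mentioned above comes in, because it is not obvious what n should be for non-integer sampling weights. The function name is illustrative only.

```python
def weighted_moments(values, weights):
    """Moment-based skewness g1 = m3 / m2**1.5 and excess kurtosis
    g2 = m4 / m2**2 - 3, using weighted central moments
    m_k = sum(w * (x - mean)**k) / sum(w). No bias correction is applied."""
    sw = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / sw

    def central_moment(k):
        return sum(w * (x - mean) ** k for x, w in zip(values, weights)) / sw

    m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# With integer weights this equals the unweighted result on the expanded data.
g1_w, g2_w = weighted_moments([1.0, 2.0, 5.0], [2, 1, 1])
g1_e, g2_e = weighted_moments([1.0, 1.0, 2.0, 5.0], [1, 1, 1, 1])
```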

EJWagenmakers commented 3 years ago

Maybe @Kucharssim would like to assist with this functionality once he has completed his distributions...?!

KristofBostoen commented 2 years ago

I could not find the possibility to add sample weights to JASP. I was wondering if that is still scheduled for development. Most of my work relates to surveys and I would find it great if some of that functionality could be added to JASP.

EJWagenmakers commented 2 years ago

Yes, this is still scheduled!

StattMatt commented 2 years ago

Hello!

That's great. The importance of weighting cannot be overstated. Social and economic researchers, students, demographers, medical professionals and government agencies use national and international panel data, such as those offered by the General Social Survey or the European Social Survey. These data are often used for teaching purposes, and the data sets always contain a weighting variable, so they can only be analysed with statistical software that has a weighting function. So far, only a few free programs can do this, and all of them are far behind the functional scope of the incredible JASP. Weighting pushes the door open to these users. Many students will switch from SPSS to JASP and later establish JASP on their workstations. The JASP user community will grow exponentially. This will also lead to more developers. In 10 years, JASP will be the new standard.

Thanks & Greetings

NickyTettamanti commented 2 years ago

Hi there, I agree with @StattMatt - there's an incredibly large audience who would benefit from adding various survey methods that allow the user to specify different characteristics of the sample design (e.g., sampling weights, clustering, stratification, and post-stratification).

@EJWagenmakers and @Kucharssim - I'm very happy to hear that it's still scheduled to be implemented. By providing a survey method feature, the project can reach a wide range of researchers who don't have the cash to purchase SPSS or Stata.

According to this article, SPSS has been the most dominant package in terms of how many articles on Google Scholar mention it compared with other statistical software packages; the author says "SPSS is by far the most dominant package, as it has been for over 20 years." I'd imagine even a rudimentary implementation of survey weighting could attract positive attention to the JASP project from a set of new users, who could potentially spur more development activity in this space over time.

Reaching out to Dr. Thomas Lumley - who developed the R survey package and wrote a book about survey data in R - might provide additional insight into how to implement something similar in JASP.

StattMatt commented 2 years ago

In my opinion, this is the most important strategic point: a new function usually expands the analysis possibilities by one. A weighting function, however, expands the analysis possibilities for survey researchers not by one but across all functions. I suspect there is currently no other function that can attract so many new users.

Manierrem commented 2 years ago

I strongly recommend the developers work towards implementing data weighting procedures. While there are many requests that add small details to analyses to extend their usefulness, this one is considered the bare minimum for doing survey research. It would dramatically widen the pool of potential JASP users.

I would extend the suggestions that others are making, though, by pointing out that in many cases it isn't enough to account for a single weight variable. Some datasets are calibrated to allow that, but frequently they use complex survey designs that need to be accounted for. These are often indicated with variables denoting things like a stratum number or a PSU (primary sampling unit) on top of an individual-level weight. R's survey package handles these details, as does Stata's svyset suite of commands. Accounting for these common design features, which are used in most publicly available national data (e.g., ACS, NHIS, GSS, PSID, HRS), is really essential for getting researchers who work with nationally representative data on your side.
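Here is a minimal sketch of what "accounting for the design" means for the simplest statistic, a weighted mean. It assumes the design is described by three columns (stratum, PSU, weight), as in the public-use files listed above. It uses Taylor linearization, treats PSUs as sampled with replacement within strata, and applies no finite population correction. As far as I know this matches the default approach of R's survey package for a weighted mean, but it is only an illustration. Column names and data are hypothetical.

```python
from collections import defaultdict

def design_weighted_mean(y, w, strata, psu):
    """Weighted mean and its linearized standard error for a stratified,
    clustered design (PSUs sampled with replacement, no FPC)."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized score of each respondent, summed to PSU totals.
    # PSU ids are only unique within a stratum, hence the (stratum, psu) key.
    psu_totals = defaultdict(float)
    for yi, wi, h, c in zip(y, w, strata, psu):
        psu_totals[(h, c)] += wi * (yi - mean) / total_w
    # Collect the PSU totals per stratum.
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)
    var = 0.0
    for h, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has a single PSU; variance is undefined")
        zbar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in zs)
    return mean, var ** 0.5

# Hypothetical data: two strata, two PSUs per stratum, two respondents per PSU.
y = [3.0, 4.0, 5.0, 2.0, 6.0, 7.0, 1.0, 2.0]
w = [1.0, 1.0, 2.0, 2.0, 1.5, 1.5, 1.0, 1.0]
strata = ["A", "A", "A", "A", "B", "B", "B", "B"]
psu = [1, 1, 2, 2, 1, 1, 2, 2]
mean, se = design_weighted_mean(y, w, strata, psu)
```

As a sanity check, with equal weights, one stratum and every respondent as their own PSU, the squared standard error reduces to the familiar s^2 / n.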

TarandeepKang commented 1 year ago

Dear team, I'm just wondering if there's been any progress on new features for the software for survey data analysis. I entirely agree with previous commenters who say that these new features are incredibly important for people to work with nationally representative data sets. The package many of us use for analysing survey data with complex designs, and for raking/propensity score matching amongst other details is the "survey" package written by Thomas Lumley:

http://cran.fhcrc.org/web/packages/survey/index.html
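Since raking came up, here is a minimal sketch of raking (iterative proportional fitting). It assumes the targets are known population totals for each variable's categories and that every category occurs in the sample. Each pass rescales the weights so that one variable's weighted totals hit its targets, cycling through the variables until no adjustment factor moves by more than a small tolerance. The `rake` helper and data here are hypothetical. The survey package has its own `rake()` function.

```python
def rake(rows, base_weights, targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting of weights to marginal totals.

    rows:    list of dicts mapping variable name -> category
    targets: {variable: {category: population total}}
    Assumes every category in the data has a target and vice versa.
    """
    w = list(base_weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, margin in targets.items():
            # Current weighted total for each category of this variable.
            current = {}
            for row, wi in zip(rows, w):
                current[row[var]] = current.get(row[var], 0.0) + wi
            factors = {cat: margin[cat] / current[cat] for cat in margin}
            for i, row in enumerate(rows):
                f = factors[row[var]]
                max_change = max(max_change, abs(f - 1.0))
                w[i] *= f
        if max_change < tol:  # every margin already matched on this pass
            break
    return w

# Hypothetical sample of 100 with sex and age group; population totals scaled to 100.
rows = ([{"sex": "F", "age": "young"}] * 30 + [{"sex": "F", "age": "old"}] * 30
        + [{"sex": "M", "age": "young"}] * 30 + [{"sex": "M", "age": "old"}] * 10)
targets = {"sex": {"F": 50, "M": 50}, "age": {"young": 60, "old": 40}}
weights = rake(rows, [1.0] * len(rows), targets)
```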

EJWagenmakers commented 1 year ago

Thanks for reminding us. We had other fish to fry for some time. I'll see whether I can find someone to pick this up.

juliuspfadt commented 1 year ago

duplicate of https://github.com/jasp-stats/jasp-issues/issues/73

tomtomme commented 10 months ago

Duplicate. Can be closed.

jasp-stats / jasp-issues

Feature request: data weighting #395
