rpgauthier / ComputationalThematicAnalysisToolkit

MIT License

Computational Thematic Analysis Toolkit

IMPORTANT

On May 1, 2023, we learned that Reddit had informed its moderator community that Pushshift.io was in violation of the Reddit API's new terms and that Pushshift's access had been revoked. https://www.reddit.com/r/modnews/comments/134tjpe/reddit_data_api_update_changes_to_pushshift_access/

As such, the toolkit's Pushshift.io functionality should be used with caution and careful consideration. A warning about this situation will be added to the source code of the data collection functionality that uses the Pushshift API.

Reference

Robert P. Gauthier and James R. Wallace. 2022. The Computational Thematic Analysis Toolkit. Proc. ACM Hum.-Comput. Interact. 6, GROUP, Article 25 (January 2022), 15 pages. https://doi.org/10.1145/3492844

Installation Instructions

To access the most recent version: https://github.com/rpgauthier/ComputationalThematicAnalysisToolkit/releases/latest

Installers are available for Windows 10 x64 and OSX.

Toolkit Functionality:

The toolkit is made up of six interconnected modules.

Data Collection

Used by the researcher to import data into the toolkit. Once the data is imported, the module visualizes its content. This lets the user interact with the data at scale, become more familiar with it, and begin forming ideas for their analysis.
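In the toolkit, import and visualization happen through the GUI. The following minimal sketch illustrates the kind of at-scale overview described above: it loads a few comments and counts the most frequent words. The JSON Lines input, its `id` and `body` fields, and the function names are hypothetical assumptions, not the toolkit's actual import format.

```python
import json
import re
from collections import Counter

# Hypothetical input: one JSON object per line with "id" and "body" fields.
SAMPLE = """\
{"id": "c1", "body": "The new update broke my workflow"}
{"id": "c2", "body": "I like the update but the workflow feels slower"}
{"id": "c3", "body": "Slower loading since the update"}
"""


def load_records(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def top_words(records, n=5):
    """Count lowercase word tokens across all record bodies."""
    counts = Counter()
    for record in records:
        counts.update(re.findall(r"[a-z']+", record["body"].lower()))
    return counts.most_common(n)


records = load_records(SAMPLE)
print(len(records), "records")
print(top_words(records, 3))
```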

Data Cleaning & Filtering

Provides the researcher with the ability to: (1) see which rules the toolkit's internal computational techniques use to include and remove words; (2) review which words those rules include and remove; and (3) tune the rules to search for signals. During this process, researchers can become more familiar with the general dataset by clicking on any word in the included or removed list to see how it is used.
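The include/remove rules can be pictured with a short sketch. It records which rule removed each word, so the removals can be reviewed and the thresholds tuned. The stopword list, the minimum word length, and the minimum document frequency are hypothetical example rules. The toolkit's actual rules and defaults may differ.

```python
from collections import Counter

# Hypothetical rules; the toolkit's actual rules and defaults may differ.
STOPWORDS = {"the", "a", "an", "and", "but", "i", "my", "is"}


def apply_rules(docs, min_length=3, min_doc_freq=2):
    """Split the vocabulary into included words and removed words.

    docs is a list of token lists. Returns (included, removed), where
    removed maps each word to the first rule that excluded it.
    """
    # Document frequency: the number of documents containing each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    included, removed = set(), {}
    for word, df in doc_freq.items():
        if word in STOPWORDS:
            removed[word] = "stopword"
        elif len(word) < min_length:
            removed[word] = f"shorter than {min_length} characters"
        elif df < min_doc_freq:
            removed[word] = f"in fewer than {min_doc_freq} documents"
        else:
            included.add(word)
    return included, removed


docs = [
    ["the", "update", "broke", "my", "workflow"],
    ["i", "like", "the", "update", "workflow", "ok"],
    ["slower", "since", "the", "update", "ok"],
]
included, removed = apply_rules(docs)
print(sorted(included))
print(removed["broke"])
```

Lowering `min_doc_freq` to 1 moves the rarer words (for example "broke" and "slower") from the removed list into the included list. This mirrors the tune-and-review loop described above.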

Modelling & Sampling

Provides the researcher with the ability to create a variety of purposive samples. Iterative topic models group the data based on signals such as common word groupings in the comments, producing a diverse set of models that capture samples from different parts of the data. The researcher can use these samples both to further familiarize themselves with the data and to continue forming their inductive analytical framework.
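Fitting a topic model is out of scope here. The sketch below assumes that document-topic proportions have already been produced, for example by LDA. It shows one simple way to draw a purposive sample from them: take the documents most strongly associated with each topic. The proportions, `per_topic`, and `min_weight` are hypothetical values, not the toolkit's settings.

```python
# Hypothetical document-topic proportions, as a topic model such as LDA might
# produce; each row gives one document's weight on each of three topics.
DOC_TOPICS = {
    "c1": [0.80, 0.15, 0.05],
    "c2": [0.10, 0.70, 0.20],
    "c3": [0.60, 0.30, 0.10],
    "c4": [0.05, 0.15, 0.80],
    "c5": [0.20, 0.75, 0.05],
}


def purposive_sample(doc_topics, per_topic=2, min_weight=0.5):
    """Return {topic index: [doc ids]} holding the documents most strongly
    associated with each topic, skipping weak matches below min_weight."""
    n_topics = len(next(iter(doc_topics.values())))
    sample = {}
    for topic in range(n_topics):
        # Rank all documents by their weight on this topic, highest first.
        ranked = sorted(doc_topics, key=lambda d: doc_topics[d][topic], reverse=True)
        sample[topic] = [d for d in ranked[:per_topic] if doc_topics[d][topic] >= min_weight]
    return sample


print(purposive_sample(DOC_TOPICS))
```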

Coding

Provides the researcher with a place where data can be coded and reviewed in an iterative manner to develop, refine, and apply their analytical framework to sampled data in the form of a concrete set of codes.
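As a rough picture of the data behind iterative coding, the following sketch keeps a codebook that maps each code to the data items it has been applied to. It also supports merging codes as the framework is refined. The `Codebook` class and its methods are hypothetical illustrations, not the toolkit's internal data model.

```python
from collections import defaultdict


class Codebook:
    """Hypothetical minimal codebook: maps code names to the ids of the
    data items (e.g. comments) each code has been applied to."""

    def __init__(self):
        self.applications = defaultdict(set)

    def apply(self, code, item_id):
        """Apply a code to one data item."""
        self.applications[code].add(item_id)

    def merge(self, old, new):
        """Fold code `old` into `new` (acts as a rename if `new` is unused)."""
        moved = self.applications.pop(old, set())
        self.applications[new] |= moved

    def items_for(self, code):
        """Return the sorted ids of the items coded with `code`."""
        return sorted(self.applications.get(code, set()))


book = Codebook()
book.apply("slow performance", "c2")
book.apply("slowness", "c3")
book.apply("broken workflow", "c1")
# Refinement step: two overlapping codes are merged into one.
book.merge("slowness", "slow performance")
print(book.items_for("slow performance"))
```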

Reviewing

Provides the researcher with a place to create themes, group codes within those themes, and visualize connections between codes and themes.
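One way to derive the connections such a view could display is sketched below. The sketch assumes, hypothetically, that two codes are connected when they were applied to the same data items. It also counts how many distinct items each theme's codes cover. The codes, themes, and this definition of "connection" are illustrative assumptions; the toolkit may compute its visualization differently.

```python
from itertools import combinations

# Hypothetical coded data: code name -> ids of the data items it was applied to.
CODES = {
    "slow performance": {"c2", "c3", "c5"},
    "broken workflow": {"c1", "c2", "c3"},
    "likes new features": {"c4", "c6"},
}
# Hypothetical themes: theme name -> codes grouped under it.
THEMES = {
    "Frustration with the update": ["slow performance", "broken workflow"],
    "Positive reception": ["likes new features"],
}


def code_connections(codes):
    """Weight each pair of codes by the number of data items they share."""
    edges = {}
    for a, b in combinations(sorted(codes), 2):
        shared = len(codes[a] & codes[b])
        if shared:
            edges[(a, b)] = shared
    return edges


def theme_coverage(themes, codes):
    """Count the distinct data items covered by each theme's codes."""
    return {
        theme: len(set().union(*(codes[c] for c in members)))
        for theme, members in themes.items()
    }


print(code_connections(CODES))
print(theme_coverage(THEMES, CODES))
```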

Reporting

Provides an interface to help the researcher choose quotes for each code and theme and keep track of which piece of data each quote came from. If desired for ethical reasons, it can also keep track of paraphrases of these quotations, so that the research team can review whether each paraphrase properly captures the original quotation.
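The bookkeeping described above can be sketched as a small record type. Each quote keeps its source item, its codes and themes, and an optional paraphrase that must be approved by the team before it is reported. The `Quote` class, its fields, and the approval rule are hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Quote:
    """Hypothetical record linking a quotation to its source and analysis."""
    text: str
    source_id: str                       # id of the data item the quote came from
    codes: List[str] = field(default_factory=list)
    themes: List[str] = field(default_factory=list)
    paraphrase: Optional[str] = None
    paraphrase_approved: bool = False    # set after the research team reviews it

    def report_text(self):
        """Return the text to report: the original quote if there is no
        paraphrase, or the paraphrase once it has been approved."""
        if self.paraphrase is None:
            return self.text
        if not self.paraphrase_approved:
            raise ValueError(f"paraphrase of quote from {self.source_id} is not yet reviewed")
        return self.paraphrase


def pending_review(quotes):
    """List the quotes whose paraphrases still need team review."""
    return [q for q in quotes if q.paraphrase is not None and not q.paraphrase_approved]


q = Quote("Since the update everything loads slower", "c3",
          codes=["slow performance"],
          paraphrase="Loading has been slower since the update")
print(len(pending_review([q])))
q.paraphrase_approved = True
print(q.report_text())
```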

To Modify or Build a New version

1) Download or fork the repository.
2) Open the src folder in an IDE (tested in VS Code on Windows and OSX).

Build Commands

Windows:

1) pyinstaller pyinstaller-Windows10x64.spec --additional-hooks-dir=.
2) Run and compile innosetup_Windows10x64.iss

OSX running an Intel chip:

1) Change the paths in pyinstaller-OSX.spec to where your Python site-packages are installed.
2) python -m PyInstaller --windowed --additional-hooks-dir=. pyinstaller-OSX.spec
3) Run and build packages_OSX_x86_64.pkgproj

OSX running an M1 chip:

1) Change the paths in pyinstaller-OSX.spec to where your Python site-packages are installed.
2) python -m PyInstaller --windowed --additional-hooks-dir=. pyinstaller-OSX.spec
3) Run and build packages_OSX_arm64.pkgproj
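Step 1 of both OSX builds needs the site-packages location of your Python installation. The small helper below prints the candidate directories using only the standard library. Run it with the same interpreter (or virtual environment) you will use for the build. Which paths inside pyinstaller-OSX.spec need updating is not listed here, so check the spec file itself. The script and its function name are illustrative assumptions, not part of the repository.

```python
import site
import sysconfig


def site_packages_candidates():
    """Return possible site-packages directories for the running interpreter."""
    # "purelib" is where pure-Python packages are installed for this interpreter.
    paths = [sysconfig.get_paths()["purelib"]]
    # site.getsitepackages() is missing in some older virtualenv setups.
    if hasattr(site, "getsitepackages"):
        paths.extend(site.getsitepackages())
    # Remove duplicates while preserving order.
    return list(dict.fromkeys(paths))


if __name__ == "__main__":
    for path in site_packages_candidates():
        print(path)
```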

Needed applications

Needed Packages (there may be others)

Additional Steps