Note: This standalone version depends on other components that are intentionally not included in this repository. The same applies to the collected data.
In recent years, social media has become an increasingly popular way for people to connect with one another and share information. As a result, sentiment analysis of social media has become a valuable tool for predicting stock movements.
One platform that has been used extensively for sentiment analysis is Reddit. Reddit is a social news aggregator and discussion forum where users can submit, vote, and comment on content. Due to the large amount of user-generated content, Reddit has been found to be a valuable source for gauging public opinion.
In this repository, we use reinforcement learning to train two agents to predict the movement of stock prices based on Reddit sentiment. The agents use a multimodal convolutional neural network.
data: Data related to stock symbols. The dataset (stock prices and scraped Reddit content) is not included.
dataset_handler: The dataset consists of meta files and HDF5 files. The dataset handler merges these files to form a complete dataset.
preprocessing: Preprocessing steps (tasks.py) and pipeline (pipeline.py).
rl: Reinforcement learning module (using Stable-Baselines3). The module is divided into the "portfolio", "simulation", and "stocks" submodules, and two reinforcement learning agents are trained. The "stocks" submodule trains an agent that trades individual stocks. The "portfolio" submodule uses the trained "stocks" agent to trade a portfolio of stocks. The "simulation" submodule is a semi-realistic test environment for the "portfolio" submodule.
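To make the idea of a single-stock trading environment more concrete, here is a minimal sketch in plain Python. It is not the repository's environment: the class name `SingleStockEnv`, the three-action space, the observation layout, and the reward (next-step price change while holding) are all assumptions chosen for illustration. It only mirrors the `reset`/`step` shape of the Gymnasium interface; an environment meant for Stable-Baselines3 would subclass a Gym/Gymnasium `Env` and declare observation and action spaces.

```python
from typing import Callable, List, Tuple

# Hypothetical discrete action space (an assumption, not taken from the repository).
HOLD, BUY, SELL = 0, 1, 2

Observation = Tuple[float, float, float]  # (price, sentiment, holding flag)


class SingleStockEnv:
    """Toy single-stock environment with a Gymnasium-like reset/step shape."""

    def __init__(self, prices: List[float], sentiment: List[float]):
        if len(prices) != len(sentiment) or len(prices) < 2:
            raise ValueError("prices and sentiment need equal length >= 2")
        self.prices = prices
        self.sentiment = sentiment
        self.t = 0
        self.holding = False

    def _obs(self) -> Observation:
        return (self.prices[self.t], self.sentiment[self.t], float(self.holding))

    def reset(self):
        self.t = 0
        self.holding = False
        return self._obs(), {}

    def step(self, action: int):
        # Update the position first, then collect the next price change.
        if action == BUY:
            self.holding = True
        elif action == SELL:
            self.holding = False
        change = self.prices[self.t + 1] - self.prices[self.t]
        reward = change if self.holding else 0.0
        self.t += 1
        terminated = self.t == len(self.prices) - 1
        # Returns (observation, reward, terminated, truncated, info).
        return self._obs(), reward, terminated, False, {}


def run_policy(env: SingleStockEnv, policy: Callable[[Observation], int]) -> float:
    """Roll out one episode and return the summed reward."""
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total


def sentiment_policy(obs: Observation) -> int:
    """Hand-written stand-in for a trained agent: buy on positive sentiment."""
    _, sentiment, _ = obs
    return BUY if sentiment > 0 else SELL
```

A rollout could look like `run_policy(SingleStockEnv(prices, sentiment), sentiment_policy)`, where `prices` and `sentiment` are equal-length lists. In an actual training setup, the hand-written policy would be replaced by a learned agent.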
sentiment_analysis: Sentiment analysis steps (tasks.py) and pipeline (pipeline.py). The sentiment analysis runs before the preprocessing steps (see above).
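The split into tasks and pipelines, with sentiment analysis running before preprocessing, can be sketched roughly as follows. Everything here is hypothetical: the `Pipeline` class, the task functions, and the word lists are illustrative stand-ins, and the toy word-count score is not the sentiment model used in this repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A task takes the shared state dict and returns an updated copy.
Task = Callable[[Dict], Dict]


@dataclass
class Pipeline:
    """Runs its tasks in order, passing the state from one to the next."""
    name: str
    tasks: List[Task] = field(default_factory=list)

    def run(self, state: Dict) -> Dict:
        for task in self.tasks:
            state = task(state)
        return state


# Toy lexicon (assumption), used only to make the example runnable.
POSITIVE = {"buy", "moon", "bullish"}
NEGATIVE = {"sell", "crash", "bearish"}


def score_sentiment(state: Dict) -> Dict:
    """Hypothetical sentiment task: positive minus negative word count per post."""
    scores = []
    for post in state["posts"]:
        words = post.lower().split()
        pos = sum(w in POSITIVE for w in words)
        neg = sum(w in NEGATIVE for w in words)
        scores.append(pos - neg)
    return {**state, "sentiment": scores}


def aggregate_sentiment(state: Dict) -> Dict:
    """Hypothetical preprocessing task: average the per-post scores."""
    scores = state["sentiment"]
    mean = sum(scores) / len(scores) if scores else 0.0
    return {**state, "mean_sentiment": mean}


sentiment_pipeline = Pipeline("sentiment_analysis", [score_sentiment])
preprocessing_pipeline = Pipeline("preprocessing", [aggregate_sentiment])


def run_all(posts: List[str]) -> Dict:
    # Sentiment analysis runs first; preprocessing consumes its output.
    state = sentiment_pipeline.run({"posts": posts})
    return preprocessing_pipeline.run(state)
```

Keeping each step as a small function makes the ordering explicit: the preprocessing task reads the `sentiment` key, so it fails with a `KeyError` if the sentiment pipeline has not run first.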
root
    data                  # Data related to stock symbols
    dataset_handler       # Dataset handler to handle meta files and HDF5 files
    preprocessing         # Preprocessing steps (tasks.py) and pipeline (pipeline.py)
    rl                    # Reinforcement learning module (using stable-baselines3)
        common            # Shared functionality
        portfolio         # Portfolio trading environment + runner
        simulation        # Real market simulation (essentially a test environment)
        stocks            # Individual stock training environment + runner
    sentiment_analysis    # Sentiment analysis steps (tasks.py) and pipeline (pipeline.py)
    tests
    utils
GCP, PyTorch, Stable-Baselines3, scikit-learn, Weights & Biases, Ray, Optuna, NLTK