This project aims to develop a model that extracts claims from news articles about solar energy. We will either train a new model from scratch or fine-tune an existing one. The project involves creating a specialized dataset, establishing baseline performance, and applying an iterative active learning approach to continuously improve the model's accuracy.
What is a Claim?
A statement that can be verified as true or false
An assertion about the world that expresses a belief or opinion
An arguable proposition
Dataset Construction and Baseline Establishment
Dataset Creation: Construct a new, specialized dataset of solar energy news articles.
Test Set Development: A portion of the dataset will be manually annotated and verified to create a test set.
Baseline Performance: Existing general claim extraction models will be evaluated on this dataset to establish baseline performance metrics.
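The test-set and baseline steps above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the function names `split_test_set` and `precision_recall_f1`, the 10% default hold-out fraction, and the choice of precision, recall, and F1 as the baseline metrics are all assumptions made for the example.

```python
import random


def split_test_set(articles, test_fraction=0.1, seed=13):
    """Hold out a fraction of the articles for manual annotation.

    Returns (remaining_pool, test_set). The fraction and seed are
    illustrative defaults, not values taken from the project.
    """
    shuffled = list(articles)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def precision_recall_f1(gold, predicted):
    """Binary metrics for the 'claim' class; inputs are lists of 0/1 labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

A baseline run would then pass the manually verified test labels as `gold` and the output of an existing general claim extraction model as `predicted`.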
Approach to Claim Extraction
Machine Learning Classification: Train models to classify individual sentences as claims or non-claims.
Sequence Labeling: Implement models that tag individual spans of text as components of claims.
Rule-Based: Not pursued; considered too trivial for this task.
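The difference between the two learned approaches shows up in the training targets. The sketch below converts a character-span annotation into per-token BIO tags (the usual target format for sequence labeling) and then collapses those tags into a single sentence-level label (the target for classification). The tag names `B-CLAIM`/`I-CLAIM`, the whitespace tokenization, the helper names, and the example sentence are all hypothetical choices for illustration.

```python
def bio_tags(sentence, claim_spans):
    """Tag whitespace tokens as B-CLAIM / I-CLAIM / O.

    claim_spans is a list of (start, end) character offsets, end exclusive.
    A token is part of a claim only if it lies fully inside a span.
    """
    tags = []
    pos = 0
    prev_span = None
    for token in sentence.split():
        start = sentence.index(token, pos)
        end = start + len(token)
        pos = end
        span = next(
            (s for s in claim_spans if s[0] <= start and end <= s[1]), None
        )
        if span is None:
            tags.append((token, "O"))
        elif span == prev_span:
            tags.append((token, "I-CLAIM"))  # continues the current claim
        else:
            tags.append((token, "B-CLAIM"))  # first token of a claim
        prev_span = span
    return tags


def sentence_label(tags):
    """Sentence-level target: 1 if any token belongs to a claim, else 0."""
    return int(any(tag != "O" for _, tag in tags))


# Hypothetical annotated sentence, invented for this example.
example = "Officials said solar output doubled last year."
span = (example.index("solar"), len(example))
example_tags = bio_tags(example, [span])
```

Sequence labeling keeps the boundary information in `example_tags`, while sentence classification only needs `sentence_label(example_tags)`, which is cheaper to annotate but cannot isolate the claim inside a longer sentence.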
Iterative Active Learning
Initial Training: Train the model on a small, high-quality subset of manually annotated data.
Automated Labeling: Use the trained model to label claims in the remaining dataset, assigning confidence scores to each prediction.
Manual Annotation: Focus human annotation efforts on instances where the model exhibits low confidence.
Iterative Improvement: Incorporate newly annotated data into the training set and retrain the model.
Performance Evaluation: Assess model improvement after each iteration using the held-out test set.
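One round of the loop above can be sketched as follows. This assumes a binary claim/non-claim model that outputs a claim probability, and it treats "low confidence" as a probability close to 0.5; both are assumptions for illustration. The callables `train_fn`, `predict_proba_fn`, and `annotate_fn` are hypothetical placeholders for the real training code, model inference, and human annotation step.

```python
def select_for_annotation(scored, budget):
    """Return the `budget` items whose claim probability is closest to 0.5.

    scored is a list of (item, probability) pairs.
    """
    return sorted(scored, key=lambda pair: abs(pair[1] - 0.5))[:budget]


def active_learning_round(train_fn, predict_proba_fn, annotate_fn,
                          labeled, unlabeled, budget):
    """Run one train / auto-label / annotate cycle.

    labeled:   list of (item, label) pairs used for training
    unlabeled: list of hashable items not yet annotated
    Returns (model, updated_labeled, updated_unlabeled).
    """
    # Initial training (or retraining) on everything annotated so far.
    model = train_fn(labeled)
    # Automated labeling with a confidence score for each prediction.
    scored = [(x, predict_proba_fn(model, x)) for x in unlabeled]
    # Manual annotation is spent only on the least confident items.
    chosen = [x for x, _ in select_for_annotation(scored, budget)]
    new_labeled = labeled + [(x, annotate_fn(x)) for x in chosen]
    chosen_set = set(chosen)
    remaining = [x for x in unlabeled if x not in chosen_set]
    return model, new_labeled, remaining
```

Calling `active_learning_round` repeatedly, and scoring each returned model on the held-out test set, gives the per-iteration evaluation described in the last step. The test set itself should never be added to `labeled`.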
Claim Extraction in Solar Energy News Articles
Team
Category
New Research Problem
Problem Statement
Evaluation Strategy
Dataset
News articles (~110k) scraped from the web.
Resources