Welcome to the Bayesian beagle blog! This project sits at the intersection of machine learning and scientific communication: it gives readers quick insights into the latest research papers posted on arXiv. Using state-of-the-art Large Language Models (LLMs), our system generates concise, readable summaries of complex research articles across a wide array of disciplines.
Our blog is built with Quarto, an open-source scientific and technical publishing system for creating beautiful, data-driven content, and is published with Netlify.
The weekly pipeline:
graph LR
A["Download weekly arXiv articles"] --> B["Predict topic and filter to LLM papers"]
B --> C["Summarize short docs"]
B --> D["Summarize long docs via map-reduce"]
C --> E["Update website with summaries weekly"]
D --> E
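The routing in the flowchart (filter by topic, then summarize short documents directly and long ones by map-reduce) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the project's implementation: `summarize` and `is_llm_topic` are hypothetical callables standing in for the LLM call and the topic predictor, and the word thresholds `MAX_WORDS` and `CHUNK_WORDS` are made-up values.

```python
from typing import Callable, Iterable

# Hypothetical threshold: documents longer than this (in words) go through map-reduce.
MAX_WORDS = 3000
# Hypothetical chunk size (in words) for the "map" step.
CHUNK_WORDS = 1000


def chunk_words(text: str, size: int = CHUNK_WORDS) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def summarize_doc(text: str, summarize: Callable[[str], str]) -> str:
    """Summarize short docs in one call; summarize long docs via map-reduce."""
    if len(text.split()) <= MAX_WORDS:
        return summarize(text)
    # Map: summarize each chunk independently.
    partials = [summarize(chunk) for chunk in chunk_words(text)]
    # Reduce: summarize the concatenated partial summaries in one final call.
    return summarize("\n".join(partials))


def run_pipeline(
    docs: Iterable[str],
    is_llm_topic: Callable[[str], bool],
    summarize: Callable[[str], str],
) -> list[str]:
    """Keep only docs predicted to be on-topic, then summarize each one."""
    return [summarize_doc(d, summarize) for d in docs if is_llm_topic(d)]
```

This sketch uses a single reduce step; if the joined partial summaries could still exceed the model's context window, a real pipeline would typically repeat the reduce step until the text fits.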
The blog is live at https://bayesian-beagle.netlify.app/
Visit the link above to read the latest research summaries. If you're interested in how the blog is generated or want to suggest improvements, check out the repository or open an issue.
To clone and run this project locally, you'll need Git, Quarto, and Python 3.9 installed on your computer. From your command line:
# Clone this repository
git clone https://github.com/wesslen/bayesian-beagle.git
# Go into the repository
cd bayesian-beagle
# Create and activate a virtual environment (on Windows: venv\Scripts\activate)
python3.9 -m venv venv
source venv/bin/activate
# Install dependencies for the summarizer
pip install -r requirements-summarizer.txt
# Install dependencies for building the site
pip install -r requirements-build.txt
# Install dependencies for langchain
pip install -r requirements-langchain.txt
# Curate arXiv IDs in data/input.jsonl; make sure each paper has an HTML rendering
# Generate summaries
python scripts/summarizer.py data/input.jsonl
# Create quarto posts of summaries
python scripts/generate_qmd.py data/output.jsonl posts
# Build the Quarto blog
quarto render
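The `generate_qmd.py` step turns JSONL summary records into Quarto posts. The sketch below shows one way such a conversion could work; it is an illustration, not the script's actual code. The field names `arxiv_id`, `title`, and `summary`, and the `posts/<arxiv_id>/index.qmd` layout, are assumptions (the per-folder `index.qmd` layout is a common Quarto blog convention).

```python
import json
from pathlib import Path


def record_to_qmd(record: dict) -> str:
    """Render one summary record as a Quarto document with YAML front matter."""
    # Hypothetical field names: "title", "arxiv_id", "summary".
    # Escape backslashes, then double quotes, for a YAML double-quoted string.
    title = record["title"].replace("\\", "\\\\").replace('"', '\\"')
    return (
        "---\n"
        f'title: "{title}"\n'
        "---\n\n"
        f"{record['summary']}\n\n"
        f"Source: https://arxiv.org/abs/{record['arxiv_id']}\n"
    )


def write_posts(jsonl_path: str, out_dir: str) -> list[Path]:
    """Write <out_dir>/<arxiv_id>/index.qmd for each JSONL record; return the paths."""
    written = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            post_dir = Path(out_dir) / record["arxiv_id"]
            post_dir.mkdir(parents=True, exist_ok=True)
            path = post_dir / "index.qmd"
            path.write_text(record_to_qmd(record), encoding="utf-8")
            written.append(path)
    return written
```

With these assumptions, `write_posts("data/output.jsonl", "posts")` would produce one folder per paper, which `quarto render` can then pick up as a blog listing.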
Distributed under the MIT License. See LICENSE for more information.
strip-tags library