Official implementation of the algorithm behind:
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
The main idea of this work is to simplify and streamline reading ArXiv papers. If you are a visual learner, this code will convert a paper into an engaging video format. If you are on the go and prefer listening, it will also generate an audio version of the paper.
Here are the main steps of the algorithm:
1. Download the paper's source code, given its ArXiv ID
2. Use latex2html or latexmlc to convert the LaTeX code to an HTML page
3. Parse the HTML page to extract text and equations, ignoring tables, figures, etc.
4. If creating a video, also build a map that matches PDF pages to text and text chunks to page blocks
5. Split the text into sections and pass them through the OpenAI GPT API to paraphrase, simplify, and explain
6. Split the GPT-generated text into chunks and convert them to audio using the Google Text-to-Speech API (a rough sketch of steps 5 and 6 follows this list)
7. Pack all the necessary pieces and create a zip file for further video processing
8. Using the text-block map computed earlier, create the video with ffmpeg
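The following is a minimal sketch of steps 5 and 6, not the repository's actual implementation: it paraphrases one section with the openai client (v1+ interface) and synthesizes an audio chunk with google-cloud-texttospeech. The function names, prompt, model, and voice settings are assumptions made for illustration.

```python
# Minimal sketch of steps 5-6 (illustrative only; the repo's code may differ).
# Assumes an OpenAI API key and Google Cloud credentials are configured.
from openai import OpenAI
from google.cloud import texttospeech


def paraphrase_section(section_text: str, api_key: str) -> str:
    """Ask a GPT model to paraphrase, simplify, and explain one section."""
    client = OpenAI(api_key=api_key)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; the repo may use a different one
        messages=[{
            "role": "user",
            "content": "Paraphrase, simplify, and explain this paper section:\n\n"
                       + section_text,
        }],
    )
    return resp.choices[0].message.content


def synthesize_chunk(text: str, out_path: str) -> None:
    """Convert one text chunk to an MP3 file with Google Text-to-Speech."""
    tts = texttospeech.TextToSpeechClient()
    response = tts.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as f:
        f.write(response.audio_content)
```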
Note 1: The code can create both a long, more detailed version and a short, summarized version of the paper.
Note 2: The long video version also contains summary blocks after each section.
Note 3: The short video version contains automatically generated slides summarizing the paper.
Note 4: Given proper credentials, the code can also upload the generated audio files to your Google Drive (see the sketch below).
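For the Google Drive upload mentioned in Note 4, a minimal pydrive2 sketch might look like the following. This is an illustration, not the repository's actual upload code; the OAuth flow and file name are assumptions.

```python
# Illustrative pydrive2 upload sketch; credential handling in the repo may differ.
from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive


def upload_audio(local_path: str) -> None:
    """Upload a local audio file to Google Drive."""
    gauth = GoogleAuth()
    gauth.LocalWebserverAuth()                    # opens a browser for OAuth consent
    drive = GoogleDrive(gauth)
    f = drive.CreateFile({"title": local_path})   # Drive-side file metadata
    f.SetContentFile(local_path)                  # attach the local MP3
    f.Upload()


# e.g. upload_audio("final_audio.mp3")
```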
Required Python packages: openai, PyPDF2, spacy, tiktoken, pyperclip, google-cloud-texttospeech, pydrive2, pdflatex
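Assuming all of these are pip-installable (pdflatex additionally needs a working TeX distribution on the system), one way to install them is:
# install the Python dependencies (a TeX distribution is still required for LaTeX compilation)
pip install openai PyPDF2 spacy tiktoken pyperclip google-cloud-texttospeech pydrive2 pdflatex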
# to create audio, both short and long, and prepare for video creation
python main.py --verbose --include_summary --create_short --create_video --openai_key <your_key> --paperid <arxiv_paper_id> --l2h
The default LaTeX conversion tool, latex2html, sometimes fails; in that case, remove --l2h to use latexmlc instead. Also, by default the code processes the whole paper up to the references; if you want to stop earlier, pass --stop_word "experiments" (e.g., to stop before the Experiments section), as in the example below.
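For example, a run that falls back to latexmlc and stops before the Experiments section could look like this (all flags are the same ones described above):
# same pipeline, but using latexmlc and stopping before the Experiments section
python main.py --verbose --include_summary --create_short --create_video --openai_key <your_key> --paperid <arxiv_paper_id> --stop_word "experiments"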
<arxiv_paper_id>_files/
├── final_audio.mp3
├── final_audio_short.mp3
├── abstract.txt
├── zipfile-<time_stamp>.zip
├── ...
├── extracted_orig_text_clean.txt
├── original_text_split_pages.txt
├── original_text_split_sections.txt
├── ...
├── gpt_text.txt
├── gpt_text_short.txt
├── gpt_verb_steps.txt
├── ...
├── slides/
│   ├── slide1.pdf
│   ├── ...
The output directory, among other things, contains the generated audio files, slides, the extracted original text, and the GPT-generated output, split across pages or sections. It also contains zipfile-<time_stamp>.zip, which holds the data needed for video generation.
# to extract only the original text from ArXiv paper, without any GPT/audio/video processing
python main.py --verbose --extract_text_only --paperid <arxiv_paper_id>
Now, we are ready to generate the video:
# to generate the video based on the results from above
python makevideo.py --paperid <arxiv_paper_id>
output_<time_stamp>/
├── output.mp4
├── output_short.mp4
├── ...
The output directory now contains two video files: one for the long version and one for the short version of the paper.
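The stitching itself is done with ffmpeg (step 8 above). Purely as an illustration of the general idea, and not the command makevideo.py actually runs, a hypothetical PNG rendering of a slide could be paired with an audio track like this:
# illustrative only; the actual assembly performed by makevideo.py is more involved
ffmpeg -loop 1 -i slide1.png -i final_audio.mp3 -c:v libx264 -c:a aac -shortest output.mp4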