Open natestemen opened 5 years ago
In hopes of not having to scrape data from PDFs, I've emailed the authors of all the books to see if they will provide the source code for their books.
in general:
seem to be pretty good.
Found the LaTeX source code for the open source Trench real analysis textbook: TRENCH_REAL_ANALYSIS (1).zip
import re

def theoremGetter():
    # Read the whole LaTeX source into one string.
    with open("TRENCH_REAL_ANALYSIS.tex.txt", "r") as fh:
        textString = fh.read()
    # re.S lets "." match newlines, so multi-line theorem bodies are captured.
    return re.findall(r'\\begin\{theorem\}(.*?)\\end\{theorem\}', textString, re.S)

theorems = theoremGetter()

# Write each theorem body under a numbered heading.
with open("realTheorems.txt", "w") as fh:
    for count, theorem in enumerate(theorems, start=1):
        fh.write("Theorem " + str(count) + ":\n")
        fh.write(theorem)
        fh.write("\n")
The one above actually compiles in LaTeX: https://www.overleaf.com/project/5c6caec3a2b08f6c9c112121
we need some open source textbook we can start to scrape theorems/definitions from.
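If we want definitions as well as theorems, here is a rough sketch of how the same regex idea could be generalized. The function name `extract_environments` and the environment names `theorem` and `definition` are my assumptions; a given textbook's source may use different environment names or custom macros, and this does not handle nested environments or optional arguments like `\begin{theorem}[Name]`.

```python
import re

def extract_environments(tex, names=("theorem", "definition")):
    """Return {name: [body, ...]} for each LaTeX environment in `names`.

    `names` is a hypothetical default; adjust it to whatever
    environments the textbook source actually defines.
    """
    results = {}
    for name in names:
        env = re.escape(name)
        # Non-greedy match between \begin{name} and \end{name};
        # re.S lets "." span newlines.
        pattern = r'\\begin\{' + env + r'\}(.*?)\\end\{' + env + r'\}'
        results[name] = [body.strip() for body in re.findall(pattern, tex, re.S)]
    return results

# Small made-up sample, just to exercise the function.
sample = r"""
\begin{theorem}
Every bounded monotone sequence converges.
\end{theorem}
\begin{definition}
A set is open if it contains a neighborhood of each of its points.
\end{definition}
"""

found = extract_environments(sample)
for name, bodies in found.items():
    for count, body in enumerate(bodies, start=1):
        print(name.capitalize() + " " + str(count) + ": " + body)
```

Returning a dict keyed by environment name keeps theorems and definitions separate, so each can be written to its own file later if that turns out to be useful.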
how do we feel about scraping from proofwiki also?