stev-ou / review_ocr

Using Google's Tesseract OCR to extract data from public PDFs
GNU General Public License v3.0
0 stars 1 forks source link

Change Regex pattern for scraping certain questions #7

Open samjett247 opened 4 years ago

samjett247 commented 4 years ago

Current regex pattern stops a question scrape when non-alphanumeric character is encountered. Small change; should be implemented before next scraper rerun.

Actual Question Scraped Question
The instructor related course material to professional practice and/or research The instructor related course material to professional practice and
The instructor was well-organized and made adequate preparation for class The instructor was well