Closed samjett247 closed 4 years ago
Good for review now? @samjett247
I'd say no, @schuermannator. I found a bug in the way I was getting the question ratings, and I need a little more time to go over it. If you've got time to work on something, try to set up GraphQL on the backend. I will have the scraping/parsing done, the data aggregated, and the site running on the more recent reviews by the end of the coming long weekend.
Update: I updated the README and added some more documentation. Moving to the backend to adjust the scraper and
Good to go, @schuermannator
See the README for stats on scraping efficiency.
Hey Zach and Joe, I had a little extra time these past few weeks and wanted to look into the scraping/OCR process a little further. I was trying to learn more about how Joe wrote it, and to get more experience both with working in an existing codebase and with writing a scraper and parser. Along the way, I figured out and implemented a few optimizations:
I kept a lot of the macrostructure of the program, especially the organization of the PDF files and the PDF splitter, the web crawler framework, the error handling during parsing, the multiprocessing for parsing, and the upload to Mongo. I put the data from the run, including all colleges for the years 2010-2019, into ocr-db-v1 in Mongo.
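For anyone reviewing who hasn't looked at the parser yet, here is a rough sketch of the general pattern (multiprocessing plus per-file error handling), not the actual code in this PR. The names `parse_pdf`, `safe_parse`, and `parse_all` are hypothetical, and `parse_pdf` is only a stand-in for the real OCR/parsing step.

```python
import multiprocessing as mp


def parse_pdf(path):
    """Hypothetical stand-in for the real OCR/parsing step."""
    if not path.endswith(".pdf"):
        raise ValueError(f"not a PDF: {path}")
    return {"path": path, "ratings": []}


def safe_parse(path):
    """Parse one file and return any error as data instead of raising,
    so a single bad PDF does not abort the whole run."""
    try:
        return path, parse_pdf(path), None
    except Exception as exc:
        return path, None, repr(exc)


def parse_all(paths, workers=1):
    """Parse every path, serially when workers <= 1 (handy for debugging),
    otherwise across a pool of worker processes."""
    if workers <= 1:
        results = [safe_parse(p) for p in paths]
    else:
        with mp.Pool(workers) as pool:
            results = pool.map(safe_parse, paths)
    # Split successes from failures so failures can be logged or retried.
    parsed = [doc for _, doc, err in results if err is None]
    failed = [(p, err) for p, _, err in results if err is not None]
    return parsed, failed
```

The `parsed` list is what would then go to the upload step (for example, pymongo's `insert_many` on a collection), while `failed` keeps the path and error for each file that could not be parsed.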
I'm planning to phase this dataset into use on the backend over the next week. I expect to run into a few kinks, but hopefully we can get this bigger dataset into use in our app. I've got a few more changes I want to make (modifying the README and adding more documentation), but I wanted to get your review here.
Thanks, SJ