katehausladen closed 5 months ago
The crawl is finished! GPP implementation nearly doubled since December!
I reopened the issue to merge the code used for this crawl. Since I had already crawled the whole crawl set with these changes, I went ahead and merged them. The changes were: (1) cap the debugging table entries at 4,000 characters, since that is what our table allows; (2) add another human check regular expression; and (3) update the README to reflect wellknown changes.
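For anyone skimming, here is a rough sketch of what changes (1) and (2) amount to. It is not the merged code: the function names, the constant name, and the regex patterns are all hypothetical and only illustrate the idea of truncating an entry to the table's 4,000-character limit and matching a page title against a list of human-check patterns.

```python
import re

# 4,000 is the limit mentioned above; the constant name is made up for this sketch.
MAX_ENTRY_CHARS = 4000

# Hypothetical patterns. The crawler's actual human-check expressions are not shown here.
HUMAN_CHECK_PATTERNS = [
    re.compile(r"verify (that )?you are (a )?human", re.IGNORECASE),
    re.compile(r"are you a robot", re.IGNORECASE),
]


def cap_entry(value: str, limit: int = MAX_ENTRY_CHARS) -> str:
    """Truncate a debugging entry so it fits the table's column limit."""
    return value if len(value) <= limit else value[:limit]


def is_human_check(page_title: str) -> bool:
    """Return True if the title matches any of the human-check patterns."""
    return any(pattern.search(page_title) for pattern in HUMAN_CHECK_PATTERNS)
```

Adding "another human check regular expression" would then just mean appending one more compiled pattern to the list.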
Here's the updated analysis flow / architecture PowerPoint: web-crawler-architecture.pptx
I talked to Daniel, and this week is the best week for me to have the computer. I started the crawl last night.