weallwegot closed 7 years ago
you can also use lxml
this commit handles the GT OMSCS web pages for the classes:
47fb903c365b2e27c9150a239b2dcde9a8ea2476
leaving open so the implementation of the non-class-specific stuff can go in there too
also the class-related stuff gets weird when the site lists things in non-paragraph tags. like, lists following colons don't parse well
fixed the bad page parsing by making a while loop that stops when the next h4 element is reached. might need to add a stop limit of about 10 iterations in case things get weird or answers get too long, or they stop using h4 elements lol. 0dac9742c7dc66e29132efacc90f55805202598f
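for anyone reading later, here is a rough sketch of the "walk siblings until the next h4, with a cap" idea. it is not the code from the commit. it uses the stdlib `xml.etree.ElementTree` as a stand-in (lxml.etree follows the same Element API), so it only works on well-formed markup; the real pages would need Beautiful Soup or lxml's HTML parser. `PAGE`, `MAX_SIBLINGS`, `answer_after` and `parse_faq` are made-up names, and the markup is an invented example, not the real site.

```python
import xml.etree.ElementTree as ET

# Hypothetical FAQ fragment (assumption: question in <h4>, answer in the
# sibling elements that follow it). Not copied from the real site.
PAGE = """
<div>
  <h4>What does the program cost?</h4>
  <p>Fees are listed on the cost page:</p>
  <ul><li>tuition</li><li>fees</li></ul>
  <h4>When is the deadline?</h4>
  <p>See the deadlines page.</p>
</div>
"""

MAX_SIBLINGS = 10  # assumed safety cap, per the "about 10 iterations" idea


def answer_after(siblings, start, max_siblings=MAX_SIBLINGS):
    """Collect text after siblings[start] until the next h4 or the cap."""
    parts = []
    i = start + 1
    steps = 0
    while i < len(siblings) and steps < max_siblings:
        el = siblings[i]
        if el.tag == "h4":  # the next question starts here, so stop
            break
        # itertext() also yields text nested in <ul>/<li>, not only <p>
        text = " ".join(t.strip() for t in el.itertext() if t.strip())
        if text:
            parts.append(text)
        i += 1
        steps += 1
    return " ".join(parts)


def parse_faq(markup):
    """Map each h4 heading's text to the text that follows it."""
    siblings = list(ET.fromstring(markup))
    return {
        "".join(el.itertext()).strip(): answer_after(siblings, idx)
        for idx, el in enumerate(siblings)
        if el.tag == "h4"
    }


faq = parse_faq(PAGE)
```

because the loop reads each sibling with `itertext()`, a list that follows a colon ends up in the same answer as the paragraph before it. the cap just truncates the answer if the page ever stops using h4 headings.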
http://lxml.de/api/lxml.etree._Element-class.html because this documentation is so hard to find
From @weAllWeGot on April 1, 2017 14:50
how?
beautiful soup 4 and html parsing of some of the more important pages. https://www.crummy.com/software/BeautifulSoup/bs4/doc/
this can be used for the following info:
course specializations https://www.omscs.gatech.edu/program-info/specializations
admissions/application questions https://www.omscs.gatech.edu/program-info/admission-criteria https://www.omscs.gatech.edu/program-info/application-deadlines-process-requirements
program costs questions and financial aid https://www.omscs.gatech.edu/program-info/cost-payment-schedule
faqs https://www.omscs.gatech.edu/prospective-students/faq
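one possible way to wire the list above into the bot is a topic-to-URL table plus a crude keyword lookup that decides which pages to scrape for a question. this is only a sketch: the URLs are the ones listed above, but `PAGES`, `KEYWORDS`, `pages_for` and the trigger words are hypothetical, and nothing here fetches or parses anything.

```python
import re

BASE = "https://www.omscs.gatech.edu"

# Topic -> pages to scrape (URLs taken from the list in this issue).
PAGES = {
    "specializations": [f"{BASE}/program-info/specializations"],
    "admissions": [
        f"{BASE}/program-info/admission-criteria",
        f"{BASE}/program-info/application-deadlines-process-requirements",
    ],
    "costs": [f"{BASE}/program-info/cost-payment-schedule"],
    "faq": [f"{BASE}/prospective-students/faq"],
}

# Hypothetical trigger words per topic; a real bot would want something smarter.
KEYWORDS = {
    "specializations": {"specialization", "specializations"},
    "admissions": {"admission", "admissions", "apply", "application",
                   "deadline", "deadlines"},
    "costs": {"cost", "costs", "tuition", "fee", "fees", "financial", "aid"},
}


def pages_for(question):
    """Return the URLs worth scraping for a question; fall back to the FAQ."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    hits = [
        url
        for topic, triggers in KEYWORDS.items()
        if words & triggers
        for url in PAGES[topic]
    ]
    return hits or PAGES["faq"]
```

for example, `pages_for("How much does tuition cost?")` returns only the cost page, and anything with no keyword match falls back to the FAQ page.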
_Copied from original issue: weAllWeGot/kbai_chatbot3#58_