cs56-utilities-GEscraper

Project Information

 YES | mastergberry | Scrape UCSB course pages for appropriate classes that fulfill certain GE requirements

Purpose


  1. Scrape and show the course list for a General Education (GE) area.
  2. Scrape and show the courses offered by a specific department in the College of Engineering.

The user can choose to display either General Education courses or College of Engineering courses.

The user can choose to show the course description of each course.

What's being scraped?


Example pages that are scraped:

  1. General education: http://my.sa.ucsb.edu/catalog/current/UndergraduateEducation/AreaE.aspx
  2. Engineering: https://my.sa.ucsb.edu/catalog/Current/CollegesDepartments/coe/compengr.aspx?DeptTab=Courses
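As a rough illustration of scraping one of these pages, here is a minimal sketch using only the Java standard library (Java 11+). It is not the project's actual scraper: the class name `CatalogPageSketch`, its methods, and the course-code pattern are all assumptions made for this example, and the real catalog markup may need a different approach.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the class and method names are not taken from the repo.
public class CatalogPageSketch {
    // The two example URLs listed in this README.
    static final String GE_AREA_E_URL =
        "http://my.sa.ucsb.edu/catalog/current/UndergraduateEducation/AreaE.aspx";
    static final String ENGINEERING_URL =
        "https://my.sa.ucsb.edu/catalog/Current/CollegesDepartments/coe/compengr.aspx?DeptTab=Courses";

    // ASSUMPTION: a course code looks like "CMPSC 56" or "ECE 152A"
    // (2-6 capital letters, a space, 1-3 digits, up to 2 trailing letters).
    private static final Pattern COURSE_CODE =
        Pattern.compile("\\b([A-Z]{2,6}) (\\d{1,3}[A-Z]{0,2})\\b");

    // Downloads a page and returns its HTML; fails on any non-200 status.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    // Strips tags, then collects each distinct course code in page order.
    static List<String> extractCourseCodes(String html) {
        String text = html.replaceAll("<[^>]*>", " ");
        List<String> codes = new ArrayList<>();
        Matcher matcher = COURSE_CODE.matcher(text);
        while (matcher.find()) {
            String code = matcher.group(1) + " " + matcher.group(2);
            if (!codes.contains(code)) {
                codes.add(code);
            }
        }
        return codes;
    }
}
```

A caller could run `CatalogPageSketch.extractCourseCodes(CatalogPageSketch.fetch(CatalogPageSketch.GE_AREA_E_URL))`. A regular expression over stripped text is brittle, so a real HTML parser is likely the better choice if the page structure is known.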

Who made what changes?


Changes by Dylan Lynch and Brent Kirkland

Recent Changes of LAST LAST GROUP (W15)

Recent Changes of LAST GROUP (W16)

Notes from W16

Changes by Xinjie You and Xingyuan Lin (F16)

Notes from F16


A major mistake we made and a valuable lesson we learned

We put a lot of effort into issue #21, which says "Expand the existing scraper to also scrape other UCSB services. Like the College of Engineering's GEAR book." Unfortunately, we misunderstood the issue and did all the work to scrape the courses offered by the College of Engineering, no matter whether or not they count toward GE, because we thought that was part of "other UCSB services".

We should have noticed this, because the name of the project is GEscraper! But we were so focused on the code and development that we overlooked the problem.

The lesson we learned is that we write code to meet customers' needs. There should be a good system for passing information about those needs down to software development. Communication and cooperation among the different roles in a software team are very important.

Advice for future groups

F17 Final Remarks

The code is mainly organized around EngGUI, ScraperGUI, and GeGUI. ScraperGUI contains the main method. The other two GUI files rely on the scraper files (GetEngInfo, GetGeInfo) to pull course information from external websites and display it to the user. We also implemented a mashup with the UCSB Curriculum scraper. Currently, the UCSB Curriculum scraper files are copied into our repo; in the future, those files should be removed and that project should be used as a library instead. More details on enrollment history for past quarters could also be added. Error handling could be implemented for cases where scraping fails (for example, when no course is found for the filters entered). In general, make sure you understand the purpose of each file before trying to modify or use it, and add comments to document any changes you make.
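One possible shape for the error handling suggested above is sketched below. Everything in it is hypothetical: `ScrapeErrorHandlingSketch`, `CourseNotFoundException`, and `filterCourses` are names invented for this example and do not exist in GetEngInfo or GetGeInfo. The sketch assumes the scraped courses arrive as a list of strings and that filtering is a case-insensitive keyword match.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these names come from the repo.
public class ScrapeErrorHandlingSketch {

    // Unchecked so that a GUI layer can catch it in one place and show a message.
    static class CourseNotFoundException extends RuntimeException {
        CourseNotFoundException(String message) {
            super(message);
        }
    }

    // Returns the courses containing the keyword (case-insensitive).
    // Throws instead of returning an empty list, so callers cannot
    // silently display a blank result.
    static List<String> filterCourses(List<String> scraped, String keyword) {
        if (scraped == null || scraped.isEmpty()) {
            throw new CourseNotFoundException(
                "The scrape returned no courses; the page layout may have changed.");
        }
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> matches = scraped.stream()
            .filter(course -> course.toLowerCase(Locale.ROOT).contains(needle))
            .collect(Collectors.toList());
        if (matches.isEmpty()) {
            throw new CourseNotFoundException("No course matches the filter: " + keyword);
        }
        return matches;
    }
}
```

A GUI class could wrap the call in `try { ... } catch (CourseNotFoundException e) { ... }` and show `e.getMessage()` in a dialog rather than leaving the result area empty. Returning an `Optional` or an empty list are equally reasonable designs; the exception is used here only to make the failure hard to ignore.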