Closed stumash closed 7 years ago
Well, the script doesn't output anything new, in the sense that stdout is the same as it was. However, with respect to the course-info JSON it creates, there are some slight differences.
There were some courses whose entire info was being absorbed by the `description` of the previous course. This was fixed by the first change. So I guess there are now a few courses being scraped as separate courses, where before all their info was part of some other course's `description`.
Also, the `description` property of some courses was including the NOTE/Lecture/Tutorial/Laboratory information, which was fixed by the second change.
2 changes to regexes
First Change
The `course.info.header.rgx` was `[A-Z]{4} [0-9]{3}[[:space:]]+?[A-Z][a-z]+.`, but is now `[A-Z]{4} [0-9]{3}[[:space:]]+?(\\(also listed as [^)]*\\))?[A-Z][a-z]+.`
We use this regex to identify the start of a single course's information. Some courses' information starts with something like `COMP 101 (also listed as SOEN 101) Intro. to Programming` instead of `COMP 101 Intro. to Programming`.
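A minimal Python sketch of the header check, not the scraper's actual code. It rests on two assumptions: Python's `re` module has no POSIX `[[:space:]]` class, so `\s` stands in for it, and an extra `\s*` is added after the optional group so that the space between `)` and the title in the sample line is tolerated. That `\s*` is not part of the pattern quoted in this PR. The helper `is_header` and the constant names are hypothetical.

```python
import re

# \s replaces [[:space:]], which Python's re module does not support.
OLD_HEADER = re.compile(r"[A-Z]{4} [0-9]{3}\s+?[A-Z][a-z]+.")

# Assumption: the \s* inside the optional group is added for this sketch only,
# so that a space between ")" and the course title still matches.
NEW_HEADER = re.compile(
    r"[A-Z]{4} [0-9]{3}\s+?(\(also listed as [^)]*\)\s*)?[A-Z][a-z]+."
)

plain = "COMP 101 Intro. to Programming"
cross_listed = "COMP 101 (also listed as SOEN 101) Intro. to Programming"


def is_header(pattern, line):
    """Return True if the line starts with a course header (hypothetical helper)."""
    return pattern.match(line) is not None


for line in (plain, cross_listed):
    print(line, "| old:", is_header(OLD_HEADER, line), "| new:", is_header(NEW_HEADER, line))
```

With these assumptions, the old pattern accepts only the plain header, while the new one accepts both forms, so a cross-listed course is no longer folded into the previous course's text.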
Second Change
The `course.description.rgx` was `.*?(?=(Lecture|Tutorial|Laboratory|\nNOTE|$))`, but is now `(.*?)(Lecture|Tutorial|Laboratory|NOTE|$)`.
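To illustrate one way the two patterns can differ, here is a rough Python sketch. The scraper's regex engine is not stated here, so behaviour there may differ, and the sample text and variable names are made up for the example. The old pattern stops at `NOTE` only when it is preceded by a newline, so a `NOTE` on the same line as the description is swallowed. The new pattern stops at the first `NOTE` wherever it appears and exposes the description as capture group 1.

```python
import re

# Python translations of the two patterns quoted in this PR.
OLD_DESCRIPTION = re.compile(r".*?(?=(Lecture|Tutorial|Laboratory|\nNOTE|$))")
NEW_DESCRIPTION = re.compile(r"(.*?)(Lecture|Tutorial|Laboratory|NOTE|$)")

# Hypothetical course text where NOTE follows the description on the same line.
sample = "Basic programming concepts. NOTE: Students may not take this course for credit."

# Old pattern: the whole match is the description.
old_description = OLD_DESCRIPTION.match(sample).group(0)

# New pattern: group 1 holds everything before the first keyword.
new_description = NEW_DESCRIPTION.match(sample).group(1)

print(repr(old_description))
print(repr(new_description))
```

On this sample the old pattern returns the full string, including the NOTE text, while group 1 of the new pattern ends just before `NOTE`.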
The previous regex was essentially broken and was trying to achieve the result of the new one. The new regex will match the entire string up until the first occurrence of any of `Lecture`, `Tutorial`, `Laboratory`, `NOTE`, or `$` (end of string).

resolves #84