I noticed that the script currently scrapes the CG for Fall 2018 without a check for the most recent/current term. This should solve that issue by adding a function to modify the URL to point to the current term.
Upon inspection of the CG, I found that each term is given an ID-like string. This "ID" is passed as a query when retrieving a certain webpage in the CG.
Fall 2018's ID is f_18_2210
Fall 2019's ID is f_19_2260
So,
A sample URL for the Fall 2018 CG is https://www.lsa.umich.edu/cg/cg_subjectlist.aspx?termArray=f_18_2210&cgtype=ug&allsections=true
A sample URL for the Fall 2019 CG is https://www.lsa.umich.edu/cg/cg_subjectlist.aspx?termArray=f_19_2260&cgtype=ug&allsections=true
I'm not sure about how exactly this ID is computed else I would have taken that route. Instead, I found that some buttons on the main CG page have attributes that contain this ID.
I used bs4 to parse the main CG page and find the element containing the term ID. Then, using some regex matching, the term ID is retrieved and passed to the function that constructs the URL to be parsed.
I modified the URL-constructing function to accommodate for this change.
All-in-all, this should future-proof your script. :)
I noticed that the script currently scrapes the CG for Fall 2018 without a check for the most recent/current term. This should solve that issue by adding a function to modify the URL to point to the current term.
Upon inspection of the CG, I found that each term is given an ID-like string. This "ID" is passed as a query when retrieving a certain webpage in the CG.
f_18_2210
f_19_2260
So,
https://www.lsa.umich.edu/cg/cg_subjectlist.aspx?termArray=
f_18_2210
&cgtype=ug&allsections=true
https://www.lsa.umich.edu/cg/cg_subjectlist.aspx?termArray=
f_19_2260
&cgtype=ug&allsections=true
I'm not sure about how exactly this ID is computed else I would have taken that route. Instead, I found that some buttons on the main CG page have attributes that contain this ID.
I used bs4 to parse the main CG page and find the element containing the term ID. Then, using some regex matching, the term ID is retrieved and passed to the function that constructs the URL to be parsed.
I modified the URL-constructing function to accommodate for this change.
All-in-all, this should future-proof your script. :)