Closed pdpinch closed 3 years ago
I find the data structure in the raw JSON / Plone almost impenetrable. If you want to refactor it into something easier to follow, it would make me happy.
I don't think we can do much in ocw-data-parser to refactor this since we treat courses independently. But we can do that in ocw-to-hugo when we produce the "other versions" text.
Does the necessary data show up in the processed JSON so that we can use it in ocw-to-hugo?
Not yet, but I'm planning on a PR for that. It's just going to pass through dspace_handle, features_tracking, and the first item of is_update_of. I'll make another PR to handle it in ocw-to-hugo.
@pdpinch Do we know how this should show up in the UI yet?
Is dspace_handle just an indicator that a course is archived, or is there a value that should be parsed from either of the URLs?
Do we know how this should show up in the UI yet?
You can see it on https://projects.invisionapp.com/share/QFZ8KA9SH2P#/screens/435953154_OCW_Course_Home_Page_Color_10-27-2020_Course_Info_Collapsed_V2
I hadn't made an issue for ocw-hugo yet because it was waiting for this one to be closed.
The 18.01sc on "Other courses" has "scholar" at the end of the name but not for archived courses. Should "scholar" appear on both? And in general should the text be identical if the course is the same?
The example in InVision isn't based on real data.
The 18.01sc on "Other courses" has "scholar" at the end of the name but not for archived courses. Should "scholar" appear on both?
In practice, no scholar course has ever been archived. I wouldn't worry about it.
And in general should the text be identical if the course is the same?
I'm not sure which text you mean, but if a course has been archived, it should only appear in the "Archived OCW Versions" section.
Put another way, the links under "Archived OCW Versions" should only go to hdl.handle.net URLs
Is dspace_handle just an indicator that a course is archived, or is there a value that should be parsed from either of the URLs?
The value for dspace_handle is a unique "handle" identifier. It can be parsed to generate the more useful handle.net URL. I would suggest preserving both in the parsed JSON unless you're confident that they are both always present. (Since these rely on data entry in the legacy CMS, I don't know if they have been input consistently.)
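A minimal sketch of that parsing, in case it helps. It assumes a handle looks like 1721.1/120335 and that the raw dspace_handle value may be a bare handle, carry a prefix such as "hdl:", or be embedded in a URL. The function name handle_url and the regular expression are illustrative only; they are not taken from ocw-data-parser, and the legacy data may well contain shapes this does not cover.

```python
import re
from typing import Optional

# Assumption: a handle is "<dotted numeric prefix>/<numeric suffix>",
# e.g. "1721.1/120335". Values entered by hand in the legacy CMS may differ.
HANDLE_RE = re.compile(r"\d+(?:\.\d+)+/\d+")


def handle_url(dspace_handle: Optional[str]) -> Optional[str]:
    """Return an hdl.handle.net URL for the handle found in the raw value.

    Returns None when the value is missing or contains nothing that looks
    like a handle, so the caller can keep the raw value and skip the link.
    """
    if not dspace_handle:
        return None
    match = HANDLE_RE.search(dspace_handle)
    if match is None:
        return None
    return f"https://hdl.handle.net/{match.group(0)}"
```

Returning None rather than raising keeps both values preservable in the parsed JSON: the raw dspace_handle passes through untouched, and the derived URL is simply absent when it cannot be built.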
The ocw_feature_url should be used as the HREF for links in the "archived courses" section, same as on https://ocw.mit.edu/courses/mathematics/18-01-single-variable-calculus-fall-2005/
I'm going to close this since https://github.com/mitodl/ocw-to-hugo/issues/274 should handle all remaining work
@pdpinch I did some research yesterday and this morning, and I don't think features_tracking is reliable to use with is_update_of to create dspace links with course titles. Some courses have multiple previous versions in features_tracking, and some have multiple items in is_update_of. I am not sure we can say that what is described in is_update_of and features_tracking matches up exactly, even if there is only 1 previous version and 1 uid. Instead, I think we should only use is_update_of and dspace_handle, and just leave out the link if the two don't match up. What do you think?
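To make the proposal concrete, here is a rough sketch of the lookup. It assumes the parsed courses are available in a dict keyed by uid, that is_update_of holds uids of previous versions, and that each course has a title field. The names archived_version_links, courses_by_uid, and title are hypothetical; only is_update_of and dspace_handle come from the data discussed here.

```python
from typing import Dict, List


def archived_version_links(course: dict, courses_by_uid: Dict[str, dict]) -> List[dict]:
    """Collect link data for the previous versions of a course.

    A previous version is included only when its uid resolves to a known
    course AND that course has a dspace_handle; otherwise it is left out.
    """
    links = []
    for uid in course.get("is_update_of") or []:
        previous = courses_by_uid.get(uid)
        if previous is None:
            continue  # uid does not match any parsed course
        handle = previous.get("dspace_handle")
        if not handle:
            continue  # previous version exists but was never archived
        links.append({"title": previous.get("title", ""), "dspace_handle": handle})
    return links
```

Because features_tracking is never consulted, courses with several entries there, or several items in is_update_of, need no special casing: each uid is resolved independently.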
There are only a small number of dspace links in features_tracking which don't appear in dspace_handle somewhere:
New hdl: 1721.1/120335 21a-120-american-dream-using-storytelling-to-explore-social-class-in-the-united-states-spring-2018
New hdl: 1721.1/121500 6-057-introduction-to-matlab-january-iap-2019
New hdl: 1721.1/121170 6-436j-fundamentals-of-probability-fall-2018
New hdl: 1721.1/75824 6-005-elements-of-software-construction-fall-2011
New hdl: 1721.1/121185 1-258j-public-transportation-systems-spring-2017
New hdl: 1721.1/120336 5-61-physical-chemistry-fall-2017
New hdl: 1721.1/121583 14-381-statistical-method-in-economics-fall-2006
New hdl: 1721.1/121583 14-381-statistical-method-in-economics-fall-2018
New hdl: 1721.1/120951 21g-103-chinese-iii-regular-fall-2018
New hdl: 1721.1/120952 21g-103-chinese-iii-regular-fall-2018
There are a decent number of course references from is_update_of which don't have a dspace_handle, about 142 according to my script.
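For anyone who wants to repeat that count, a check along these lines should do it. This is not the script that produced the number above, just a sketch under the same assumptions as before: parsed courses in a dict keyed by uid (a hypothetical layout), with is_update_of holding uids.

```python
from typing import Dict, List, Tuple


def missing_handle_references(courses_by_uid: Dict[str, dict]) -> List[Tuple[str, str]]:
    """List (course uid, previous uid) pairs whose previous version has no dspace_handle.

    A reference also counts as missing when the previous uid does not
    resolve to any parsed course at all.
    """
    missing = []
    for uid, course in courses_by_uid.items():
        for previous_uid in course.get("is_update_of") or []:
            previous = courses_by_uid.get(previous_uid)
            if previous is None or not previous.get("dspace_handle"):
                missing.append((uid, previous_uid))
    return missing
```

The length of the returned list gives the count, and the pairs themselves show which successor courses would end up with a link left out.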
part of #53
When a course is archived some additional metadata is added to the course, and also to its successor. This metadata allows us to build the "Archived OCW Versions" list on a CHP.
For the archived course, a dspace_handle is added. For an example, see s3://ocw-content-storage/PROD/18/18.01/Fall_2003/18-01-single-variable-calculus-fall-2003/0/1.json.
For the successor course, a feature is added to features_tracking, and a field, is_update_of, is also added (not sure why this is a list). For an example of both, see s3://ocw-content-storage/PROD/18/18.01/Fall_2005/18-01-single-variable-calculus-fall-2005/0/1.json.
The combination of the features_tracking entry (with the handle.net URL) and the is_update_of field should be sufficient to construct the necessary link URL and link text for the successor course CHP.