mitodl / ocw-data-parser

A parsing script for MIT OpenCourseWare course data

2 Instructor Insights pages didn't import #177

Closed pdpinch closed 1 year ago

pdpinch commented 2 years ago

11.127j Instructor Insights page didn't import

legacy: https://ocw.mit.edu/courses/urban-studies-and-planning/11-127j-computer-games-and-simulations-for-education-and-exploration-spring-2015/instructor-insights/
nextgen: https://ocwnext.odl.mit.edu/courses/11-127j-computer-games-and-simulations-for-education-and-exploration-spring-2015/pages/instructor-insights/
github: https://github.mit.edu/mitocwcontent/11.127j-spring-2015/blob/main/content/pages/instructor-insights/_index.md

6.811 Instructor Insights page didn't import

legacy: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-811-principles-and-practice-of-assistive-technology-fall-2014/instructor-insights/
nextgen: https://ocwnext.odl.mit.edu/courses/6-811-principles-and-practice-of-assistive-technology-fall-2014/pages/instructor-insights/
github: https://github.mit.edu/mitocwcontent/6.811-fall-2014/blob/main/content/pages/instructor-insights/_index.md
parsed json: s3://open-learning-course-data-production/6-811-principles-and-practice-of-assistive-technology-fall-2014/
raw json: s3://ocw-content-storage/PROD/6/6.811/Fall_2014/6-811-principles-and-practice-of-assistive-technology-fall-2014/0

Wassaf-Shahzad commented 2 years ago

For this course, I looked into the parsed json, and the text for the Instructor Insights page is empty. Screenshot attached.

Screenshot 2022-02-25 at 5 46 56 PM

pdpinch commented 2 years ago

That's frustrating. Have you looked at the raw json?

I'm moving this issue to ocw-data-parser.

Wassaf-Shahzad commented 2 years ago

> That's frustrating. Have you looked at the raw json?
>
> I'm moving this issue to ocw-data-parser.

If by raw json you mean the uid_master.json, then yes, and unfortunately it's null there as well.

pdpinch commented 2 years ago

When I say the raw json, I mean the json that is exported from Plone.

For this course, it would be in the bucket s3://ocw-content-storage/PROD/11/11.127/Spring_2015/11-127j-computer-games-and-simulations-for-education-and-exploration-spring-2015/0

pdpinch commented 2 years ago

@mbertrand or @gumaerc can you confirm that I have the correct bucket for the raw JSON from Plone?

s3://ocw-content-storage/PROD/

mbertrand commented 2 years ago

Yes, that is the right bucket.

gumaerc commented 2 years ago

@Wassaf-Shahzad The S3 bucket that @pdpinch is referring to, ocw-content-storage, is where data is dumped directly out of the old Plone system. Because of the way this has to be done, the data ends up very disorganized and is not easy to parse programmatically. Plone exports each object in the course as a separate numbered JSON file, with a lot of unnecessary metadata properties and with actual PDFs and images base64-encoded as text and shoved into the JSON data. Hence, ocw-data-parser was created to crawl these raw exports, extract the assets as actual files, and organize all the metadata and HTML from the pages into one JSON file, which is what's referred to as the "parsed json." Honestly, a better way to refer to these might be "stage 1" and "stage 2" conversions.
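To make the "stage 1" shape a bit more concrete, here is a minimal sketch of crawling a locally downloaded raw export and pulling one base64 payload out as a real file. This is not the actual ocw-data-parser code. It assumes the numbered files are named like 1.json, 2.json, and so on, and the keys `encoded_data` and `filename` are hypothetical placeholders for whatever the Plone export really uses.

```python
import base64
import json
from pathlib import Path


def load_raw_objects(export_dir):
    """Load every numbered JSON file in a raw export, in numeric order.

    Assumption: the files are named 1.json, 2.json, ... as described in the
    thread; any other *.json file in the directory is skipped.
    """
    paths = sorted(
        (p for p in Path(export_dir).glob("*.json") if p.stem.isdigit()),
        key=lambda p: int(p.stem),
    )
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]


def extract_asset(obj, out_dir, payload_key="encoded_data", name_key="filename"):
    """Decode a base64 payload into an actual file and return its path.

    `payload_key` and `name_key` are hypothetical key names, not the real
    Plone export schema. Returns None when the object carries no payload.
    """
    payload = obj.get(payload_key)
    if not payload:
        return None
    target = Path(out_dir) / obj.get(name_key, "asset.bin")
    target.write_bytes(base64.b64decode(payload))
    return target
```

Sorting on the integer value of the file name keeps 10.json after 2.json, which a plain string sort would not.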

I was looking for the original issue that I made regarding this and couldn't find it. I came across this back in the early days of the project while we were first developing the theme in hugo-course-publisher. Basically, there were 2 different ways that Instructor Insights pages were built. In most of them, the HTML is just in the text property in the original Plone JSON. In some of them though, the content is broken up into a bunch of different fields. For example, howstudenttimewasspenttext, theclassroomtext, etc. @pdpinch knows all about this. Ironically though, for this course, it seems that all of the content is stored in howstudenttimewasspenttext.
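The two page layouts described above could be handled with a fallback along these lines. This is only a sketch, not what ocw-data-parser actually does: `resolve_page_text` is a hypothetical helper, the input is assumed to be the raw Plone JSON for one page loaded as a dict, and the field tuple lists only the two section fields named in this thread (the real set is longer).

```python
# Only the two section fields mentioned in this thread; the real export
# has more, so treat this tuple as an incomplete, illustrative list.
SECTION_FIELDS = ("howstudenttimewasspenttext", "theclassroomtext")


def resolve_page_text(raw_page, section_fields=SECTION_FIELDS):
    """Return the page HTML, falling back to the split section fields.

    If `text` is present and non-empty it wins. Otherwise the non-empty
    section fields are joined in the order given by `section_fields`.
    """
    text = (raw_page.get("text") or "").strip()
    if text:
        return text
    parts = [(raw_page.get(field) or "").strip() for field in section_fields]
    return "\n".join(part for part in parts if part)
```

Using `or ""` covers both a missing key and an explicit null, which matters here because the text value was reported as null for these pages.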

Wassaf-Shahzad commented 2 years ago

@gumaerc So if I'm getting this right, the content is stored in fields that ocw-data-parser ignores, which leaves the text field empty? Also, can you give me the course numbers for the courses mentioned above? I would like to run them locally.

pdpinch commented 2 years ago

The course numbers are in the description. Here are links to the raw JSON:

11.127j s3://ocw-content-storage/PROD/11/11.127/Spring_2015/11-127j-computer-games-and-simulations-for-education-and-exploration-spring-2015/0/

6.811 s3://ocw-content-storage/PROD/6/6.811/Fall_2014/6-811-principles-and-practice-of-assistive-technology-fall-2014/0/

Wassaf-Shahzad commented 2 years ago

> The course numbers are in the description. Here are links to the raw JSON:
>
> 11.127j s3://ocw-content-storage/PROD/11/11.127/Spring_2015/11-127j-computer-games-and-simulations-for-education-and-exploration-spring-2015/0/
>
> 6.811 s3://ocw-content-storage/PROD/6/6.811/Fall_2014/6-811-principles-and-practice-of-assistive-technology-fall-2014/0/

@pdpinch @gumaerc I have debugged both courses and, surprisingly, for both of them ocw-data-parser was able to populate the text field of the Instructor Insights page even though the content was divided into separate subsections. The following PR handles the subsection case: https://github.com/mitodl/ocw-data-parser/pull/119

Screenshots

Screenshot 2022-03-09 at 1 35 12 PM
Screenshot 2022-03-09 at 1 39 47 PM

pdpinch commented 2 years ago

@Wassaf-Shahzad do you know how we can import these two pages (not full courses) into ocw-studio?

(If there isn't a good way to do this, I'd be willing to do a copy and paste, since it's just two pages.)

Wassaf-Shahzad commented 2 years ago

> @Wassaf-Shahzad do you know how we can import these two pages (not full courses) into ocw-studio?
>
> (If there isn't a good way to do this, I'd be willing to do a copy and paste, since it's just two pages.)

@pdpinch Yes, and funny you say that, because I did resolve this issue on RC by copy-pasting the relevant markdown, as explained by Carey here

pdpinch commented 1 year ago

I'm pretty sure this has been fixed in production:

https://ocw.mit.edu/courses/11-127j-computer-games-and-simulations-for-education-and-exploration-spring-2015/pages/instructor-insights/

https://ocw.mit.edu/courses/6-811-principles-and-practice-of-assistive-technology-fall-2014/pages/instructor-insights/