Closed pdpinch closed 1 year ago
For this course, I looked into the parsed JSON, and the text
field for the Instructor Insights page is empty.
Screen shot attached
That's frustrating. Have you looked at the raw json?
I'm moving this issue to ocw-data-parser.
If by raw JSON you mean the uid_master.json, then yes, and unfortunately it's null there as well.
When I say the raw json, I mean the json that is exported from Plone.
For this course, it would be in the bucket s3://ocw-content-storage/PROD/11/11.127/Spring_2015/11-127j-computer-games-and-simulations-for-education-and-exploration-spring-2015/0
@mbertrand or @gumaerc can you confirm that I have the correct bucket for the raw JSON from plone?
s3://ocw-content-storage/PROD/
Yes, that is the right bucket
@Wassaf-Shahzad The S3 bucket that @pdpinch is referring to, ocw-content-storage, is where data is dumped directly out of the old Plone system. Due to the way this has to be done, the data ends up being very disorganized and is not easy to parse programmatically. Plone exports each object in the course as a separate numbered JSON file with a lot of unnecessary metadata properties, as well as actual PDFs and images base64-encoded as text and shoved into the JSON data. Hence, ocw-data-parser was created to crawl these raw exports, extract the assets as actual files, and organize all the metadata and HTML from the pages into one JSON file, which is what's referred to as the "parsed json." Honestly, a better way to refer to these might be "stage 1" and "stage 2" conversions.
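To make the "stage 1" to "stage 2" step concrete, here is a minimal sketch of crawling a local copy of a raw export and pulling a base64 payload out into a real file. This is not ocw-data-parser's actual code: the function names and the `file_data` / `file_name` property names are hypothetical placeholders, and the real export may lay things out differently.

```python
import base64
import json
from pathlib import Path


def load_raw_objects(export_dir):
    """Load every numbered JSON file (1.json, 2.json, ...) in numeric order."""
    paths = [p for p in Path(export_dir).glob("*.json") if p.stem.isdigit()]
    objects = []
    for path in sorted(paths, key=lambda p: int(p.stem)):
        with path.open(encoding="utf-8") as handle:
            objects.append(json.load(handle))
    return objects


def extract_embedded_file(raw_object, output_dir,
                          field="file_data", name_field="file_name"):
    """Decode a base64 payload to disk and return its path, or None.

    `field` and `name_field` are assumed property names, not the ones
    Plone actually uses.
    """
    payload = raw_object.get(field)
    if not payload:
        return None
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so the file stays inside output_dir.
    target = output_dir / Path(raw_object.get(name_field, "unnamed.bin")).name
    target.write_bytes(base64.b64decode(payload))
    return target
```

Sorting on the integer value of the file stem keeps 10.json after 2.json, which a plain string sort would not.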
I was looking for the original issue that I made regarding this and couldn't find it. I came across this back in the early days of the project while we were first developing the theme in hugo-course-publisher. Basically, there were 2 different ways that Instructor Insights pages were built. In most of them, the HTML is just in the text property in the original Plone JSON. In some of them, though, the content is broken up into a bunch of different fields, for example howstudenttimewasspenttext, theclassroomtext, etc. @pdpinch knows all about this. Ironically though, for this course, it seems that all of the content is stored in howstudenttimewasspenttext.
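The two page layouts described above could be handled with a simple fallback, sketched here. This is an illustration only, not the parser's implementation: `instructor_insights_html` is a hypothetical helper, the section list contains only the two field names mentioned in this thread (the real set is longer), and joining the sections in that order is an assumption.

```python
# Only the section fields named in this thread; others would need adding.
SECTION_FIELDS = ("howstudenttimewasspenttext", "theclassroomtext")


def instructor_insights_html(raw_object, section_fields=SECTION_FIELDS):
    """Return the page HTML from `text`, else from the section fields."""
    # `or ""` covers both a missing key and an explicit JSON null.
    text = (raw_object.get("text") or "").strip()
    if text:
        return text
    parts = []
    for field in section_fields:
        value = (raw_object.get(field) or "").strip()
        if value:
            parts.append(value)
    return "\n".join(parts)
```

With this shape, a page like the one in this course, where `text` is null and everything lives in howstudenttimewasspenttext, still yields non-empty HTML.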
@gumaerc So if I'm getting this right, the content is stored in fields that ocw-data-parser ignores, which leaves the text field empty? Also, can you give me the course numbers for the courses mentioned above? I would like to run them locally.
The course numbers are in the description. Here are links to the raw JSON:
11.127j s3://ocw-content-storage/PROD/11/11.127/Spring_2015/11-127j-computer-games-and-simulations-for-education-and-exploration-spring-2015/0/
6.811 s3://ocw-content-storage/PROD/6/6.811/Fall_2014/6-811-principles-and-practice-of-assistive-technology-fall-2014/0/
@pdpinch @gumaerc I have debugged both courses and, surprisingly, for both of them ocw-data-parser was able to populate the text field of the Instructor Insights page even when the content was divided into separate subsections. The following PR handles the subsection case: https://github.com/mitodl/ocw-data-parser/pull/119
Screenshots
@Wassaf-Shahzad do you know how we can import these two pages (not full courses) into ocw-studio?
(If there isn't a good way to do this, I'd be willing to do a copy and paste, since it's just two pages.)
@pdpinch Yes, and funny you say that, because I did resolve this issue on RC by copy-pasting the relevant markdown, as explained by Carey here
I'm pretty sure this has been fixed in production:
11.127j Instructor Insights page didn't import
legacy: https://ocw.mit.edu/courses/urban-studies-and-planning/11-127j-computer-games-and-simulations-for-education-and-exploration-spring-2015/instructor-insights/
nextgen: https://ocwnext.odl.mit.edu/courses/11-127j-computer-games-and-simulations-for-education-and-exploration-spring-2015/pages/instructor-insights/
github: https://github.mit.edu/mitocwcontent/11.127j-spring-2015/blob/main/content/pages/instructor-insights/_index.md
6.811 instructors insights page
legacy: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-811-principles-and-practice-of-assistive-technology-fall-2014/instructor-insights/
nextgen: https://ocwnext.odl.mit.edu/courses/6-811-principles-and-practice-of-assistive-technology-fall-2014/pages/instructor-insights/
github: https://github.mit.edu/mitocwcontent/6.811-fall-2014/blob/main/content/pages/instructor-insights/_index.md
parsed json: s3://open-learning-course-data-production/6-811-principles-and-practice-of-assistive-technology-fall-2014/
raw json: s3://ocw-content-storage/PROD/6/6.811/Fall_2014/6-811-principles-and-practice-of-assistive-technology-fall-2014/0