mitodl / ocw-to-hugo

A command line utility for taking master.json output from ocw-data-parser and producing markdown for use with hugo-course-publisher

some pages in 7.00sc Fundamentals of Biology don't build correctly #473

Closed pdpinch closed 2 years ago

pdpinch commented 2 years ago

Steps to Reproduce

view https://ocw-published.odl.mit.edu/courses/7-01sc-fundamentals-of-biology-fall-2011/pages/biochemistry/macromolecules-lipids-carbohydrates-nucleic-acid/

Expected Behavior

should have all the content from https://ocw.mit.edu/courses/biology/7-01sc-fundamentals-of-biology-fall-2011/biochemistry/macromolecules-lipids-carbohydrates-nucleic-acid/

Screenshot or Screencast

image

Related issues

The content loads in studio, but the page doesn't build. This page contains a "check yourself" javascript, so work done in mitodl/ocw-hugo-themes#215 might be relevant.

MAbdurrehman12 commented 2 years ago

@pdpinch This PR fixes the blank section. With it applied, content renders in the section, as shown in the screenshots attached to the PR, but there is another issue:

There are some script tags in the course content that try to load JS files, something like this:

<script type="text/javascript" src="/scripts/jquery-1.3.2.min.js"></script> 
<script type="text/javascript" src="/scripts/jQuizMe-uncompressed.js"></script> 

Since these JS files are not found, a 404 is returned (image), which breaks the sections and prevents the "check yourself" javascript from rendering.

It feels like these JS files should be included in the content so they can be loaded. If there are only a few of them, maybe we can place them in ocw-theme instead. I'm not sure yet, so I'm currently looking into this issue. @abeglova Do you have some thoughts or suggestions on this?

abeglova commented 2 years ago

So that page has raw html because ocw-to-hugo is failing when trying to convert it to hugo

MAbdurrehman12 commented 2 years ago

@abeglova When the course is successfully generated/built with the raw HTML, does it successfully render in ocw-theme?

abeglova commented 2 years ago

Yes - other similar pages get converted to hugo just fine - I think there's something wrong with this course specifically

abeglova commented 2 years ago

I looked into this more and the issue is the turndown for multiple choice questions https://github.com/mitodl/ocw-to-hugo/blob/master/src/turndown.js#L520 breaking and throwing an error for the multiple choice questions for the course, preventing the pages from being converted to hugo

abeglova commented 2 years ago

Multiple choice questions are working fine for other courses so we can dig into why it's a problem for this course (hypothesis: possibly related to having multiple multiple choice widgets on the same page). Or we can just ask the ocw team to remove the multiple choice questions from this course

MAbdurrehman12 commented 2 years ago

Thanks @abeglova for looking into this. I'll try to dig more

alicewriteswrongs commented 2 years ago

Is it possible that this is a data issue which we should fix by editing this page in particular, rather than something we should fix via a code change? I'm wary of allowing arbitrary HTML in Markdown (the change in https://github.com/mitodl/ocw-hugo-projects/pull/124). We have considered turning that setting on in the past but ultimately decided not to, since it is non-specific: it enables all HTML in Markdown, not just particular tags. There's a reason why it's called unsafe.

MAbdurrehman12 commented 2 years ago

@abeglova I'm able to successfully convert this course to hugo via ocw-to-hugo but after conversion, some pages have raw HTML which doesn't render. How and when did you encounter this issue?

breaking and throwing an error for the multiple choice questions for the course, preventing the pages from being converted to hugo

gumaerc commented 2 years ago

@abeglova I dug into this this morning and found that the issue was happening in the multiple_choice_questions_widget turndown rule. Parsing the content of dataSubstring as JSON failed a number of times. Here's an example:

{ "multiList": [ { "ques": 'Which of the following statements is true for phospholipid molecules?<ol type="a">
  <li>The polar end of phospholipids would contain carbon and phosphorous and oxygen.</li>
  <li>The non-polar end of phospholipids would contain almost exclusively carbon and hydrogen.</li>
  <li>The polar end of phospholipids would form hydrogen bonds with water.</li>
  <li>The non-polar end of phospholipids associate with the cytoplasm of the cell.</li>
</ol>', "ans": "a,b,c", "ansSel": ["a,b,c, and d", "b,c,d", "a,c"], "ansInfo": "" }, { "ques": 'Which of the following
statements is true for carbohydrate molecules? <ol type="a">
  <li>The general structure can abbreviated as (CH<sub>2</sub>O)n.</li>
  <li>A disaccharide can be formed by a condensation reaction between two glucose molecules.</li>
  <li>Carbohydrates can be used as an energy source for cells.</li>
  <li>Carbohydrates can be used as a structural molecule.</li>
</ol>', "ans": "a,b,c, and d", "ansSel": ["a,b", "c,d", "b,c"], "ansInfo": "" }] }

It seems that this is an issue with shoving HTML into a JSON property and quotations not being escaped prior to that. I would agree that this page should be re-authored, but we don't have an interface within ocw-studio for authoring quizzes yet do we?
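One way the unescaped quoting could be repaired before parsing is sketched below. This is only an illustration under stated assumptions, not the fix referred to in this thread: `repairSingleQuotedValues` is a hypothetical helper, and it assumes the HTML inside each single-quoted value contains no unescaped apostrophes and that no double-quoted string contains a colon followed by a single quote.

```javascript
// Hypothetical helper for illustration; not part of ocw-to-hugo.
// Rewrites property values written as '...' into valid JSON strings.
// Assumption: the single-quoted content has no unescaped apostrophes.
function repairSingleQuotedValues(text) {
  const singleQuotedValue = /:\s*'((?:[^'\\]|\\.)*)'/g
  return text.replace(singleQuotedValue, (match, inner) => {
    // Turn \' back into ' and let JSON.stringify escape the
    // double quotes and newlines inside the HTML.
    const unescaped = inner.replace(/\\'/g, "'")
    return ": " + JSON.stringify(unescaped)
  })
}

// Shortened sample in the same shape as the failing data above.
const broken = `{ "multiList": [ { "ques": 'Pick one:<ol type="a">
  <li>First</li>
</ol>', "ans": "a", "ansSel": ["b", "c"], "ansInfo": "" } ] }`

const repaired = JSON.parse(repairSingleQuotedValues(broken))
console.log(repaired.multiList[0].ques)
```

A regex pass like this is fragile by nature, so it would only be reasonable for a one-off re-import of a known course rather than as a general parser.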

abeglova commented 2 years ago

Yeah, if you look in the comments, we figured out it was the quizzes a few weeks ago. There isn't a way to author quizzes in ocw-studio, and a few months ago Ferdi and Peter said we do not plan to support quizzes in ocw courses. @Ferdi, let me know if that has changed.

We can consider supporting a hide/show text shortcode in ocw-studio/ocw courses since that wouldn't be too hard to implement and can be used for other things in addition to quizzes.

dseaton commented 2 years ago

Comments: 1) we will not support creating new quizzes for the foreseeable future; 2) we do need a way to either fix or remove the broken quizzes, specifically, so the other content on a page is available.

@gumaerc I am having a hard time interpreting what is HTML in the example you sent. Are you talking about the list stuff like <ol>, etc.? Or am I missing the error?
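For the "remove the broken quizzes so the other content on a page is available" option, a minimal sketch of a tolerant conversion step might look like the following. The names `convertQuizOrSkip` and `renderQuiz` are hypothetical and do not come from ocw-to-hugo; the sketch only assumes that the quiz data arrives as a string that is expected to be JSON.

```javascript
// Hypothetical helper for illustration; not part of ocw-to-hugo.
// Parses the quiz data and renders it, or drops the quiz when the data
// is not valid JSON so the rest of the page can still be converted.
function convertQuizOrSkip(dataSubstring, renderQuiz) {
  let quiz
  try {
    quiz = JSON.parse(dataSubstring)
  } catch (err) {
    // Invalid quiz data: log it and emit nothing for this widget.
    console.warn(`Skipping quiz widget: ${err.message}`)
    return ""
  }
  return renderQuiz(quiz)
}

// renderQuiz stands in for whatever markup the real rule would produce.
const countQuestions = quiz => `${quiz.multiList.length} question(s)`

const skipped = convertQuizOrSkip(`{ "multiList": [ { "ques": 'x' } ] }`, countQuestions)
const rendered = convertQuizOrSkip('{ "multiList": [ { "ques": "x" } ] }', countQuestions)
```

Returning an empty string on failure trades the quiz for the surrounding content, which matches the second comment above but silently loses the questions, so logging the failure matters.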

gumaerc commented 2 years ago

@dseaton Basically it seems as if this course is unique in that the quizzes have HTML in their questions and answers. The ordered list is just one example. I have some code ready that will fix them, so I could put up a PR to fix it and we could re-import just this one course to fix the quizzes, or we could remove them. The HTML causes a problem because JSON requires keys and string values to be surrounded by double quotes, like:

valid JSON:
{
  "key": "value"
}

invalid JSON:
{
  "key": '<span class="big">value</span>'
}
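The difference can be checked directly with `JSON.parse`. This is a small sketch for illustration; `tryParse` is a hypothetical wrapper, not something from ocw-to-hugo.

```javascript
// Hypothetical wrapper for illustration: reports whether text parses as JSON.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) }
  } catch (err) {
    // JSON.parse throws a SyntaxError on malformed input.
    return { ok: false, error: err.name }
  }
}

// Double-quoted value: parses.
const valid = tryParse('{ "key": "value" }')

// Single-quoted value wrapping HTML: rejected by the parser.
const invalid = tryParse(`{ "key": '<span class="big">value</span>' }`)

console.log(valid.ok, invalid.ok, invalid.error)
```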