mitodl / ocw-to-hugo

A command line utility for taking master.json output from ocw-data-parser and producing markdown for use with hugo-course-publisher

unrendered markdown in tables #218

Closed pdpinch closed 3 years ago

pdpinch commented 3 years ago

Steps to Reproduce

Look at the 4th row in the table on https://ocwnext.odl.mit.edu/courses/21m-542-interdisciplinary-approaches-to-musical-time-january-iap-2010/sections/syllabus/

I'm not sure if it's related, but it appears to be a problem whenever the cell starts with two <br> tags.

Expected Behavior

Markdown should render.

Actual Behavior

### and _ are left visible.

gumaerc commented 3 years ago

@pdpinch I spent a bit of time looking into this today, and it seems there are a couple of issues here. First of all, this is what the table looks like as generated Markdown:

| SES # | INSTRUCTORS | TOPICS | KEY DATES |
| --- | --- | --- | --- |
| {{< td-colspan 4 >}}**Week #1**{{< /td-colspan >}} ||||
| C1 | Martin Marks | Overview, getting acquainted, the syllabus, concerts, student projects | &nbsp; |
| C2 | Marcus Thompson | Class discussion | Student projects organized |
| R1 | &nbsp; | Open rehearsals for concert 1 | &nbsp; |
| E1 | &nbsp; | {{< br >}}{{< br >}}### Forum 1: Time as shape{{< br >}}{{< br >}}Michael Cuthbert (musicology, moderator){{< br >}}{{< br >}}Robert Jaffe (physics){{< br >}}{{< br >}}Libby Larson (composer){{< br >}}{{< br >}}Sara Brown (scenic design){{< br >}}{{< br >}}### Concert 1{{< br >}}{{< br >}}Andrew Imbrie, _serenade for flute, viola, and piano_{{< br >}}{{< br >}}Libby Larsen, _Black Birds, Red Hills_{{< br >}}{{< br >}}George Crumb, E_leven Echoes of Autumn_{{< br >}}{{< br >}}Maurice Ravel, _Piano Trio in A Minor_{{< br >}}{{< br >}} | &nbsp; |
| {{< td-colspan 4 >}}**Week #2**{{< /td-colspan >}} ||||
| C3 | Michael Cuthbert (Professor of Music) | Repeating time: minimalism and the structure of Reich's _Four Organs_ | Written reponses to forum & concert 1 due |
| C4 | Paul Schechter (Professor of Astrophysics) | A physicist's understanding of the concept of spacetime | Project proposals due |
| C5 | Michael Ouellette (Lecturer in Theater Arts) | On Michael Frayn's _Copenhagen_ | &nbsp; |
| C6 | Donald Sadoway (Professor of Materials Science and Engineering) | 'Everything I needed to know I learned in 3.091' (using art, literature, music, and film to teach chemistry) | &nbsp; |
| R2 | &nbsp; | Open rehearsals for concert 2 | &nbsp; |
| E2 | &nbsp; | {{< br >}}{{< br >}}### Forum 2: Time as Memory{{< br >}}{{< br >}}Bruce Brubaker (piano/contemporary music){{< br >}}{{< br >}}Peter Child (composition, moderator){{< br >}}{{< br >}}Deborah Stein (music theory){{< br >}}{{< br >}}### Concert 2{{< br >}}{{< br >}}Ludwig van Beethoven, _String Trio in E-flat Major_, Op. 5{{< br >}}{{< br >}}Peter Child, _Skyscraper Symphony_{{< br >}}{{< br >}}Antonín Dvořák, _String Quartet in E-flat Major_, _Op. 97, "The American"_{{< br >}}{{< br >}} | &nbsp; |
| {{< td-colspan 4 >}}**Week #3**{{< /td-colspan >}} ||||
| C7 | George Ruckert (Senior Lecturer in Music) | Measuring time: meters, cycles, and patterns in Hindustani music | Written reponses to forum & concert 2 due |
| C8 | Christopher Ariza (Visiting Professor of Music) | Events per unit of time: density as a compositional parameter in the music and synthesis techniques of Iannis Xenakis | &nbsp; |
| C9 | Stephen Tapscott (Professor of Literature) | {{< br >}}{{< br >}}Deeper into Muybridge: a poet's view{{< br >}}{{< br >}} | &nbsp; |
| R3 | &nbsp; | Open rehearsal for concert 3 | &nbsp; |
| E3 | &nbsp; | {{< br >}}{{< br >}}### Forum 3: Time as the Subject and Substance{{< br >}}{{< br >}}Ellen Harris (musicology, moderator){{< br >}}{{< br >}}Lewis Lockwood (musicology){{< br >}}{{< br >}}Paul Matisse (sculptor){{< br >}}{{< br >}}Stephen Tapscott (poet){{< br >}}{{< br >}}### Concert 3{{< br >}}{{< br >}}W. A. Mozart, _Oboe Quartet in F Major, K. 370_{{< br >}}{{< br >}}Charles Loeffler, _Two Rhapsodies for Oboe, Viola, and Piano_{{< br >}}{{< br >}}William Grant Still, _suite for violin and piano_{{< br >}}{{< br >}}Lukas Foss, T_ime Cycle_{{< br >}}{{< br >}} | &nbsp; |
| {{< td-colspan 4 >}}**Week #4**{{< /td-colspan >}} ||||
| C10 | {{< br >}}{{< br >}}Charles Shadle{{< br >}}{{< br >}}(Senior Lecturer in{{< br >}}{{< br >}}Music){{< br >}}{{< br >}} | Time and structure in a film score for D. W. Griffith's _Ramona_ (1910) | Written reponses to forum & concert 3 due |
| C11 | {{< br >}}{{< br >}}Mark Harvey{{< br >}}{{< br >}}(Lecturer in Music){{< br >}}{{< br >}} | {{< br >}}{{< br >}}In the moment: jazz time and improvisation{{< br >}}{{< br >}} | &nbsp; |
| C12 | &nbsp; | Student projects & performances | &nbsp; |
| C13 | &nbsp; | {{< br >}}{{< br >}}Student projects & performances{{< br >}}{{< br >}}Wrap-Up{{< br >}}{{< br >}} |

The first issue has to do with the heading tags. In Markdown tables, each row has to be on one line. Markdown heading syntax expects the pound signs at the beginning of a line, followed by a space and the heading text, followed by a line break. Since the pound signs here sit in the middle of a table row, with no line break after the heading text, they are rendered as is. We could potentially solve this with a heading shortcode that takes an integer parameter for the heading level.
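To make the heading shortcode idea concrete, here is a minimal sketch in Python (used only for illustration, not the project's actual code). It assumes a hypothetical paired shortcode named `heading` that takes the level as its first argument; that shortcode does not exist yet and would still need a Hugo template. The function name `headings_to_shortcodes` is also made up for this example.

```python
import re

# The line-break shortcode as it appears in the generated table cells.
BR = "{{< br >}}"

# A Markdown heading: 1-6 pound signs, whitespace, then the heading text.
ATX_HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def headings_to_shortcodes(cell: str) -> str:
    """Replace '### Text' segments inside a table cell with a shortcode.

    The cell is split on the line-break shortcode, so each segment plays
    the role of one line. Segments that look like headings are wrapped in
    the hypothetical {{< heading N >}}...{{< /heading >}} shortcode.
    """
    converted = []
    for segment in cell.split(BR):
        match = ATX_HEADING.match(segment)
        if match:
            level = len(match.group(1))
            text = match.group(2)
            segment = f"{{{{< heading {level} >}}}}{text}{{{{< /heading >}}}}"
        converted.append(segment)
    return BR.join(converted)


if __name__ == "__main__":
    cell = "{{< br >}}{{< br >}}### Concert 1{{< br >}}{{< br >}}Andrew Imbrie"
    print(headings_to_shortcodes(cell))
```

Splitting on the line-break shortcode keeps the whole cell on one line, which is what the table syntax requires, while still letting each segment be checked as if it were its own line.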

The second issue has to do with italic formatting. Some of these are rendering properly, and some aren't. The word "Copenhagen" in row C5, for example, is properly italicized. To italicize a word or phrase, there needs to be an underscore on either side of the text to be formatted:

| C5 | Michael Ouellette (Lecturer in Theater Arts) | On Michael Frayn's _Copenhagen_ | &nbsp; |

Some of the italics that are not rendering properly have a line break shortcode butted up directly against one of the underscores. To fix that, I'll need to make a PR that ensures that doesn't happen. There are other cases, though, where poor HTML formatting in the source has resulted in strangely generated Markdown, for example: Lukas Foss, T_ime Cycle_{{< br >}}{{< br >}}. If you look at the original OCW page, you'll see that this is the HTML that renders that part of the table:

image

So, in this case the original data should be edited to fix the mistake. A valid representation of italics in Markdown should have the underscores on either side of the phrase, and the only characters allowed on the outer side of those underscores are spaces and punctuation. Alphanumeric characters directly adjacent to the outside of the opening or closing underscore will cause it to break.
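The two italics problems above could be handled roughly as follows. This is a Python sketch for illustration only, with made-up function names. The first function separates underscores from an adjacent line-break shortcode; whether a plain space is the right separator for Hugo's renderer is an assumption that would need testing. The second only flags underscores that have alphanumeric characters on both sides, such as `T_ime`, so the source data can be corrected by hand.

```python
import re
from typing import List

# The line-break shortcode as it appears in the generated table cells.
BR = "{{< br >}}"

# An underscore with alphanumeric characters directly on both sides.
# [^\W_] means "a word character that is not an underscore".
INTRAWORD_UNDERSCORE = re.compile(r"[^\W_]+_[^\W_]+")


def pad_underscores_next_to_breaks(cell: str) -> str:
    """Insert a space between an underscore and an adjacent break shortcode.

    Assumption: a single space is enough to keep the underscore from
    touching the shortcode.
    """
    cell = cell.replace("_" + BR, "_ " + BR)
    cell = cell.replace(BR + "_", BR + " _")
    return cell


def find_intraword_underscores(cell: str) -> List[str]:
    """Return the words that contain an underscore between alphanumerics.

    These usually point at misplaced emphasis in the source HTML and are
    reported rather than rewritten.
    """
    return INTRAWORD_UNDERSCORE.findall(cell)


if __name__ == "__main__":
    print(pad_underscores_next_to_breaks("{{< br >}}_Time_{{< br >}}"))
    print(find_intraword_underscores("Lukas Foss, T_ime Cycle_{{< br >}}{{< br >}}"))
```

Reporting the intraword cases instead of guessing where the emphasis was meant to start keeps the converter from silently changing titles; the fix for those belongs in the original data.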