shevandrin / rqti

Create QTI Exercises and Exams from R
https://shevandrin.github.io/rqti/
GNU General Public License v3.0

problems with processing html table (low level functions) #177

Open johannes-titz opened 5 months ago

johannes-titz commented 5 months ago

I am working with the low level functions and experience the following problem:

tbl <- "<table style=\"border-collapse:collapse; border:none;\"><caption style=\"font-weight: bold; text-align:left;\">Residual Standard Error (Standardschätzfehler) = 2.983</caption><tr><th style=\"border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm;  text-align:left; \"> </th><th colspan=\"6\" style=\"border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; \">Dependent variable</th></tr><tr><td style=\" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal;  text-align:left; \">Predictor</td><td style=\" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal;  \">b</td><td style=\" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal;  \">SE (b)</td><td style=\" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal;  \">beta</td><td style=\" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal;  \">SE (beta)</td><td style=\" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal;  \">t-Wert</td><td style=\" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal;  col7\">p</td></tr><tr><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; \">(Intercept)</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:center;  \">-1.299</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:center;  \">0.896</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:center;  \">0.000</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:center;  \">0.073</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:center;  \">-1.450</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:center;  col7\">0.150</td></tr><tr><td style=\" 
padding:0.2cm; text-align:left; vertical-align:top; text-align:left; \">prestige</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:center;  \">0.173</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:center;  \">0.018</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:center;  \">0.707</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:center;  \">0.074</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:center;  \">9.577</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:center;  col7\"><0.001</td></tr><tr><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm; border-top:1px solid;\">Observations</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;\" colspan=\"6\">94</td></tr><tr><td style=\" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;\">Deviance</td><td style=\" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;\" colspan=\"6\">818.900</td></tr></table>"
render_qtijs(new("Essay", content = list(tbl)))

gives: Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, : StartTag: invalid element name [68]

How can I debug this? Obviously, some tag is incorrect, but the message is not very informative. Can we output the actual tag? What is 68 referring to?

The table is automatically created via another package, it is already HTMLdecoded.
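One way to narrow this down without relying on the parser message is to scan the string for a `<` that cannot start a tag. As far as I can tell, the bracketed number is the libxml2 error code rather than a position in the string, so it does not point at the offending text. The helper below is a hypothetical sketch in base R, not part of rqti or xml2; the name `find_stray_lt` and the `width` parameter are assumptions made for illustration.

```r
# Hypothetical helper (not part of rqti): report every "<" that is not
# followed by a character that can begin a tag, closing tag, comment,
# or processing instruction.
find_stray_lt <- function(x, width = 10) {
  hits <- gregexpr("<(?![A-Za-z_:/!?])", x, perl = TRUE)[[1]]
  if (hits[1] == -1) {
    return(character(0))
  }
  # Return a short snippet starting at each suspicious "<".
  vapply(
    as.integer(hits),
    function(p) substr(x, p, p + width - 1),
    character(1)
  )
}

find_stray_lt("<td>0.150</td><td><0.001</td>")
```

Applied to the table string above, a check of this kind should flag the cell that starts with `<0.001`.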

johannes-titz commented 5 months ago

Found the problem: <0.001 cannot be processed because it contains <

This is what you tried to tell me yesterday, I believe.

johannes-titz commented 5 months ago

Okay, I found a solution for my specific case; I just use HTMLdecode(hex=F, decimals=F), which in my case leaves < as is (&nbsp; is translated, but not stuff like &#45;). Not sure though how to handle this in general. But in any case it would be nice to get a more meaningful error message, indicating that <0.001 is the problem.

Almost looks like qti is fine with decimal codes (e.g. &#45;) but not with named codes (&nbsp;)?

shevandrin commented 5 months ago

yea, decimal works
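This matches how XML works in general: only five named entities are predefined (`&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;`), while names such as `&nbsp;` come from HTML and are unknown to a plain XML parser. Numeric references like `&#160;` are always valid. A minimal sketch of converting a few HTML-only names to decimal references in base R follows; the function name `named_to_decimal` and the short lookup table are assumptions for illustration and are deliberately incomplete.

```r
# Hypothetical helper (not part of rqti): rewrite a few HTML-only named
# entities as decimal character references, which XML accepts.
named_to_decimal <- function(x) {
  # Small, deliberately incomplete lookup: entity name -> code point.
  lookup <- c(nbsp = 160L, ndash = 8211L, mdash = 8212L, auml = 228L)
  for (name in names(lookup)) {
    x <- gsub(
      paste0("&", name, ";"),
      paste0("&#", lookup[[name]], ";"),
      x,
      fixed = TRUE
    )
  }
  x
}

named_to_decimal("t-Wert&nbsp;&ndash;&nbsp;p")
```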

johannes-titz commented 5 months ago

Okay, my suggestion: translate < and > manually to decimal, then call HTMLdecode. Are there any other characters we need to take care of?
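A minimal sketch of that suggestion in base R, assuming the input still contains the named entities `&lt;`, `&gt;` and `&amp;` for literal characters in the text. Besides `<` and `>`, the ampersand is the other character that needs protecting in XML content. The function name `protect_xml_specials` is hypothetical, and the idea that the decoder is then called with decimal decoding switched off (as in the workaround above) is an assumption.

```r
# Hypothetical helper (not part of rqti): turn the named entities for the
# XML special characters into decimal references, so that a later
# named-entity decode step does not convert them into raw "<", ">" or "&".
protect_xml_specials <- function(x) {
  x <- gsub("&lt;", "&#60;", x, fixed = TRUE)
  x <- gsub("&gt;", "&#62;", x, fixed = TRUE)
  x <- gsub("&amp;", "&#38;", x, fixed = TRUE)
  x
}

protect_xml_specials("<td>&lt;0.001 &amp; more</td>")
# Afterwards the entity decoder would be run with decimal decoding disabled,
# so the &#60; / &#62; / &#38; references are left untouched.
```

Real tags are not affected, because only the entity strings are replaced, never the raw `<` and `>` that delimit markup.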

shevandrin commented 5 months ago

[image]

shevandrin commented 5 months ago

I am working on a solution.

shevandrin commented 5 months ago

I finally did it, but I don't like the result. I convert the XML content into HTML (this encodes the entities). The problem is that HTML allows void tags without closing them (for example <br>), whereas XML does not (it must be <br/>). I handled the most common void tags, but this can lead to further mistakes when a creator uses functions to build HTML chunks (like you did with tables).
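To illustrate the void-tag issue, here is a minimal base R sketch that rewrites a few HTML void elements into their self-closed XML form. It is not the code used in rqti; the function name `close_void_tags` and the choice of `br`, `hr` and `img` are assumptions for illustration, and a regex like this will not cover every HTML construct.

```r
# Hypothetical helper (not rqti's implementation): make a few HTML void
# elements XML-compatible by self-closing them, e.g. <br> -> <br/>.
close_void_tags <- function(x, tags = c("br", "hr", "img")) {
  pattern <- paste0("<(", paste(tags, collapse = "|"), ")\\b([^>]*?)\\s*/?>")
  # \\1 is the tag name, \\2 the attributes (if any); tags that are
  # already self-closed come out unchanged.
  gsub(pattern, "<\\1\\2/>", x, perl = TRUE)
}

close_void_tags("a<br>b<img src=\"x.png\">")
```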

johannes-titz commented 4 months ago

I believe we settled on: the user is responsible for producing correct XML/XHTML. We might want to mention this in the docs though.