grzegorzmazur / yacas

Computer calculations made easy
http://www.yacas.org
GNU Lesser General Public License v2.1

OpenMath in Yacas #292

Closed MarcoCostantini closed 4 years ago

MarcoCostantini commented 4 years ago

Some issues with the OpenMath implementation in Yacas (already reported by private mail).

############ Ending space

The conversion from OpenMath to Yacas should not require a trailing space or newline. Please consider this example:

    OpenMathObjectWithEndingSpace := "<OMOBJ> <OMI>3</OMI> </OMOBJ> "
    FromString( % )OMRead()
    OpenMathObjectWithOutEndingSpace := "<OMOBJ> <OMI>3</OMI> </OMOBJ>"
    FromString( % )OMRead()

The two strings differ only in the space at the end. However, the first is correctly converted from OpenMath to Yacas, while the second is not, and a cryptic error message is returned.

    In> OpenMathObjectWithEndingSpace := "<OMOBJ> <OMI>3</OMI> </OMOBJ> "
    Out> "<OMOBJ> <OMI>3</OMI> </OMOBJ> "
    In> FromString( % )OMRead()
    Out> 3
    In> OpenMathObjectWithOutEndingSpace := "<OMOBJ> <OMI>3</OMI> </OMOBJ>"
    Out> "<OMOBJ> <OMI>3</OMI> </OMOBJ>"
    In> FromString( % )OMRead()
    Out> OMError({"moreerrors","unexpected"},"String(1) : Reaching end of file within a comment block ")
    In>

The problem is probably in the XML parser of Yacas. Note that str:=ToString()OMForm(3) produces a string with a trailing newline, so the problem does not occur with OpenMath objects generated by Yacas itself.

If this is difficult to fix, maybe the XML parser can just add an ending space before parsing.
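For comparison, here is a minimal Python sketch of the expected behavior, using the standard library's xml.etree.ElementTree rather than Yacas. The helper name read_omi is hypothetical and only handles an OMOBJ wrapping a single OMI; it is meant to show that a conforming XML parser treats whitespace after the closing root tag as optional, not to describe how Yacas parses its input.

```python
import xml.etree.ElementTree as ET


def read_omi(openmath_xml: str) -> int:
    """Parse an <OMOBJ> wrapping a single <OMI> and return its integer value.

    Hypothetical helper for illustration only; not part of Yacas.
    """
    root = ET.fromstring(openmath_xml)  # raises ParseError on malformed XML
    omi = root.find("OMI")
    return int(omi.text)


with_trailing_space = "<OMOBJ> <OMI>3</OMI> </OMOBJ> "
without_trailing_space = "<OMOBJ> <OMI>3</OMI> </OMOBJ>"

# Whitespace after the closing root tag is optional in XML,
# so both inputs should yield the same value.
print(read_omi(with_trailing_space))
print(read_omi(without_trailing_space))
```

Both calls return 3, which is the result one would hope to get from FromString( % )OMRead() in either case.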

############ Leading spaces

When converting strings from OpenMath to Yacas, the spaces should be kept. Please consider this example:

    OpenMathStringOfSpaces := "<OMOBJ> <OMSTR> </OMSTR> </OMOBJ> "
    FromString( % )OMRead()
    OpenMathStringWithLeadingSpaces := "<OMOBJ> <OMSTR> a </OMSTR> </OMOBJ> "
    FromString( % )OMRead()

In the first case, the empty string is returned instead of the original one, and in the second case the leading spaces are lost.

    In> OpenMathStringOfSpaces := "<OMOBJ> <OMSTR> </OMSTR> </OMOBJ> "
    Out> "<OMOBJ> <OMSTR> </OMSTR> </OMOBJ> "
    In> FromString( % )OMRead()
    Out> ""
    In> OpenMathStringWithLeadingSpaces := "<OMOBJ> <OMSTR> a </OMSTR> </OMOBJ> "
    Out> "<OMOBJ> <OMSTR> a </OMSTR> </OMOBJ> "
    In> FromString( % )OMRead()
    Out> "a "
    In>
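As a point of reference, the following Python sketch shows what whitespace-preserving behavior looks like with a standard XML parser. It uses xml.etree.ElementTree from the Python standard library; read_omstr is a hypothetical helper written for this illustration and says nothing about the Yacas internals.

```python
import xml.etree.ElementTree as ET


def read_omstr(openmath_xml: str) -> str:
    """Return the character data of the <OMSTR> inside an <OMOBJ>, unmodified.

    Hypothetical helper for illustration only; not part of Yacas.
    """
    root = ET.fromstring(openmath_xml)
    omstr = root.find("OMSTR")
    # ElementTree reports an empty element as None; map that to "".
    return omstr.text or ""


only_spaces = "<OMOBJ> <OMSTR> </OMSTR> </OMOBJ> "
padded = "<OMOBJ> <OMSTR> a </OMSTR> </OMOBJ> "

# Whitespace inside <OMSTR> is character data and is kept as-is;
# whitespace between elements is simply not part of the string.
print(repr(read_omstr(only_spaces)))
print(repr(read_omstr(padded)))
```

Here the first call yields a string consisting of one space and the second yields "a" with one space on each side, so neither the leading nor the trailing whitespace is dropped.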

On this point, please consider the following passage from http://www.catb.org/~esr/writings/taoup/html/ch01s06.html , which is especially relevant for OpenMath, since OpenMath is intended to be used by computers:

For robustness, designing in tolerance for unusual or extremely bulky inputs is also important. Bearing in mind the Rule of Composition helps; input generated by other programs is notorious for stress-testing software (e.g., the original Unix C compiler reportedly needed small upgrades to cope well with Yacc output). The forms involved often seem useless to humans. For example, accepting empty lists/strings/etc., even in places where a human would seldom or never supply an empty string, avoids having to special-case such situations when generating the input mechanically.
-- Henry Spencer

One very important tactic for being robust under odd inputs is to avoid having special cases in your code. Bugs often lurk in the code for handling special cases, and in the interactions among parts of the code intended to handle different special cases.

############ Escaped chars

When converting strings to and from OpenMath, some characters should be escaped; in fact the OpenMath standard https://www.openmath.org/standard/om20-2019-07-01/omstd20.html#sec_xml-desc says: "Note that as always in XML the characters < and & need to be represented by the entity references &lt; and &amp; respectively."

Please consider this example, in which the string "</OMSTR>" is converted from Yacas to OpenMath and vice versa:

    str := OMForm( "</OMSTR>" )
    FromString(str)OMRead()

In the OpenMath object there is a "<" unescaped, and the converter from OpenMath to Yacas gets confused:

    In> str := OMForm( "</OMSTR>" )
    <OMOBJ> <OMSTR></OMSTR></OMSTR> </OMOBJ>
    Out> True
    In> FromString(str)OMRead()
    CommandLine(1) : Invalid argument
    Out> False
    In>

In the following example, an incorrect input produces an error, which is correct so far. However, the error-message string contains an unescaped </OMOBJ>, which confuses any XML parser that tries to decode the OpenMath output from Yacas.

    In> PrettyPrinter'Set("OMForm")
    <OMOBJ> <OMS cd="logic1" name="true"/> </OMOBJ>
    In> FromString("<OMOBJ><OMV name=\" \"\"/></OMOBJ> ")OMRead()
    <OMOBJ> <OME> <OMS cd="moreerrors" name="unexpected"/> <OMSTR>In function "XmlExplodeTag" : bad argument number 1 (counting from 1) The offending argument String(ReadToken()) evaluated to "</OMOBJ>" String(1) : Invalid argument </OMSTR> </OME> </OMOBJ>
    In>

When converting to OpenMath, at least the characters < and & must be escaped; when converting from OpenMath, all five escapable characters " ' < > & must be unescaped. This is an XML feature, not an OpenMath one; see https://www.novixys.com/blog/what-characters-need-to-be-escaped-in-xml-documents/

For instance, the string "</OMSTR>" must be converted to OpenMath either as:

<OMOBJ> <OMSTR>&lt;/OMSTR&gt;</OMSTR> </OMOBJ>

or as:

<OMOBJ> <OMSTR>&lt;/OMSTR></OMSTR> </OMOBJ>

When converting from OpenMath to Yacas, both must be converted back to "</OMSTR>".
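The round trip described above can be sketched in Python with the standard library. This is only an illustration of the XML escaping rules, not Yacas code: to_omstr and from_omstr are hypothetical helper names, escape comes from xml.sax.saxutils (which replaces &, < and >), and decoding is left to xml.etree.ElementTree, which resolves the five predefined entities.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_omstr(value: str) -> str:
    """Wrap a string in <OMOBJ><OMSTR>, escaping &, < and >.

    Hypothetical helper for illustration only; not part of Yacas.
    """
    return "<OMOBJ> <OMSTR>" + escape(value) + "</OMSTR> </OMOBJ>"


def from_omstr(openmath_xml: str) -> str:
    """Return the decoded content of <OMSTR>; the parser unescapes entities."""
    return ET.fromstring(openmath_xml).find("OMSTR").text or ""


encoded = to_omstr("</OMSTR>")
print(encoded)

# Both accepted encodings decode to the same string.
fully_escaped = "<OMOBJ> <OMSTR>&lt;/OMSTR&gt;</OMSTR> </OMOBJ>"
minimally_escaped = "<OMOBJ> <OMSTR>&lt;/OMSTR></OMSTR> </OMOBJ>"
print(from_omstr(fully_escaped))
print(from_omstr(minimally_escaped))

# All five predefined entities are resolved when reading.
all_five = "<OMOBJ><OMSTR>&quot;&apos;&lt;&gt;&amp;</OMSTR></OMOBJ>"
print(from_omstr(all_five))
```

With this sketch, to_omstr produces the fully escaped form, and both forms read back as "</OMSTR>". Escaping > is optional in XML character data, which is why the second encoding is also valid.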

grzegorzmazur commented 4 years ago

I've split the issue, moving leading spaces to #297 and escaped characters to #298 and leaving this one for the ending spaces issue only.