Enhance the formatting of the output of Information

GarkGarcia commented 4 years ago

A picture is worth a thousand words:

[Image: Screenshot from 2020-09-21 21-35-20]

The output is perfectly fine, but I think we should try formatting it in a more elegant way. For instance, there are no line breaks and there are no spaces after the end of the sentences. I'm not sure how this could be done, but I'd like to get this resolved before the 1.1 release (so I added it on the milestone).

@mmatera I can give it a try if you don't have time for this. Could you provide us an example of how this is displayed in Wolfram Mathematica? Also, I'd like to thank you for the awesome work you did implementing Information. This is not a critique of your work, I'm just proposing an enhancement.

GarkGarcia commented 4 years ago

Maybe TextCell could help us. I'm not sure it's implemented though.

GarkGarcia commented 4 years ago

> Maybe TextCell could help us. I'm not sure it's implemented though.

Apparently it's not implemented:

In[1]:= ? TextCell
Out[1]= Null

rocky commented 4 years ago

My take is not to spend too much time on this.

Documentation needs to be redone and rethought along the lines of existing Python tools like Sphinx/reStructuredText. There are currently to_str() and to_python(), and to_rst() could probably be added. Likewise, for output forms there is // TeXForm, and maybe // RsTForm could be added.

All of this is for after 1.1, when more major refactoring is done.

mmatera commented 4 years ago

This seems to be a problem with the MathML output. Maybe some extra parameters should be introduced in the math tag in order to limit the width of the output.

rocky commented 4 years ago

In my opinion this is not a MathML problem, but an overall architectural problem. I come up against the same thing when I work in mathics the CLI.

Let me explain.

At the top level you call Evaluate() and pass an optional format to return. The existing types are:

- "text"
- "xml" (which is really MathML with SVG rendering)
- "tex"
- boxes to text

and there are two problems.

First, this is too limited. What we want here is a way of rendering text that is friendlier to Python tooling, akin to reStructuredText, Markdown, and so on, and each of the above choices has problems. "tex" isn't Python-friendly for formatting, and "text" is so broad that you can't tell whether you want text with markup or the literal text. MathML wasn't intended for documentation, i.e. English sentences with section and example marks and embedded code blocks. Ideally there would be another kind of format like "rst", and then you'd call a Python library to do the formatting. It is clear why Mathematica doesn't have this. But why an implementation in Python doesn't have this is less clear.

Second, and probably the bigger problem, is that the decision about how to format really seems to change depending on what the result is.

Again, I see this in the mathics cli when I was looking at graphs. In the CLI it calls Evaluation without an explicit format type and the default is text. In the Django interface the default is xml. And here I am arguing that I would like rst if the expression starts "?" or "??".

In the CLI mathics, if I say Plot[x, {x, 0, 3}] I get back the string -Graphics- and nothing more! But what I'd like in addition to that string is some Mathics object with some kind of indication as to what it is, so I can use it to call a matplotlib show() if I have that installed, or pyplot, or sympy's cool text renderer maybe, or something else. Currently I don't get that choice or the information to make that choice, and I should.
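
To make the idea of choosing a format per result concrete, here is a minimal sketch. It is not Mathics code: the `Result` class, the `choose_format` function, and the "rst" and "object" format names are all hypothetical and only illustrate the kind of decision a front end might want to make.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for whatever an evaluation returns."""
    kind: str     # e.g. "graphics", "documentation", "expression"
    payload: str  # the underlying data, kept as a string for simplicity


def choose_format(query: str, result: Result, default: str = "text") -> str:
    """Pick an output format from the query text and the kind of result."""
    if query.lstrip().startswith("?"):  # covers both "?" and "??"
        return "rst"                    # hypothetical format name
    if result.kind == "graphics":
        return "object"                 # hypothetical: hand the object to the front end
    return default                      # e.g. "text" in a CLI, "xml" in Django
```

With this sketch, `choose_format("?? Plot", Result("documentation", "..."))` selects "rst", while an ordinary expression falls through to whatever default the front end passed in.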

GarkGarcia commented 4 years ago

> Again, I see this in the mathics cli when I was looking at graphs. In the CLI it calls Evaluation without an explicit format type and the default is text. In the Django interface the default is xml. And here I am arguing that I would like rst if the expression starts "?" or "??".

I'm not sure I agree. I don't think we should change the "display type" of expression on a per-expression level. That introduces unnecessary complexity and it can get messy pretty quickly.

I think a cleaner solution is to implement TextForm (which is essentially the AST of a primitive markup language, but represented as Mathics expressions) and return a TextForm when Information is called (actually we wouldn't return the TextForm, we would simply print it). By implementing both to_str and to_xml in TextForm we could control how the output of Information is displayed in the terminal and in the web interface.
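
As a rough illustration of a markup AST with both to_str and to_xml, consider the sketch below. It uses plain Python classes rather than Mathics expressions, and the node types (`Text`, `Code`, `Paragraph`) and the tags emitted by to_xml are placeholders chosen for this example, not an existing Mathics or Mathematica API.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Union


@dataclass
class Text:
    content: str

    def to_str(self) -> str:
        return self.content

    def to_xml(self) -> str:
        return escape(self.content)


@dataclass
class Code:
    source: str

    def to_str(self) -> str:
        return self.source

    def to_xml(self) -> str:
        return "<code>" + escape(self.source) + "</code>"


@dataclass
class Paragraph:
    children: List[Union[Text, Code]]

    def to_str(self) -> str:
        # Terminal rendering: concatenate the children and end the line.
        return "".join(child.to_str() for child in self.children) + "\n"

    def to_xml(self) -> str:
        # Web rendering: the <p> tag is a placeholder for this sketch.
        return "<p>" + "".join(child.to_xml() for child in self.children) + "</p>"
```

The same tree then renders differently per interface: `Paragraph([Text("Use "), Code("a < b")])` gives `Use a < b` plus a newline from to_str, and an escaped, tagged string from to_xml.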

I understand implementing this is far from trivial, so we may have to postpone this to 1.2 as well, but I think it is a much cleaner solution than introducing RST or anything like that. Also, TextForm is in MMA's stdlib, so we'll have to implement it at some point.

GarkGarcia commented 4 years ago

> In the CLI mathics, if I say Plot[x, {x, 0, 3}] I get back the string -Graphics- and nothing more! But what I'd like in addition to that string is some Mathics object with some kind of indication as to what it is, so I can use it to call a matplotlib show() if I have that installed, or pyplot, or sympy's cool text renderer maybe, or something else. Currently I don't get that choice or the information to make that choice, and I should.

That would be great (display plots in the terminal as well), but I don't see why this can't be fixed without introducing new "display types". Couldn't we simply implement to_str for plots using matplotlib as you said?

GarkGarcia commented 4 years ago

> MathML wasn't intended for documentation

This is absolutely true. We should display things as plain HTML (using MathML in the parts that need it).

mmatera commented 4 years ago

OK, several things:

Regarding @rocky's complaints about the architecture, I do not agree. This "two-tiered" structure is precisely what allows a consistent, very general way to represent objects as string-like expressions and then, by defining MakeBoxes in a proper way, to represent these objects in many different ways, depending on the interface or the situation. The problem is that the mathics syntax support has been much more developed than the interface support, and the tests we run just look at the Expression level.
What is true is that for certain objects, the choice of low-level representation may be non-optimal with current technologies, although I think it was optimal at the beginning of the project. The problem is that updating the representation would take a lot of effort now...
What @GarkGarcia said about representing in MathML just those things that need it, and using plain HTML in general, would be a problem, because it would break the strategy of building nested boxes. For example, you could put a text inside a MathML object inside a graphic inside a text, and this would still be represented more or less properly in the graphical interface. Maybe we can redesign the MakeBoxes architecture to handle these cases in a better way, but again, it is a non-trivial task.
In any case, I think that even keeping this architecture, we could fix several aspects. For example, we could simplify expressions that do not have MathML tags inside to plain HTML, or use DIVs to limit the size of the output in the graphical interface.
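
The last point could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `simplify_to_html` is hypothetical, the input is assumed to be MathML without XML namespaces, and "no MathML tags inside" is approximated as "only math, mrow, and mtext elements".

```python
import html
import xml.etree.ElementTree as ET

# Elements treated as "text only" (an assumption made for this sketch).
TEXT_ONLY_TAGS = {"math", "mrow", "mtext"}


def simplify_to_html(mathml: str, max_width: str = "40em") -> str:
    """Return plain HTML for text-only MathML; otherwise wrap the MathML unchanged."""
    root = ET.fromstring(mathml)
    style = f"max-width: {max_width}; overflow-wrap: break-word"
    if all(element.tag in TEXT_ONLY_TAGS for element in root.iter()):
        # Nothing mathematical inside: keep only the text, escaped for HTML.
        text = "".join(root.itertext())
        return f'<div style="{style}">{html.escape(text)}</div>'
    # Real math content: leave it alone, but still limit the width with a DIV.
    return f'<div style="{style}">{mathml}</div>'
```

Either way the output ends up inside a width-limited DIV, so long documentation text can wrap in the browser while genuine formulas keep their MathML.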

rocky commented 4 years ago

I probably am not understanding you all well. And we are probably coming at this from different angles.

What I know doesn't work right now is that if you run mathics or a command-line program and call evaluate(), and the result is a graph object, the front-end doesn't get back anything that it can reasonably use to render this. It will be at best the string -Graphics-, and at worst None. This is wrong.

I see that you can pass a dictionary of format types and that is respected.

In the short term, in a CLI outside of this project, I am planning on patching the format_output() method on my object instance so that the CLI can call matplotlib's show() or do reStructuredText conversion. Or it could call the original format_output() method if that's what it wants to do.

rocky commented 4 years ago

> > In the CLI mathics, if I say Plot[x, {x, 0, 3}] I get back the string -Graphics- and nothing more! But what I'd like in addition to that string is some Mathics object with some kind of indication as to what it is, so I can use it to call a matplotlib show() if I have that installed, or pyplot, or sympy's cool text renderer maybe, or something else. Currently I don't get that choice or the information to make that choice, and I should.

> That would be great (display plots in the terminal as well), but I don't see why this can't be fixed without introducing new "display types". Couldn't we simply implement to_str for plots using matplotlib as you said?

matplotlib and pyplot are not requirements for Mathics, and shouldn't be for the core part.

For a separate command-line program like mathics the CLI, split off from Mathics, it is reasonable to require specific kinds of drawing packages.

The bigger picture is that a front end gives a string to Mathics to parse and evaluate. It should be able to get the mathics evaluation object back. The front-end is in a better position to know or understand how to present or format it, depending on the type of object and the front-end's capabilities.

GarkGarcia commented 4 years ago

> What @GarkGarcia said about representing in MathML just those things that need it, and using plain HTML in general, would be a problem, because it would break the strategy of building nested boxes. For example, you could put a text inside a MathML object inside a graphic inside a text, and this would still be represented more or less properly in the graphical interface. Maybe we can redesign the MakeBoxes architecture to handle these cases in a better way, but again, it is a non-trivial task.

Ohh, I see. Indeed, my idea would not work.

> In any case, I think that even keeping this architecture, we could fix several aspects. For example, we could simplify expressions that do not have MathML tags inside to plain HTML, or use DIVs to limit the size of the output in the graphical interface.

Agreed. There's a lot we can improve without redesigning the whole system.

> ~What I know doesn't work right now is that if you run mathics or a command-line program and call evaluate(), and the result is a graph object, the front-end doesn't get back anything that it can reasonably use to render this. It will be at best the string -Graphics-, and at worst None. This is wrong.~

We aren't saying this shouldn't be worked on, we're just arguing that fixing this doesn't require redesigning the way we display objects in different interfaces.

GarkGarcia commented 4 years ago

> matplotlib and pyplot are not requirements for Mathics, and shouldn't be for the core part.

I'd argue that since the plotting functionality is distributed with Mathics itself, it's reasonable to have a dependency on something like matplotlib (perhaps something more lightweight instead), but I understand your point.

> The bigger picture is that a front end gives a string to Mathics to parse and evaluate. It should be able to get the mathics evaluation object back. The front-end is in a better position to know or understand how to present or format it, depending on the type of object and the front-end's capabilities.

This is a compelling argument. Perhaps you could provide some flags to Evaluate to indicate the capabilities of the front-end?
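
A minimal sketch of what such capability flags might look like follows. Nothing here exists in Mathics: the `Capability` enum and `pick_graphics_output` are hypothetical names, and the mapping from capabilities to output kinds is an assumption for illustration.

```python
from enum import Flag, auto


class Capability(Flag):
    """Hypothetical flags a front end could pass to describe what it can display."""
    TEXT = auto()
    MATHML = auto()
    SVG = auto()
    RASTER = auto()


def pick_graphics_output(capabilities: Capability) -> str:
    """Choose how to deliver a graphics result, given the front end's capabilities."""
    if Capability.SVG in capabilities:
        return "svg"
    if Capability.RASTER in capabilities:
        return "png"
    # Text-only front ends get the placeholder string the CLI shows today.
    return "-Graphics-"
```

A web interface might pass `Capability.TEXT | Capability.MATHML | Capability.SVG`, while a bare terminal would pass only `Capability.TEXT`.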

mmatera commented 4 years ago

About graphics, what we need to support is the infrastructure of Graphics and Graphics3D objects of WL. It does not mean we need to provide a particular way to render them. We could think about having different mechanisms (asymptote / matplotlib / pillow / whatever) depending on the front end.
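
One way to picture "different mechanisms depending on the front end" is a small renderer registry, sketched below. The registry, the `register_renderer` decorator, and the dictionary used to stand in for a Graphics object are all assumptions for this example. The "svg" renderer is a toy, and real back ends such as matplotlib or asymptote would be registered by whichever front end or add-on ships them.

```python
from typing import Callable, Dict, List

Renderer = Callable[[dict], str]
_RENDERERS: Dict[str, Renderer] = {}


def register_renderer(name: str) -> Callable[[Renderer], Renderer]:
    """Decorator that registers a rendering function under a back-end name."""
    def decorator(func: Renderer) -> Renderer:
        _RENDERERS[name] = func
        return func
    return decorator


@register_renderer("text")
def render_text(graphics: dict) -> str:
    return "-Graphics-"


@register_renderer("svg")
def render_svg(graphics: dict) -> str:
    points = " ".join(f"{x},{y}" for x, y in graphics["points"])
    return f'<svg><polyline points="{points}"/></svg>'


def render(graphics: dict, preferred: List[str]) -> str:
    """Use the first preferred back end that is registered; fall back to text."""
    for name in preferred:
        if name in _RENDERERS:
            return _RENDERERS[name](graphics)
    return render_text(graphics)
```

The graphics object itself stays renderer-neutral; only the front end's preference list decides which mechanism draws it.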

rocky commented 4 years ago

> I'd argue that since the plotting functionality is distributed with Mathics itself, it's reasonable to have a dependency on something like matplotlib (perhaps something more lightweight instead), but I understand your point.

I am constantly learning the code and I've made a number of mistakes in my assessment both here and other places. (See for example some of the strikeouts). So with that in mind...

I think what's going on here is that the graphics code is tightly bound to an SVG kind of rendering.

The problem with making decisions in the Graphics3D class on the implicit understanding that SVG will be used, as best as I understand it (and I could be wrong about the code here), is that they are hard to undo. Materializing decisions such as how many points to plot, or what kind of shading to use, when the user has not stated them explicitly makes things harder when you find that the ultimate renderer has more features or works at a higher level. For example, maybe a plotter like pyplot has a better routine for automatically figuring out how many points to use and what they should be.

I think it is this aspect that is the reason why if you compare Mathics graphics output with one of the other Python packages, Mathics is lacking, even though underneath it may be using the same package!

In my experiments with graphs using networkx kinds of plotting, things were pretty simple and cool: the underlying representation that networkx provides is pretty high level, especially compared to what I am seeing with Graphics3D. But I am not totally sure here.

> About graphics, what we need to support is the infrastructure of Graphics and Graphics3D objects of WL. It does not mean we need to provide a particular way to render them. We could think about having different mechanisms (asymptote / matplotlib / pillow / whatever) depending on the front end.

Right. I have been thinking about this, and here is the approach I suggest and am currently going with, using a CLI mathics and graphs as a pymathics module.

At first I'll just do that in the front-end CLI part (not part of Mathics), since matplotlib or pyplot or sympy's plot aren't renderers in Mathics right now. After that is shaken out, it will be clear how to add that kind of rendering into Mathics. Rendering using a particular kind of renderer could be a separate add-on to Mathics. If you are using, say, my CLI, you might be required to have, let's say, matplotlib around. However, if you are using, say, the jupyter interface, maybe not. It depends on what the front-end wants to support.

For this, all I need to do is get at the format_output() method of evaluation. Right now it feels a little hacky because what I am currently doing is:

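from mathics.core.evaluation import Evaluation

def format_output(self, expr, format=None):  # signature assumed; check Evaluation.format_output
    # Front-end-specific rendering goes here (plotting library, reStructuredText, ...)
    ...

# Monkey-patch: every Evaluation instance now uses the front-end's formatter
Evaluation.format_output = format_output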

and it would be better to handle this in the API in a cleaner way. A simple possibility is to add an optional parameter to the object's __init__() that specifies the format_output() routine. However, it might be nice for a front-end to still have access to the one that is currently provided.
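
A sketch of that possibility is below. The `Evaluation` class here is a stand-in written for this example, not the real mathics.core.evaluation.Evaluation, and the `format_output` constructor parameter and the `default_format_output` method are hypothetical names.

```python
from typing import Callable, Optional


class Evaluation:
    """Stand-in class for this sketch; not the real Mathics Evaluation."""

    def __init__(self, format_output: Optional[Callable] = None):
        # Hypothetical parameter: a formatter supplied by the front end.
        self._custom_format = format_output

    def default_format_output(self, expr, format="text"):
        # Placeholder for the built-in behaviour.
        return str(expr)

    def format_output(self, expr, format="text"):
        if self._custom_format is not None:
            # Pass the evaluation along so the callback can reach the default.
            return self._custom_format(self, expr, format)
        return self.default_format_output(expr, format)


def cli_format_output(evaluation, expr, format):
    """Example front-end callback: handle graphics itself, defer everything else."""
    if isinstance(expr, dict) and expr.get("head") == "Graphics":
        return "<rendered by the front end>"
    return evaluation.default_format_output(expr, format)
```

Because the callback receives the evaluation object, the front end keeps access to the routine that is currently provided and only overrides the cases it cares about.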

One thing I note that I think is cool about a front-end specified format routine is that it can be simpler.

Right now the front-end has to pass to evaluate() the kinds of things it supports. If it is a callback function, that is not needed because of course it knows what it provides.

But to back up a little, in general what we see and what we expect to see (unless Mathematica takes over the world and all other projects die) is that there will invariably be new kinds of renderers that pop up, or new versions and releases of existing renderers.

So this gives the project a migration path: first handle the problem in a front end. After that works, consider whether to add it in a more general way into Mathics as a renderer add-on or as an update to an existing renderer add-on.

Lastly, I'll say that the terminology for formats is screwy. Now that I understand boxes better, yes, that is the one that makes sense and is probably the most pervasive. "xml" right now means, if I have this correct, basically "MathML". TeX, while expedient, Mathematica-compatible, and historical, is from a practical standpoint the most awkward one to work with from a front end, so I don't really expect any front end to start rendering TeX format as anything other than text.

And the problem with "text" is that unless it really is raw text, we lose information about the structure inside. This was the issue I have with using "text" for ?{,?} documentation kinds of output. While you could imagine things would be better if there were a TextCell, which allows mostly lower-level kinds of formatting mixed in with some higher-level things like sectioning, it feels a little too homegrown, not as high-level or simple as working with higher-level markup like reStructuredText or Sphinx.

And the latter is more pervasively and better supported in Python. For TextCell someone is going to have to do a bit of programming.

mathics / Mathics

Enhance the formatting of the output of Information #920