mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.73k stars 1.33k forks source link

Synchronize `__repr__()` and `_repr_html_()` #9830

Closed cbrnr closed 2 years ago

cbrnr commented 3 years ago

TIL that Info has a dedicated HTML representation, which is 🔥. However, I don't like that it doesn't show the same information as the standard text representation, for example: Screen Shot 2021-10-11 at 15 50 49 Specifically, I really miss the channel names and I don't like that MEG types are always listed, but ideally there should be no differences at all except for a nicer rendering in HTML. WDYT?

agramfort commented 3 years ago

personally what I don't like is that it hides the name the key to get the values in info

larsoner commented 3 years ago

No strong feeling from me. With the HTML we might be able to get away with more information even if we try to make it more consistent, for example we could probably do a 3-column table that has key / description / value (currently it's description / value), and keep the console more compact and closer to code with just key / value (what it is now).

agramfort commented 3 years ago

I am just +0.5 on add a key column to html repr

drammock commented 3 years ago

I propose:

  1. adding a key column as @larsoner suggests. Personally I'd but it in between the description and the value, possibly even as a parenthetical like "Highpass (highpass)" or "Sampling frequency (sfreq)"
  2. suppressing things like 0 magnetometer
  3. formatting the good channels to be \n separated instead of comma-separated (I think this was recently done by @larsoner for some other field in one of the other __html_repr__ methods? why not here too). If this is done, we should drop the words "and" and "channels" too
  4. not sure what to do about harmonizing which fields are actually shown; to me it makes sense that they would match as @cbrnr suggests, but maybe there's a good reason they don't?
agramfort commented 3 years ago

works for me

cbrnr commented 3 years ago

Ideally, we should only have to maintain __repr__ and just put its contents in a nice HTML table. I don't like that currently they contain different fields. I'm not even sure we need a description column for the fields when we could just use the keys (they should be self-explanatory and if not they are documented).

cbrnr commented 3 years ago

Maybe just something like this:

    def _repr_html_(self, caption=None):
        """Summarize info for HTML representation."""
        html = ''
        if isinstance(caption, str):
            html += f'<h4>{caption}</h4>\n'
        html += '<table class="table table-hover table-striped table-sm table-responsive small">\n'  # noqa
        for row in repr(self).split('\n')[1:-1]:
            key, value = row.split(':')
            html += '<tr>\n'
            html += f'<th>{key.strip()}</th>\n'
            html += f'<td>{value.strip()}</td>\n'
        html += '</table>'
        return html

Which looks like:

Screen Shot 2021-10-12 at 08 59 35
cbrnr commented 3 years ago

BTW, the same applies to Raw and Epochs. I think that the two reprs should contain the same pieces of information.

hoechenberger commented 3 years ago

@cbrnr In my current PR #9828 on adding ICA to Report, I also reworked ICA.__repr__ and added ICA._repr_html_. To reduce code duplication and improve consistency between the two, I took the following approach: https://github.com/mne-tools/mne-python/blob/558c88abc326db69a9832dcb9dfcb7735d7f06b5/mne/preprocessing/ica.py#L446-L540

I'm -1 on adding a key column as this will force us to do a 1-to-1 mapping between rows and keys or attributes, which I think is not practical:

Screen Shot 2021-10-13 at 00 13 05

Also I'm adding info that's just not present in the object itself (i.e., is calculated), namely Explained variance

hoechenberger commented 3 years ago

I want to add that my concern re 1-to-1 mapping can be disregarded if we're only talking about Info.

cbrnr commented 3 years ago

I find it very strange to show different pieces of information depending on which backend is used. What exactly is the reason why showing exactly the same information is not feasible? The HTML repr is also just text which should easily fit in a terminal. This should not just synced for Info, but also for Raw, Epochs, and ICA(I don't think we have more).

hoechenberger commented 3 years ago

What exactly is the reason why showing exactly the same information is not feasible?

HTML repr allows us to present data in true tables (that we will make collapsible soon, too), while you'll typically want a very concise repr on the terminal. Data can also be styled beyond just a tabular alignment in HTML, while this stuff isn't easy without introducing new dependencies on the terminal (e.g., via rich)

Also I think that when adding some of the HTML reprs, we didn't want to / dare to change the existing repr() output, but still wanted to take advantage of the features HTML offers.

hoechenberger commented 3 years ago

That said, +1 on ensuring they're more synchronous, but please don't make the HTML repr just an ugly HTML-ified copy of the text repr. It can do much better than that.

cbrnr commented 3 years ago

I'm still -1 on using different reprs. I can already imagine the confusion when people notice that the output (repr) of an object in the interactive interpreter is widely different from inside a notebook. If you want to add nice HTML summaries I'd really prefer if that had its own dedicated method (e.g. raw.print_html() or even raw.print()). Changing the default repr is asking for trouble.

hoechenberger commented 3 years ago

in that case I'd suggest to expand the text reprs such that they all produce tabular output like we do with the HTML repr (and kind-of with the Info text repr already)

hoechenberger commented 3 years ago

cc @vagechirkov who added many of the HTML reprs and has been working on #9220

cbrnr commented 3 years ago

TBH I don't think we should get a ton of output in a repr of a specific object. Ideally, typing raw in the interactive interpreter should return just one line with the most important information. Everything in addition should require to call a dedicated method.

cbrnr commented 3 years ago

If you really want to override a default method (and not add a new one) I'd strongly prefer __str__ over __repr__.

hoechenberger commented 3 years ago

If you really want to override a default method (and not add a new one) I'd strongly prefer __str__ over __repr__.

That's totally alright with me – honestly, I don't care. But I do care about nicely-formatted HTML output, I think it really improves the UX when working interactively. I was even thinking about adding figures to it if plotting can be fast enough (would be collapsed by default)

scikit-learn also has a rich HTML repr for its Pipeline, which can look like this:

Imagine we had something similar for our ICA …

cbrnr commented 3 years ago

This is really nice and useful as long as it's not the default repr. And by default I mean either __repr__ or _html_repr_, because for users the only thing that matters is what is displayed when they type an object in the interactive interpreter or Jupyter notebook. Edit: And by that I mean that both options should give the same result.

drammock commented 3 years ago

This is really nice and useful as long as it's not the default repr. And by default I mean either __repr__ or _html_repr_, because for users the only thing that matters is what is displayed when they type an object in the interactive interpreter or Jupyter notebook.

I don't understand the logic here. You say it's really nice and useful but that you don't want users to see it when they type the name of the object (which you say is "the only case that matters")?

I'm also a bit confused why you think we shouldn't "override" the repr of a class when the class is custom / specific to MNE-Python.

Is your position that (1) terminal and html should match and (2) terminal should be short, therefore (3) html cannot be fancy?

cbrnr commented 3 years ago

Is your position that (1) terminal and html should match and (2) terminal should be short, therefore (3) html cannot be fancy?

Yes, sorry for not being clear, that is my position. I do like nice HTML representations though, and I do want users to see it, only not when they type the name of the object. A repr should be short and self-contained representation of the object, and longer more verbatim information should be available when printing the object.

hoechenberger commented 3 years ago

A repr should be short and self-contained representation of the object, and longer more verbatim information should be available when printing the object.

I believe that most users would prefer to see a repr that HELPS them understand and explore their data, and it's only a minority that want super brief reprs like you :) and you would still be able to get it – simply type repr(foo)? Nobody loses anything, there's only to gain We could also produce BOTH in the HTML repr: a short and an "extended" version. And make both of them collapsible

drammock commented 3 years ago

We could also produce BOTH in the HTML repr: a short and an "extended" version. And make both of them collapsible

I was going to suggest something similar: a short terminal repr, and an HTML repr that is a superset of the terminal one: the short bit always displayed and the extra stuff collapsed by default. WDYT @cbrnr?

cbrnr commented 3 years ago

I was going to suggest something similar: a short terminal repr, and an HTML repr that is a superset of the terminal one: the short bit always displayed and the extra stuff collapsed by default. WDYT @cbrnr?

Maybe I'm not the right person to ask this, but I really don't like this inconsistent behavior. A notebook should behave like a normal interactive interpreter. Sure it has lots of additional features, but these should not silently override default methods. I'd be OK with e.g. print(raw) or raw._html_repr_() or raw.html() to show this additional HTML.

hoechenberger commented 3 years ago

I'd be OK with e.g. print(raw) or raw._html_repr_() or raw.html() to show this additional HTML.

Only that then we can also just leave it be as nobody will find out about it.

cbrnr commented 3 years ago

You never print an object?

larsoner commented 3 years ago

an HTML repr that is a superset of the terminal one

This is my thinking as well -- it would take care of @agramfort 's gripe about not having keys in _repr_html_ but keep the improved readability of the HTML one. To me HTML is more flexible, so having the information there be a superset of the non-terminal one is justified. So +1 for me on HTML as superset

We could also produce BOTH in the HTML repr: a short and an "extended" version. And make both of them collapsible

the short bit always displayed and the extra stuff collapsed by default.

I don't think this part needs to be codified necessarily, but rather made on a case-by-case basis. The descrption column in the description/key/value column triplet is really useful for info, so I'd prefer it be shown. For other classes (ICA?) on the other hand, this might be a nice way to hide more verbose stuff by default but allow showing it.

I really don't like this inconsistent behavior

Sounds like the vote is @cbrnr -1, @drammock +1 @hoechenberger +1 @larsoner +1 on the basic HTML-as-superset idea. @agramfort any opinion?

larsoner commented 3 years ago

Only that then we can also just leave it be as nobody will find out about it.

You never print an object?

Sounds like the vote ...

If there is more to consider or say, feel free to continue discussing -- I don't mean to shut down any discussion by asking for other folks to vote if not all major points have been made! To me the pros and cons are pretty clearly discussed or implied by previous comments above in terms of the tradeoffs in consistency, discoverability, usability, and effectiveness of HTML-matches-console versus HTML-superset-of-console, though, and it sounds folks weigh these differently in this case -- hence a vote-and-decide at this point seems reasonable to me here.

cbrnr commented 3 years ago

Since @hoechenberger mentioned scikit-learn pipelines, they have a config option where users can choose which repr is shown: https://scikit-learn.org/dev/auto_examples/miscellaneous/plot_pipeline_display.html#sphx-glr-auto-examples-miscellaneous-plot-pipeline-display-py

I think this would be a very nice solution where all of us would be happy.

hoechenberger commented 3 years ago

I think this would be a very nice solution where all of us would be happy.

But what if that's not my goal?? 😅🙈 No but seriously, yes, could be a nice middle ground… But I already see us fighting about the "correct" default value there? 🤔

vagechirkov commented 3 years ago

I personally really like xarray HTML representations with collapsible options and find them very useful and transparent. From example: image

cbrnr commented 3 years ago

But what if that's not my goal?? 😅🙈

🤣 I knew it!

No but seriously, yes, could be a nice middle ground… But I already see us fighting about the "correct" default value there? 🤔

Nope, I'm totally fine with having HTML as the default as long as it is configurable (of course I'm also totally fine with text as the default).

mmagnuski commented 3 years ago

I'm also generally +1 for HTML repr being superset of the text one, but a configuration option would be nice. The only case I'm sometimes bothered by long reprs is when displaying lists of mne objects (list of info for example), but in such case the html repr is not used.

cbrnr commented 2 years ago

I would like to res this issue because I think the current dissociation between the two reprs are still very confusing (at least to me). Specifically, when writing some tutorial in a notebook-like environment, people should see the exact same output whether they are in a REPL or a notebook. In general, I prefer the REPL, but I write my tutorial with Quarto (which uses Jupyter notebook under the hood), so when I tell people to run e.g.

raw.info

I would like to see the REPL __repr__() and not _repr_html_().

Could we add a config option to disable all _repr_html_()? I think we should because AFAIK it is not possible to change this behavior on the IPython side (https://stackoverflow.com/questions/73540055/how-can-i-force-repr-instead-of-repr-html-in-jupyter-notebooks).

hoechenberger commented 2 years ago

@cbrnr +1 for adding a configuration option if Jupyter doesn't support this. The default should IMHO still be to use HTML repr in a Jupyter environment.

cbrnr commented 2 years ago

This would be great! And yes, HTML can remain the default for notebook envs.

Does anyone have an idea how to implement this technically? I briefly looked at how it's done in Scikit-learn, but couldn't really figure it out so I guess this will be a larger change.

I will also ask the IPython devs if they think this could be added on their end. Doesn't make a lot of sense that every package has to implement that.

hoechenberger commented 2 years ago

Does anyone have an idea how to implement this technically? I briefly looked at how it's done in Scikit-learn, but couldn't really figure it out so I guess this will be a larger change.

Could you not simply do something like this:

def _repr_html_(self):
    if config.no_html_repr:
        return repr(self)

?

cbrnr commented 2 years ago

Unfortunately, this doesn't work (nothing gets printed). It seems like it's not just the returned value that's needed by the notebook.

hoechenberger commented 2 years ago

This works for me in the VS Code Python Interactive Jupyter thingy:

# %%
class Test:
    def __repr__(self) -> str:
        return 'foobar'

    def _repr_html_(self) -> str:
        return repr(self)

test = Test()
test
Screen Shot 2022-09-13 at 13 43 41
hoechenberger commented 2 years ago

@cbrnr I assume you'll have to escape the < etc symbols

hoechenberger commented 2 years ago

@cbrnr I assume you'll have to escape the < etc symbols

Yep.

# %%
class Test:
    def __repr__(self) -> str:
        return '<foobar>'

    def _repr_html_(self) -> str:
        return repr(self)

test = Test()
test

prints nothing.

But this works:

# %%
import html

class Test:
    def __repr__(self) -> str:
        return '<foobar>'

    def _repr_html_(self) -> str:
        return html.escape(
            repr(self)
        )

test = Test()
test
Screen Shot 2022-09-13 at 13 46 27
cbrnr commented 2 years ago

Nice! So I guess we have a plan then! I will prepare a PR soon! Thanks a lot!

cbrnr commented 2 years ago

Line breaks \n should not be escaped though – is this possible?

cbrnr commented 2 years ago

The font is also not a fixed-width font, which ideally would be nice to have as well...

cbrnr commented 2 years ago

This should do it:

import html

class Test:
    def __repr__(self) -> str:
        return '<foo>\n<bar>'

    def _repr_html_(self) -> str:
        r = "<pre>" + html.escape(repr(self)) + "</pre>"
        return r.replace("\n", "<br/>")

test = Test()
test