google / mesop

Rapidly build AI apps in Python
https://google.github.io/mesop/
Apache License 2.0
5.65k stars 274 forks

:bar_chart: Preview `mermaid` charts :mermaid: #673

Open adriens opened 4 months ago

adriens commented 4 months ago

:grey_question: About

We use mesop as a front-end to our agent... and we are big fans of GitHub markdown flavour, for issues... and for documentation:

One of the most appreciated GitHub features is the live preview of mermaid charts:

"Include diagrams in your Markdown files with Mermaid"

:point_right: Currently, we have to copy/paste the output markdown into a GH issue to get the full end-user experience.

:bulb: Feature request

The aim of this issue is to find out whether mesop could do the job for us, I mean to :

richard-to commented 4 months ago

I'm working on a markedjs web component here (still WIP -- need to add code highlighting): https://github.com/google/mesop/pull/667. This component renders the nested list without requiring four spaces.

Looks like there is a way to use mermaid with markedjs: https://mermaid.js.org/config/usage.html#example-of-a-marked-renderer

So you could copy that web component and modify it to include mermaid.

Then you can copy mel.chat (if you haven't already) and modify it to use the markedjs web component instead of me.markdown.

I haven't tested out the performance of the markedjs web component yet (and what else it's missing). But I think it's definitely possible to support mermaid (mermaid is great!).

Markedjs also has some in progress support for Github flavored markdown, but it does not look like it's ready yet: https://github.com/markedjs/marked/discussions/1202

richard-to commented 4 months ago

For the example markdown web component (https://github.com/google/mesop/pull/667), I added mermaidjs support.

Example:

Screenshot 2024-07-26 at 5 02 12 PM

adriens commented 4 months ago

Yup, the output and components are really good, but the problem is that I can't predict the LLM output :sweat_smile: ... so the pattern would be to parse the LLM output and decide how to inject it into mesop?

richard-to commented 4 months ago

I see, so the LLM output isn't just markdown? I was assuming the LLM just outputted markdown only (in that case you can just render it using the markdown web component). It's trickier if the format is not markdown. But if you could get your LLM to output markdown, then that would probably be easiest.

adriens commented 4 months ago

> I see, so the LLM output isn't just markdown?

Hmmm, it is, but it embeds mermaidjs blocks as part of the markdown.

So you mean that we should maybe try to always send markdown back?

richard-to commented 4 months ago

Yes, I've set up the web component to work like the github flavored markdown mermaidjs support essentially. So if the agent renders the mermaid js output like this in the output markdown:

Screenshot 2024-07-28 at 4 09 21 PM

Then it should render like this (but less fancy I think):

sequenceDiagram
    Alice ->> Bob: Hello Bob, how are you?
    Bob-->>John: How about you John?
    Bob--x Alice: I am good thanks!
    Bob-x John: I am good thanks!
    Note right of John: Bob thinks a long<br/>long time, so long<br/>that the text does<br/>not fit on a row.

    Bob-->Alice: Checking with John...
    Alice->John: Yes... John, how are you?
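
If the app ever needs to know whether a reply contains a diagram before rendering it (for example to log it or lay it out differently), the markdown can be split on its mermaid fences first. This is only a rough sketch, not part of the web component in the PR: the function name `split_mermaid` and the `(kind, text)` segment format are made up for illustration, and it assumes the LLM emits well-formed fences tagged `mermaid`.

```python
import re

# Built at runtime so this snippet does not contain a literal code fence.
FENCE = "`" * 3

# A fence line tagged "mermaid", the block body, then a closing fence line.
MERMAID_BLOCK = re.compile(
    r"^" + FENCE + r"mermaid[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def split_mermaid(markdown: str) -> list[tuple[str, str]]:
    """Split markdown into ("markdown", text) and ("mermaid", source) segments."""
    segments: list[tuple[str, str]] = []
    pos = 0
    for match in MERMAID_BLOCK.finditer(markdown):
        before = markdown[pos:match.start()].strip()
        if before:
            segments.append(("markdown", before))
        segments.append(("mermaid", match.group(1).strip()))
        pos = match.end()
    tail = markdown[pos:].strip()
    if tail:
        segments.append(("markdown", tail))
    return segments


if __name__ == "__main__":
    reply = (
        "Here is the flow:\n\n"
        + FENCE + "mermaid\nsequenceDiagram\n    Alice ->> Bob: Hello Bob\n" + FENCE
        + "\n\nLet me know if it helps."
    )
    for kind, text in split_mermaid(reply):
        print(kind, "->", repr(text))
```

When the whole reply goes to a markdown component that already understands mermaid fences, none of this is needed; the reply can be passed through unchanged.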

adriens commented 4 months ago

:ok_hand: @richard-to, we'll try to output in the target mesop markdown component. The trick is, on the mesop side: how should we (or could we) guess the component to use?

... or maybe we should force the agent to always answer in markdown... what's your opinion about that? I mean... without having to deal with complex OutputParsers?

richard-to commented 4 months ago

Yes, I think you should just always render the output in markdown if possible. That would likely be the easiest option. Granted, I don't know what other use cases you have. For example, if you have images being generated, you wouldn't be able to render the images with the markdown component (you would need to use the image component).

But you could also ask the LLM to render output with some XML tags (using XML tags like this seems to be what Claude does a lot with their agents).

So you could have something like

<markdown>
Markdown text
</markdown>

or 

<plain_text>
Plain text
</plain_text>

or 

<image>
Image reference (for example a URL)
</image>

Then you can parse XML blocks for the given types. Then you could use the appropriate mesop component based off of that.
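
A rough sketch of that parsing step, in case it helps. The tag names come from the examples above; `parse_blocks`, `render` and the `renderers` mapping are hypothetical names for illustration. It uses a regex instead of an XML parser on the assumption that LLM output is rarely well-formed XML (markdown can contain stray `<` or `&`).

```python
import re
from typing import Callable

# Tag names taken from the examples above; adjust to whatever the prompt asks for.
TAGS = ("markdown", "plain_text", "image")

# Matches <tag> ... </tag> for the known tags, across multiple lines.
BLOCK = re.compile(r"<(%s)>\s*(.*?)\s*</\1>" % "|".join(TAGS), re.DOTALL)


def parse_blocks(output: str) -> list[tuple[str, str]]:
    """Return (tag, content) pairs in the order they appear in the output."""
    return [(m.group(1), m.group(2)) for m in BLOCK.finditer(output)]


def render(output: str, renderers: dict[str, Callable[[str], None]]) -> None:
    """Call the renderer registered for each block's tag."""
    for kind, content in parse_blocks(output):
        renderers[kind](content)


if __name__ == "__main__":
    llm_output = (
        "<markdown>\n# Report\nSome **bold** text\n</markdown>\n"
        "<image>\nhttps://example.com/chart.png\n</image>"
    )
    # Stand-in renderers that just print; swap in real components in the app.
    render(
        llm_output,
        {
            "markdown": lambda text: print("markdown:", repr(text)),
            "plain_text": lambda text: print("text:", repr(text)),
            "image": lambda src: print("image:", repr(src)),
        },
    )
```

Inside a mesop page, the mapping could point at `me.markdown`, `me.text` and `lambda src: me.image(src=src)` instead of the print stand-ins, assuming the image block carries a URL. Text outside any known tag is silently dropped by this sketch, so you may want a fallback that renders it as markdown.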