nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License

Single line breaks in paragraphs are removed from input/output by default #2728

Open Phil209 opened 2 months ago

Phil209 commented 2 months ago

Bug Report

When testing LLMs, I give them a short poem and ask them to rewrite it while maintaining its meaning. But merely submitting the poem combines its lines into a single paragraph, leaving only commas where the line breaks were.

This also usually causes the LLM to do the same when writing poems (paragraphs with commas instead of stanzas).

Steps to Reproduce

Ask various LLMs to write a poem, or send it one and ask it to re-write it. For example, Gemma 2 9b.

For example:

Prompt: "Write a poem about the night."

Response: "The sun descends, a fiery kiss, Upon the world, a gentle bliss. Then twilight falls, a soft embrace, As shadows dance and find their place. The moon ascends, a silver gleam, Casting its light, a silent dream. Stars ignite, a cosmic show, Across the velvet canvas, they glow. The night breathes deep, a tranquil sigh, As crickets chirp and owls fly by. A symphony of sounds so sweet, Lulling the world to slumber deep."

Should be...

The sun descends, a fiery kiss,
Upon the world, a gentle bliss.
Then twilight falls, a soft embrace,
As shadows dance and find their place.

And so on

Expected Behavior

Output a standard poem (lines forming stanzas) rather than comma separated lines in paragraph form.

Your Environment

cosmic-snow commented 2 months ago

You may have to right click on the text, then 'Disable markdown'.

But maybe you can also tell it in the prompt to preserve the line endings with Markdown?

Phil209 commented 2 months ago

Thanks. That works. I'm closing this issue.

cebtenzzre commented 2 months ago

Hm, this seems like undesirable behavior to me. I mean, Discord's markdown processing doesn't kill newlines. Why should ours?

It's a feature typically more useful when you want to render a document written with line wrapping without the line wrapping, e.g. a GitHub README. That doesn't apply here.
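The line-joining behavior described here can be illustrated with a minimal sketch. It assumes the renderer follows the common Markdown convention of treating a single newline inside a paragraph as a soft break displayed as a space. `collapse_soft_breaks` is a hypothetical helper written for illustration, not GPT4All's actual rendering code.

```python
import re


def collapse_soft_breaks(text: str) -> str:
    """Mimic a renderer that shows single newlines as spaces.

    Paragraphs are separated by one or more blank lines; inside a
    paragraph, every line break is replaced by a single space.
    """
    # Split on runs of blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n(?:[ \t]*\n)+", text.strip())
    joined = [
        " ".join(line.strip() for line in paragraph.split("\n"))
        for paragraph in paragraphs
    ]
    return "\n\n".join(joined)
```

Passing the first two lines of the poem from the report, separated by a single newline, returns them as one line joined by a space, which matches the output the reporter saw. This is handy for hard-wrapped documents such as a README, and unhelpful for verse or pasted code.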

cosmic-snow commented 2 months ago

As far as I know, when adhering to CommonMark, this is supposed to be done with either two or more trailing spaces or a backslash at the end of the line.

But yes, Discord and GitHub comments don't follow the spec in this regard and will do hard line breaks. I think neither is wrong. (Different use cases)

Point being, models may not be properly trained for this now, but could be at some point.
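For illustration, one way an implementation could keep line breaks while still rendering Markdown is to turn every in-paragraph newline into a CommonMark hard line break before rendering. The sketch below is only an assumption about how that might look: `force_hard_breaks` is a hypothetical helper, not part of GPT4All, and its fence detection is deliberately simplistic. It uses the form of hard break where a line ends in two spaces, and it leaves fenced code blocks untouched.

```python
def force_hard_breaks(text: str) -> str:
    """Append two trailing spaces to lines that are followed by more text
    in the same paragraph, so a CommonMark renderer emits a hard break."""
    lines = text.split("\n")
    out = []
    in_fence = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Simplification: any line starting with ``` or ~~~ toggles a fence.
        if stripped.startswith("```") or stripped.startswith("~~~"):
            in_fence = not in_fence
            out.append(line)
            continue
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if in_fence or not stripped or not next_line.strip():
            # Code, blank lines, and paragraph-final lines stay unchanged.
            out.append(line)
        else:
            out.append(line.rstrip() + "  ")
    return "\n".join(out)
```

Trailing spaces are chosen over a trailing backslash because a backslash at the end of a block is rendered literally, whereas stray trailing spaces are simply ignored there.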

brankoradovanovic-mcom commented 2 months ago

This affects both input and output, and also affects code, as illustrated in #2614.

Two things bother me here:

  1. The issue can be worked around by clicking on "Disable markdown" - but there was no markdown to begin with, so this is not logical
  2. Even if this action were named differently, it would still be a corrective action - but there is nothing wrong with the output (or input) text, so there is nothing to "correct"

The models might all output markdown some day, but when I C&P some code from my code editor into GPT4All chat, that is always going to be plain text, and I see no alternative to preserving line breaks, which is incidentally what virtually all other applications that deal with copy-pasted text do.

cosmic-snow commented 2 months ago

Just to clarify:

> The models might all output markdown some day ...

A lot of them already do, or at least output something somewhat Markdown-compatible, which is why displaying output in that format was made the default.

I am not trying to argue against changing the behaviour here. I just wanted to point to the relevant part of the spec in my previous comment. It might have implications for an implementation.

brankoradovanovic-mcom commented 3 weeks ago

Interestingly, there is now a similar issue with MS Outlook. Microsoft offers some workarounds. Note how - just like with GPT4All - all these workarounds require either the sender to change the message, or the recipient to perform an extra action.

It stands to reason that the sender wanted to have a line break, otherwise they would not have added it. Being forced to fix what was correct to begin with does not help.