superseriousbusiness / gotosocial

Fast, fun, small ActivityPub server.
https://docs.gotosocial.org
GNU Affero General Public License v3.0

[feature] `text/html` post format option #2555

Open mirabilos opened 9 months ago

mirabilos commented 9 months ago

Is your feature request related to a problem?

My RSS to Fediverse converters currently have to round-trip the HTML snippets from the feeds through Markdown for posting, only for GtS to convert them to HTML for federating again.

Describe the solution you'd like.

I’d like for GtS to additionally allow HTML posting.

At least for explicit post format selection, like in python3-mastodon’s Mastodon(…).status_post("…", content_type='text/html'), but also as the default account posting type (especially for bot users, who would just post HTML; maybe not so much for human users, due to the lack of the usual newline = line/paragraph break convention).

Ideally, it would accept all three flavours of <br /> (XHTML), <br/> (XHTML in XML mode) and <br> (HTML 4/5), but your HTML sanitiser for incoming statūs probably does that already. CDATA support would also be nice. The HTML inside RSS feeds is usually awful, so running GtS’ sanitiser on it before federating it out is also a must.
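
A minimal sketch of what "accepting all three flavours" means for a parser, using Python's standard `html.parser` purely as an illustration (it is not the parser or sanitiser GtS uses; `BreakCounter` and `count_breaks` are hypothetical names):

```python
from html.parser import HTMLParser


class BreakCounter(HTMLParser):
    """Counts <br> elements regardless of how they are written."""

    def __init__(self):
        super().__init__()
        self.breaks = 0

    def handle_starttag(self, tag, attrs):
        # Self-closing forms such as <br/> and <br /> end up here too,
        # because the default handle_startendtag() forwards to this method.
        if tag == "br":
            self.breaks += 1


def count_breaks(fragment: str) -> int:
    """Return how many line breaks the fragment contains."""
    parser = BreakCounter()
    parser.feed(fragment)
    parser.close()
    return parser.breaks
```

A lenient HTML parser treats the three spellings as the same element, so a sanitiser built on one would not need special cases for them.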

This would also be nice to use with clients that can change the posting type per post (the already-mentioned python3-mastodon can do plaintext, markdown or html; some clients meant for humans can probably do that as well?).
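
For illustration, a per-post format selection could look like the request built below. This is a sketch under assumptions: it uses the Mastodon-style `/api/v1/statuses` endpoint with a `content_type` form field, and `text/html` is the value being requested in this issue, not one GtS is known to accept. The instance URL, token and `build_status_request` helper are hypothetical, and the request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_status_request(instance: str, token: str, html: str) -> Request:
    """Build (but do not send) a status-creation request with an explicit content type."""
    body = urlencode({
        "status": html,
        "content_type": "text/html",  # assumption: the format this issue asks for
    }).encode("utf-8")
    return Request(
        f"{instance}/api/v1/statuses",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

A bot would pass the feed's HTML snippet unchanged as `html` and hand the returned object to `urllib.request.urlopen`.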

Describe alternatives you've considered.

Stick to round-tripping through Markdown.

Additional context.

No response

tsmethurst commented 9 months ago

You can already include many html elements in markdown posts :)

For example, with your post type set to markdown, you can post the following:

Here's some HTML inside a markdown post:

<br />

^^ that was a break

Here's some <strong>strong</strong> text.

Here's an ordered list:

<ol>
  <li>item 1</li>
  <li>item 2</li>
</ol>

and it will parse as expected

mirabilos commented 9 months ago

Yes, but it’ll still be different, as Markdown interprets things differently, plus we have that single/double newline = line/paragraph break thing.

I’d like to have a pure HTML posting mode, like AFAIHH Pleroma/Akkoma have.

tsmethurst commented 9 months ago

I’d like to have a pure HTML posting mode

Have you tried posting a pure HTML document in markdown mode? Could you give an example of what you would like to post?

For example, if I post the following into the post compose box on pinafore, when my post type is set to markdown:

<p>If you wanna be less fancy you can also just put the following into your post compose box:</p>
<div>
    <pre class="language-markdown"><code class="language-markdown">&gt; \*deep drag from my comically long cigarette in its diamante holster\* actually sweatry it's not a GoToSocial bug
&gt;
&gt; -- tobi, [a post thereby](https://goblin.technology/@tobi/statuses/01HD47AEDY7QSHN5SDR4PJVYAV)
</code></pre>
<p>This comes out looking like this:</p>
<blockquote>
    <p>*deep drag from my comically long cigarette in its diamante holster* actually sweatry it's not a GoToSocial bug</p>
    <p>-- tobi, <a href="https://goblin.technology/@tobi/statuses/01HD47AEDY7QSHN5SDR4PJVYAV">a post thereby</a></p>
</blockquote>

It renders as:

<p>If you wanna be less fancy you can also just put the following into your post compose box:</p>
<div>
    <pre><code class="language-markdown">&gt; \*deep drag from my comically long cigarette in its diamante holster\* actually sweatry it&#39;s not a GoToSocial bug
&gt;
&gt; -- tobi, [a post thereby](https://goblin.technology/@tobi/statuses/01HD47AEDY7QSHN5SDR4PJVYAV)
</code></pre>
    <p>This comes out looking like this:</p>
    <blockquote>
        <p>*deep drag from my comically long cigarette in its diamante holster* actually sweatry it's not a GoToSocial bug</p>
        <p>-- tobi, <a href="https://goblin.technology/@tobi/statuses/01HD47AEDY7QSHN5SDR4PJVYAV" rel="nofollow noreferrer noopener" target="_blank">a post thereby</a></p>
    </blockquote>
</div>

So a few small changes as a result of formatting, but certainly if you want to just post HTML with divs and p tags and everything, then you can just go for it.

mirabilos commented 9 months ago

But having *foo* in HTML renders it not as *foo* but as an emphasised “foo” in Markdown.

And the use case is to pump generic, horrid HTML into it, so…

tsmethurst commented 9 months ago

I entered in the compose box:

<p>*foo*</p>

It rendered to:

<p>*foo*</p>

mirabilos commented 9 months ago

Yes, but if you just enter *foo* it won’t. As I said, the HTML is horrible, often lacking any outer <p>s (which you also don’t need for short RSS posts, as each post is expected to be inside a div or something on the viewer side).

tsmethurst commented 9 months ago

each post is expected to be inside a div or something on the viewer side

Well, you could wrap it in a div when posting it then, perhaps. For example if you write:

<div><!-- Manually added div tag; everything inside here is just copy-pasted from some RSS post -->
Here's some *really shitty* html.
</div><!-- Manually added div tag close -->

Then it renders to:

<div>Here's some *really shitty* html</div>

Aside from that, even if we did have a text/html mode, it would just be a subset of the markdown functionality anyway since we'd be using the same parser. And in both cases, if you put garbage into it, you will get garbage out.

I'm not trying to be contrary or anything by the way, I'm just pointing out that you can already do this with no code changes required on our side.
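
The wrapping workaround described in this comment could be scripted on the bot side roughly as follows; `wrap_in_div` is a hypothetical helper, not part of GtS or any client library:

```python
def wrap_in_div(fragment: str) -> str:
    """Wrap an HTML fragment from a feed in a single outer <div>."""
    return f"<div>\n{fragment.strip()}\n</div>"
```

Whether Markdown syntax inside the wrapper is then left alone depends on the server's Markdown parser; the field test further down in this thread found that it was still interpreted.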

mirabilos commented 9 months ago

Aside from that, even if we did have a text/html mode, it would just be a subset of the markdown functionality anyway since we'd be using the same parser. And in both cases, if you put garbage into it, you will get garbage out.

Huh? Why not just throw the HTML through the HTML sanitiser you use for incoming posts as well?

tsmethurst commented 9 months ago

We do... My point is that badly-formatted / barely-html-at-all stuff is going to look like crap whether you submit it as HTML or Markdown. Sanitizing can make unsafe HTML into safe HTML, but it can't make crap HTML into well-written HTML, is what I'm saying.
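
A toy allowlist sanitiser makes the distinction concrete. This is a sketch only: it uses Python's standard `html.parser`, the allowlist is invented for the example, and it is not the sanitiser GtS actually runs.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "br", "em", "strong", "a", "blockquote", "div"}  # illustrative
ALLOWED_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}  # never emit a closing tag for these


class ToySanitiser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if tag not in ALLOWED_TAGS or self.skip_depth:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick= and friends
            if name == "href" and not (value or "").startswith(("http://", "https://")):
                continue  # drops javascript: and other schemes
            kept.append(f' {name}="{escape(value or "")}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitise(fragment: str) -> str:
    parser = ToySanitiser()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

Feeding it `<p onclick="x()">hi<script>alert(1)</script> *there*` removes the script and the event handler, but the paragraph stays unclosed: the result is safe, not tidy.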

mirabilos commented 9 months ago

Yes, GIGO, sure.

I just did a field test of posting with <div></div> around, and (as I suspected) both the linebreaks and the HTML were interpreted.

Input:

<div>
This **is** a *test* for `Markdown` or    HTML
functionality

in this thing.

foo bar baz


and now with two spaces  
at the end of the line

this → @tobi@goblin.technology ← should <em>NOT</em> be a mention

----

- lets
- try
- dashes

1. and
2. numbers

### and headlines

<h4>HTML’s should work though</h4>

[this](https://github.com/superseriousbusiness/gotosocial/) should not be a link, <a href="https://github.com/superseriousbusiness/gotosocial/">this</a> should be one

~~~
md      has tildes
~~~

    and    spaces

and back\
slashes and \\ double so

the end
</div>

Rendering by webbrowser (which is as expected):

[image: r-web]

Rendering by GtS in Semaphore, which is very much not what I expected:

[image: r-gts]

mirabilos commented 9 months ago

(I’m willing to compromise on the mention. I don’t know how Pleroma/Akkoma handle these, but I’d err towards just not supporting mentions for that kind of post, at least not inline, or only with special markup like <a href="@user@host">@user@host</a> or something.)

The idea here is not to make crap HTML better, but to be able to post HTML without anything hidden in the HTML getting interpreted as Markdown because I don’t control the HTML. (Or I do, in the case of, say, Goodreads reviews, but I still don’t want anything in these that can (also) be interpreted as Markdown to actually be interpreted as Markdown.)

tsmethurst commented 9 months ago

Alright I take your point. In cases where the HTML is that crappy, it is indeed undesirable that the markdown special characters get interpreted. I'll bear this in mind as a possible feature.

mirabilos commented 9 months ago

Thanks, that’s all I ask. It’s a feature request, I can wait ☻