michael-lazar / rtv

Browse Reddit from your terminal
MIT License

Can't enter markdown headings in new posts / comments #557

Closed rlue closed 6 years ago

rlue commented 6 years ago

Markdown headings are typically expressed in one of two ways:

setext-style h1
===============

setext-style h2
---------------

# atx-style h1

## atx-style h2

### atx-style h3

#### atx-style h4

but as noted in the new post / comment helptext,

Lines starting with '#' will be ignored

Thus, atx-style headings are stripped from the text before posts / comments are submitted, which leaves no way to enter headings below h2 (setext style only covers h1 and h2).

There is no way to tell the difference between an atx-style h1 and a comment line, but for the rest, I think a good solution would be to allow lines which begin with multiple hashmarks, followed by any non-whitespace characters. (h1s can always be expressed in setext style anyways.)

Here is a regex for matching lines to be ignored (specifically, those which begin with only a single hashmark, or consist of only hashmarks and optional trailing whitespace):

/^#([^#]*|#+\s*)$/
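A minimal Python sketch of how the pattern above could be applied line by line. The helper name `strip_comment_lines` is hypothetical and is not taken from rtv's code; it only illustrates the proposed behaviour.

```python
import re

# The pattern proposed above, written as a Python regex.
IGNORED_LINE = re.compile(r'^#([^#]*|#+\s*)$')


def strip_comment_lines(text):
    """Drop lines the pattern marks as ignorable; keep everything else."""
    kept = [line for line in text.splitlines() if not IGNORED_LINE.match(line)]
    return '\n'.join(kept)


draft = '\n'.join([
    '# Lines starting with a single hashmark are dropped',
    '## atx-style h2',
    'Body text',
    '###',
])
print(strip_comment_lines(draft))
```

Here the h2 line and the body text survive, while the single-hashmark line and the bare `###` line are dropped. One thing to note about the pattern as written: a single-hashmark line that contains another `#` later on (for example `# see #557`) does not match, so it would be kept rather than ignored.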

michael-lazar commented 6 years ago

Thanks for pointing out this edge case :) I think your proposal is a reasonable trade-off for solving the problem.

dredmorbius commented 6 years ago

Markdown has comment delimiters: <!-- [text] -->

Rather than stomp on valid syntax, insert a true comment block, with an indicator (string constant, UUID, timestamp, etc) that it's intentionally inserted and should be removed.

(Yes, I've just realised I've been badly affected by this.)

rlue commented 6 years ago

Can't find it in the official spec, but I thought I'd read somewhere that Markdown comments lead with a triple-dash, not double (i.e., <!-- appears in the resulting HTML, but <!--- does not). According to a comment on this SO answer, though, that's a pandoc thing, not a markdown thing. I'm too lazy to confirm right now.

Not sure which markdown engine Reddit uses under the hood, but it may have additional syntax extensions for comments, as well. (It's definitely not Kramdown or GH-flavored markdown; might be redcarpet?)

dredmorbius commented 6 years ago

@rlue Reddit cites Gruber's Markdown syntax, and that falls back to raw HTML, which uses <!-- comment -->. In general, Reddit ignores HTML. Regardless, a standard, or specifically keyed, comment block at a specific location should work.

But not "#".

dredmorbius commented 6 years ago

And as a matter of fact, Reddit flunks this test. Unescaped comments shouldn't render.

https://old.reddit.com/r/test/comments/8uglz3/comment_no_comment/

dredmorbius commented 6 years ago

I've reported the corresponding Reddit bug:

https://old.reddit.com/r/bugs/comments/8ugsq8/markdown_html_comment_delimiters_should_not/

rlue commented 6 years ago

Uh oh; looks like your bug report wasn't met with a very warm reception.

Another approach that wouldn't require waiting on a fix in reddit's markdown rendering is the accepted answer from the SO thread I linked above:

If you want a comment that is strictly for yourself (readers of the converted document should not be able to see it, even with "view source") you could (ab)use the link labels (for use with reference style links) that are available in the core Markdown specification:

...

[comment]: <> (This is a comment, it will not be included)
[comment]: <> (in  the output file unless you use it in)
[comment]: <> (a reference style link.)

Or you could go further:

[//]: <> (This is also a comment.)

To improve platform compatibility (and to save one keystroke) it is also possible to use # (which is a legitimate hyperlink target) instead of <>:

[//]: # (This may be the most platform independent comment)

@michael-lazar, what do you think?

michael-lazar commented 6 years ago

@rlue it doesn't appear to be working

[screenshot, 2018-07-02]

rlue commented 6 years ago

Whoops, such comment blocks need to be bounded by blank lines. :\

(For maximum compatibility, that means leading AND trailing blank lines.)

Back to the drawing board?

BlitzKraft commented 6 years ago

Maybe just use a backslash as an escape character, and send the following character as a literal instead of interpreting it?
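A rough Python sketch of what that could look like, assuming the escape only applies to a leading `\#`. The function name `process_lines` is hypothetical, not rtv's actual code.

```python
def process_lines(text):
    """Drop '#' comment lines; unescape lines that start with '\\#'."""
    result = []
    for line in text.splitlines():
        if line.startswith('#'):
            continue  # instruction/comment line, ignored
        if line.startswith('\\#'):
            line = line[1:]  # strip the backslash, keep the literal '#'
        result.append(line)
    return '\n'.join(result)
```

With this, typing `\## Heading` would submit `## Heading`. A possible catch: Markdown itself uses `\#` to mean a literal hash character, so a user who wants a line to start with a visible `#` rather than a heading would have no way to say so under this scheme.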

dredmorbius commented 6 years ago

@rlue It'd be nice for Reddit to remove the ambiguity, though it's not essential. I'm happy to consider Reddit in error if rtv implements HTML-style comments.

@michael-lazar More generally, and correct me if I'm wrong, what's necessary here is for rtv to be able to unambiguously (or statistically-reasonably-ambiguously) create a comment block which is used for tracking post state, and remove that content before submitting the post or comment to Reddit.

You're presently tracking state within the content body itself. One option would be to preserve state elsewhere within the app. If that's possible, it bypasses the problem entirely.

Having the metadata in the body has actually proved useful to me as I've been archiving my Reddit content from /r/dredmorbius, with updates, various corrections, etc. I've developed a workflow for this, mostly involving opening in a pager (less), and piping content via a shell script to an archive file with a standard naming convention. Further filtering with fdupes and some plagiarism-detection / file-similarity tools to weed out the duplicate / multiple-version instances. I'm over half-way through that process.

And, with metadata-in-file, I actually know the corresponding posts, which is handy.

The problem is that for about a quarter of the archive I'd been writing the "corrected" files back to Reddit ... stripped of header divisions. I make heavy use of those....

Suggestions

  1. Move metadata to the app and exclude it from the file entirely. Possible, though not IMO ideal.

  2. Implement a <!-- comment --> style tag, and rely on that for being stripped out afterward. This risks removing any patterns matching <!-- .* --> within extant Reddit posts, though that should be a relatively low risk. At scale it's still likely unacceptable.

  3. Escape whatever comment-delimiting tokens you select on reading, remove on writing, from the post.

  4. Use HTML-style comments plus a long, randomly-generated identification string which is retained in-app state. Strip the HTML-style comment whose body contains that string, on re-ingestion and writing to Reddit.

Option 4 is probably the most robust, and would be what I suggest. 2 & 3 would likely work.
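To make option 4 concrete, here is a minimal Python sketch, assuming the token is generated with `uuid` and held by the app while the editor is open. Both function names are hypothetical and not part of rtv.

```python
import re
import uuid


def make_instruction_block(instructions, token=None):
    """Wrap instructions in an HTML comment tagged with a random token."""
    token = token or uuid.uuid4().hex
    return token, '<!--{}\n{}\n-->\n'.format(token, instructions)


def strip_instruction_block(text, token):
    """Remove only the comment that opens with our token."""
    pattern = re.compile(r'<!--' + re.escape(token) + r'.*?-->\n?', re.DOTALL)
    return pattern.sub('', text, count=1)


token, block = make_instruction_block('Type your post below.')
edited = block + '## Heading\n<!-- user comment -->\nBody'
print(strip_instruction_block(edited, token))
```

Because the pattern is keyed on the token, any other `<!-- ... -->` comment already present in the post body is left alone, which is the collision risk that option 2 carries.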

Thoughts?

michael-lazar commented 6 years ago

The comments aren't being used to track any state about the post, only to provide instructions to the user. I took the idea from git: for example, when you make a new commit, git opens a temporary file in your editor with the following text:

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch master
# Your branch is up to date with 'origin/master'.
#
# Changes to be committed:
#       new file:   tmp.txt

To be clear, this is what you're suggesting w/ number 4?

<!--f8das67dSAH&8h
Please type your submission below this comment block.

The first line will be interpreted as the title.
The following lines will be interpreted as the content.

Posting to /r/WritingPrompts
-->

Overall I think this works out pretty well. It's still not as straightforward as using #, but it's probably worth the tradeoff for fully supporting markdown. There's a small chance that if the user isn't familiar with HTML they might screw it up by typing their post inside of the closing tag, or inadvertently deleting the closing tag. Since the random string is only needed to prevent collisions, it doesn't need to be unique and could be hardcoded in the RTV configuration or even set to something more legible like this:

<!--RTV+INSTRUCTIONS
Please type your submission below this comment block.

The first line will be interpreted as the title.
The following lines will be interpreted as the content.

Posting to /r/WritingPrompts
-->
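
A minimal Python sketch of how the file could be read back with a fixed marker like the one above, assuming the first remaining line is the title and the rest is the content. `parse_submission` is a hypothetical name, not rtv's actual function.

```python
MARKER = '<!--RTV+INSTRUCTIONS'


def parse_submission(text):
    """Return (title, content) from the edited file, or raise ValueError."""
    start = text.find(MARKER)
    if start != -1:
        end = text.find('-->', start)
        if end == -1:
            # The closing tag was deleted; refuse rather than guess.
            raise ValueError('instruction block is missing its closing -->')
        text = text[:start] + text[end + len('-->'):]
    title, _, content = text.strip().partition('\n')
    return title.strip(), content.strip()
```

This catches the case where the closing tag was deleted. It does not catch a post typed inside the comment block, which would be removed along with the instructions.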

dredmorbius commented 5 years ago

Just noting that the resolution is satisfactory. Thank you.