Closed rlue closed 6 years ago
Thanks for pointing out this edge case :) I think your proposal is a reasonable trade-off for solving the problem.
Markdown has comment delimiters: <!-- [text] -->
Rather than stomp on valid syntax, insert a true comment block, with an indicator (string constant, UUID, timestamp, etc) that it's intentionally inserted and should be removed.
(Yes, I've just realised I've been badly affected by this.)
Can't find it in the official spec, but I thought I'd read somewhere that Markdown comments lead with a triple-dash, not double (i.e., <!-- appears in the resulting HTML, but <!--- does not). According to a comment on this SO answer, though, that's a pandoc thing, not a markdown thing. I'm too lazy to confirm right now.
Not sure which markdown engine Reddit uses under the hood, but it may have additional syntax extensions for comments, as well. (It's definitely not Kramdown or GH-flavored markdown; might be redcarpet?)
@rlue Reddit cites Gruber's Markdown syntax, and that falls back to raw HTML, which uses <!-- comment -->. In general, Reddit ignores HTML. Regardless, a standard, or specifically-keyed, comment block at a specific location should work.
But not "#".
And as a matter of fact, Reddit flunks this test. Unescaped comments shouldn't render.
https://old.reddit.com/r/test/comments/8uglz3/comment_no_comment/
I've reported the corresponding Reddit bug:
https://old.reddit.com/r/bugs/comments/8ugsq8/markdown_html_comment_delimiters_should_not/
Uh oh; looks like your bug report wasn't met with a very warm reception.
Another approach that wouldn't require waiting on a fix in reddit's markdown rendering is the accepted answer from the SO thread I linked above:
If you want a comment that is strictly for yourself (readers of the converted document should not be able to see it, even with "view source") you could (ab)use the link labels (for use with reference style links) that are available in the core Markdown specification:
...
[comment]: <> (This is a comment, it will not be included)
[comment]: <> (in the output file unless you use it in)
[comment]: <> (a reference style link.)
Or you could go further:
[//]: <> (This is also a comment.)
To improve platform compatibility (and to save one keystroke) it is also possible to use # (which is a legitimate hyperlink target) instead of <>:
[//]: # (This may be the most platform independent comment)
@michael-lazar, what do you think?
@rlue it doesn't appear to be working
Whoops, such comment blocks need to be bounded by blank lines. :\
(For maximum compatibility, that means leading AND trailing blank lines.)
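As a rough illustration of the blank-line requirement, here is a minimal Python sketch that renders instruction lines as link-label comments bounded by blank lines on both sides. The helper name `as_link_label_comments` is hypothetical and not part of rtv, and it assumes the instruction text contains no parentheses, which could confuse some Markdown parsers.

```python
def as_link_label_comments(lines):
    """Render each line as a `[//]: # (...)` link-label comment.

    The block is padded with a leading and a trailing blank line,
    since these comments are only recognised when bounded by blank lines.
    """
    body = [f"[//]: # ({line})" for line in lines]
    return "\n".join(["", *body, ""]) + "\n"


if __name__ == "__main__":
    print(as_link_label_comments(["First instruction line", "Second instruction line"]))
```

Whether a given site hides these lines still depends on its Markdown engine, so this only addresses the blank-line issue, not rendering differences.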
Back to the drawing board?
Maybe just use a backslash as an escape character, and send the following character as a literal instead of interpreting it?
@rlue It'd be nice for Reddit to remove the ambiguity, though it's not essential. I'm happy to consider Reddit in error if rtv implements HTML-style comments.
@michael-lazar More generally, and correct me if I'm wrong, what's necessary here is for rtv to be able to unambiguously (or statistically-reasonably-ambiguously) create a comment block which is used for tracking post state, and remove that content before submitting the post or comment to Reddit.
You're presently tracking state within the content body itself. One option would be to preserve state elsewhere within the app. If that's possible, it bypasses the problem entirely.
Having the metadata in the body has actually proved useful to me as I've been archiving my Reddit content from /r/dredmorbius, with updates, various corrections, etc. I've developed a workflow for this, mostly involving opening in a pager (less), and piping content via a shell script to an archive file with a standard naming convention. Further filtering with fdupes and some plagiarism-detection / file-similarity tools to weed out the duplicate / multiple-version instances. I'm over half-way through that process.
And, with metadata-in-file, I actually know the corresponding posts, which is handy.
The problem is that for about a quarter of the archive I'd been writing the "corrected" files back to Reddit ... stripped of header divisions. I make heavy use of those....
1. Move metadata to the app and exclude it from the file entirely. Possible, though not IMO ideal.
2. Implement a <!-- comment --> style tag, and rely on that for being stripped out afterward. This risks removing any patterns matching <!-- .* --> within extant Reddit posts, though that should be a relatively low risk. At scale it's still likely unacceptable.
3. Escape whatever comment-delimiting tokens you select on reading, remove on writing, from the post.
4. Use HTML-style comments plus a long, randomly-generated identification string which is retained in-app state. Strip the HTML-style comment whose body contains that string, on re-ingestion and writing to Reddit.
Option 4 is probably the most robust, and would be what I suggest. 2 & 3 would likely work.
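For concreteness, the keyed-comment idea could be sketched in Python roughly as follows. All function names here are hypothetical and not taken from rtv, and the sketch assumes the identification string sits immediately after the opening `<!--` delimiter.

```python
import re
import secrets


def make_marker():
    """Return a long random identifier, to be kept in app state (hypothetical helper)."""
    return secrets.token_hex(16)


def build_instructions(marker, text):
    """Wrap instruction text in an HTML-style comment tagged with the marker."""
    return f"<!--{marker}\n{text}\n-->\n"


def strip_instructions(marker, document):
    """Remove only the comment block that opens with the marker.

    Other HTML-style comments in the document are left untouched.
    """
    pattern = re.compile(r"<!--" + re.escape(marker) + r".*?-->\n?", re.DOTALL)
    return pattern.sub("", document, count=1)
```

Because the pattern is keyed on the marker, a `<!-- ... -->` comment that was already in the post survives the strip step.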
Thoughts?
The comments aren't being used to track any state about the post, only to provide instructions to the user. I took the idea from git, for example when you make a new commit in git it opens a temporary file in your editor with the following text:
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch master
# Your branch is up to date with 'origin/master'.
#
# Changes to be committed:
# new file: tmp.txt
To be clear, this is what you're suggesting w/ number 4?
<!--f8das67dSAH&8h
Please type your submission below this comment block.
The first line will be interpreted as the title.
The following lines will be interpreted as the content.
Posting to /r/WritingPrompts
-->
Overall I think this works out pretty well. It's still not as straightforward as using #, but it's probably worth the tradeoff for fully supporting markdown. There's a small chance that if the user isn't familiar with HTML they might screw it up by typing their post inside of the closing tag, or inadvertently deleting the closing tag. Since the random string is only needed to prevent collisions, it doesn't need to be unique and could be hardcoded in the RTV configuration or even set to something more legible like this:
<!--RTV+INSTRUCTIONS
Please type your submission below this comment block.
The first line will be interpreted as the title.
The following lines will be interpreted as the content.
Posting to /r/WritingPrompts
-->
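A minimal Python sketch of how a submission file using this fixed marker might be parsed, assuming the block is stripped first and the remaining text is split into title and content as the instructions describe. The names `MARKER`, `INSTRUCTIONS_RE`, and `parse_submission` are illustrative only and are not rtv's actual code.

```python
import re

MARKER = "RTV+INSTRUCTIONS"  # fixed marker from the example above

# Match only the comment block that opens with the marker.
INSTRUCTIONS_RE = re.compile(r"<!--" + re.escape(MARKER) + r".*?-->", re.DOTALL)


def parse_submission(text):
    """Strip the instruction block, then split into (title, content)."""
    cleaned = INSTRUCTIONS_RE.sub("", text, count=1).strip()
    # First remaining line is the title; everything after it is the content.
    title, _, content = cleaned.partition("\n")
    return title.strip(), content.strip()
```

Note that if the user deletes the closing `-->`, this pattern no longer matches and the block is left in place, which is the failure mode mentioned above.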
Just noting that the resolution is satisfactory. Thank you.
Markdown headings are typically expressed in one of two ways:
but as noted in the new post / comment helptext,
Thus, markdown headings below h2 are removed from the text before posts / comments are submitted.
There is no way to tell the difference between an atx-style h1 and a comment line, but for the rest, I think a good solution would be to allow lines which begin with multiple hashmarks, followed by any non-whitespace characters. (h1s can always be expressed in setext style anyways.)
Here is a regex for matching lines to be ignored (specifically, those which begin with only a single hashmark, or consist of only hashmarks and optional trailing whitespace):
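The pattern itself is not reproduced in this text, so the following Python sketch is only one possible pattern fitting that description, not necessarily the one originally proposed. The names `IGNORED_LINE` and `strip_comment_lines` are hypothetical.

```python
import re

# Assumed pattern: a single '#' not followed by another '#',
# or a line made up solely of '#' characters plus trailing whitespace.
IGNORED_LINE = re.compile(r"#(?!#).*|#+\s*")


def strip_comment_lines(text):
    """Drop lines matching IGNORED_LINE; keep everything else."""
    return "\n".join(
        line for line in text.split("\n") if not IGNORED_LINE.fullmatch(line)
    )
```

Under this pattern a line such as `## Heading` is kept, while `# note` and a bare `###` are dropped.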