Detecting different paragraphs as the same

thoughtbot / carnival

An unobtrusive, developer-friendly way to add comments

MIT License

501 stars 30 forks source link

Detecting different paragraphs as the same #71

Closed calebhearth closed 6 years ago

calebhearth commented 9 years ago

On this post, there are two code "paragraphs" which are being detected as the same paragraph, so the comments are showing up in both places. http://robots.thoughtbot.com/external-posts-in-jekyll

pbrisbin commented 9 years ago

Unfortunately, the paragraphs are identified by content using an algorithm that makes sense for prose but probably not for code. That means this is likely to be a common issue for comment blocks attached to code snippets.

calebhearth commented 9 years ago

Seems like we may need a different or smarter algorithm? Code blocks are fairly easy to identify, so it seems possible to treat them differently.

Sent from my iPhone

On Jul 14, 2014, at 8:52 AM, pat brisbin notifications@github.com wrote:

Unfortunately, the paragraphs are identified by content using an algorithm that makes sense for prose but probably not for code. That means this is likely to be a common issue for comment blocks attached to code snippets.

— Reply to this email directly or view it on GitHub.

calebhearth commented 9 years ago

Additionally, the same paragraph would probably never appear twice in an article, and if it had I imagine that it would have a contextually different meaning and wouldn't make sense to have the same comments on.

Sent from my iPhone

On Jul 14, 2014, at 8:52 AM, pat brisbin notifications@github.com wrote:

Unfortunately, the paragraphs are identified by content using an algorithm that makes sense for prose but probably not for code. That means this is likely to be a common issue for comment blocks attached to code snippets.

— Reply to this email directly or view it on GitHub.

jferris commented 9 years ago

Idea: what if we added an index to the identifier for paragraphs with the same hash? Such as:

"Hello there" => "Ht1" "My friend" => "Mf1" "Hello there" => "Ht2" "Hi there" => "Ht3"

That will make it about as change-proof as it is now, but it will never put the same comment on two different paragraphs. It will also work well for code. Lastly, it will be easy to bring forward existing data (add a "1" to the end of every existing identifier), as opposed to a totally new hashing algorithm.

cpytel commented 9 years ago

I think adding a new paragraph is far too likely to introduce an index and not have the algorithm look for partial matches if the one we have is not found. We want to add that anyone, but it may be the impetus for the change.

That being said, I think it might be more fruitful to explore using a different algorithm for code blocks than for prose.

I’d like to see some more examples of this in the real world to get a better sense of how urgent it is.

mxie commented 9 years ago

I've encountered the same issue while reading through this post: http://robots.thoughtbot.com/middleman-bourbon-walkthrough. There are a bunch of blocks that have "5 comments", when it's really only applicable to the first occurrence (at source/stylesheets/all.css.scss header).

The only difference I see compared to Caleb's example is that it's only happening at each of the succeeding file name headers prior to each code block (instead of at the code blocks).