lionleaf / dwitter

Social network for short js demos
https://www.dwitter.net
Apache License 2.0
763 stars 69 forks source link

fix: treat hashtags with a preceding : as anchors (fixes #474) #504

Closed jellyedwards closed 3 years ago

jellyedwards commented 3 years ago

This should fix #474, it was a little tricky as hashtags#likethis and /u/even#likethis are allowed so to explain the regex I came up with:

(?:^|\s)    # go to the start of the line or the next space
(?=\S*#)    # lookahead to see is there a '#' after a bunch of non-whitespace characters?
(?!\S*:)    # lookahead to check aren't any ':' characters there (otherwise it's likely an anchor)
\S*         # skip any of the characters leading up to the '#'
# then do the hashtag grouping that was there before:
#(?P<hashtag>[_a-zA-Z][_a-zA-Z\d]*)

I've created a little test for it which passes along with the rest.

First time running a Django project & first time contributing so if I've messed up anything let me know!

atesgoral commented 3 years ago

Instead of making this URL-specific, could we instead say the # needs to be preceded by a word boundary (\b)?

jellyedwards commented 3 years ago

Instead of making this URL-specific, could we instead say the # needs to be preceded by a word boundary (\b)?

That was my first attempt but some tests failed because examplesLike#this are allowed.

atesgoral commented 3 years ago

examplesLike#this are allowed.

Maybe this fix can be done as a breaking change. There isn't a reason (other than backwards compatibility) to support patterns like that, but the URL case is very common. So maybe the hashtag rule can be enforced as a "standalone #hash"?

jellyedwards commented 3 years ago

I agree, it would be more consistent with other hashtag implementations, I just didn't want to break anything on my first contribution 😄

lionleaf commented 3 years ago

I'm happy with a breaking change on hashtag behavior to align with more common practices. I still think this is a good change to get in, it looks good to me.

Would you be OK with a rollout of this fix that only impacts new hashtags added and rendering of old hashtags? By that, I mean I'll leave the old links in the database. For example: A comment with "http://example.com/site#anchor" would link that dweet with the #anchor hashtag. If I push this to the server with regenerating all the hashtag, said dweet will still be linked to the #anchor hashtag, show up if you go to h/anchor, but the #anchor link will be ignored as intended.

Let me know whether you'd want to move forward with this change as is, or attempt to simplify it and make hashtag parsing stricter. (I could do some searches to see how many comments will be impacted by such a breaking change)

PS: Am I correct that your fix will still ignore example.com/something#anchor, i.e. without "https://"? PPS: Apologies for being slow on this one, and thank you so much for the contribution!

jellyedwards commented 3 years ago

Actually I don't think you need to make any DB changes, it should work with past and future comments.

Here's what it looks like with the original code: image

.. and here's how it looks using this branch (same comment with no changing of DB entries): image

... so it should fix the existing ones!

jellyedwards commented 3 years ago

and yes it would ignore example.com/something#anchor

lionleaf commented 3 years ago

Here's what it looks like with the original code: image

.. and here's how it looks using this branch (same comment with no changing of DB entries): image

My point was simply that if you go to dwitter.net/h/Rela[...] that dweet will still be associated with that hashtag in the database. But I don't think that matters much.

This looks good. Thank you :)