Open demure opened 7 years ago
Hi - thanks for reporting this issue.
The problem comes from trying to come up with a simple but effective regular expression to 'find' urls within other strings.
At present the code uses: \b\S+://\S+\b
- the problem here is that the closing \b has to land on a word boundary, and there is none after a final slash, so the match backs off by one character and the trailing slash is removed.
I've tried other alternatives, such as: \w+:\/\/\S+
- this solves the trailing slash problem - however it instead sees all trailing punctuation as part of the url, so "click the link (http://example.com) please." results in "http://example.com)"
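Both behaviours can be reproduced with a few lines of Python. This is only a sketch that runs the two patterns through the standard re module; the helper name find_urls and the constant names are made up for illustration and are not taken from the clipster code.

```python
import re

# The two patterns discussed above.
CURRENT_PATTERN = r'\b\S+://\S+\b'
ALTERNATIVE_PATTERN = r'\w+:\/\/\S+'


def find_urls(pattern, text):
    """Hypothetical helper: return every match of pattern in text."""
    return re.findall(pattern, text)


# The closing \b cannot match after a final '/', so the slash is dropped.
print(find_urls(CURRENT_PATTERN, 'https://www.google.com/'))
# → ['https://www.google.com']

# The alternative keeps the trailing slash ...
print(find_urls(ALTERNATIVE_PATTERN, 'https://www.google.com/'))
# → ['https://www.google.com/']

# ... but also swallows the closing parenthesis.
print(find_urls(ALTERNATIVE_PATTERN, 'click the link (http://example.com) please.'))
# → ['http://example.com)']
```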
The correct solution is probably to implement a match that matches the uri rfcs - however this requires dropping a very large regex into the code - of which there are many examples, each with varying levels of 'correctness' - see for example: https://mathiasbynens.be/demo/url-regex
In this case, I'd suggest that anyone who really wants to perfectly match uris should drop one of these patterns into their patterns file and disable url matching in the config.
I'm of course very willing to accept a better regex that maintains as much simplicity as possible and removes this bug without introducing any other ones - suggestions welcome!
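As one possible starting point, and not a tested fix for clipster, a pattern could require the match to end in a word character or a slash. The sketch below uses Python's re module; the names CANDIDATE and extract are hypothetical. It keeps a trailing slash and drops trailing punctuation, but it also cuts a closing parenthesis that genuinely belongs to a url, so it trades one edge case for another.

```python
import re

# Hypothetical candidate: the match must end in a word character or '/'.
CANDIDATE = r'\w+://\S*[\w/]'


def extract(text):
    """Return every candidate url found in text."""
    return re.findall(CANDIDATE, text)


print(extract('https://www.google.com/'))
# → ['https://www.google.com/']
print(extract('click the link (http://example.com) please.'))
# → ['http://example.com']

# Known limitation: a url that really ends in ')' loses that character.
print(extract('https://en.wikipedia.org/wiki/Foo_(bar)'))
# → ['https://en.wikipedia.org/wiki/Foo_(bar']
```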
Would it be possible to use the deduplication feature to mitigate this?
Looking at the code, I see:
# Check for growing or shrinking, but ignore duplicates
if last_item and text != last_item and (text in last_item or last_item in text):
Since the issue seems to be that first https://www.google.com enters the clipboard and then https://www.google.com/ enters the clipboard after it, could we have a check for:
if last_item+'/' == text:
or some such?
Sorry if I am misinterpreting anything here, and thanks for your time.
EDIT:
# Extracted patterns are added to the history before the selection, and the clipboard buffer is left unchanged.
This makes it sound like the https://www.google.com entry is actually extracted, so I guess the above wouldn't be right then... though maybe the uri extraction could have a check for a one-character change between the original and extracted text?
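The one-character check suggested here might look something like the following. This is a rough sketch only: both function names are made up for illustration, and nothing in it is taken from clipster's actual extraction code.

```python
def differs_by_trailing_char(original, extracted):
    """Hypothetical check: True if extracted is original minus its last character."""
    return len(original) == len(extracted) + 1 and original.startswith(extracted)


def should_add_extracted(original, extracted):
    """Hypothetical filter: skip extracted text that is effectively the selection itself."""
    if extracted == original:
        return False
    return not differs_by_trailing_char(original, extracted)


# The extracted url only lacks the trailing '/', so it would be skipped.
print(should_add_extracted('https://www.google.com/', 'https://www.google.com'))
# → False

# A url pulled out of a longer sentence is still added.
print(should_add_extracted('click the link (http://example.com) please.', 'http://example.com'))
# → True
```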
This may be fixed by #101 since the pattern matching regexes have been updated. Any feedback on whether this fixes things is welcome!
I'm not exactly sure how this happens, but I am able to recreate on my system with three different browsers, so I assume it is related to clipster. This has been happening for a while as well, I've just been too lazy to document it.
It looks like any time a top level page's URL is copied, you get two entries in clipster; the second entry (the newer of the two) differs by having a trailing '/'.
Example:
This does not appear to happen for some urls with content after the '/', depending on what I assume is a relationship to certain special characters like '.' and '-'.
Example of non double entry:
Thanks for your time. -demure