shaarli / Shaarli

The personal, minimalist, super-fast, database free, bookmarking service - community repo
https://shaarli.readthedocs.io/

Tools - Import links - Shaarli should filter out meta characters #892

Closed ghost closed 7 years ago

ghost commented 7 years ago

Hi,

While importing bookmarks, Shaarli shouldn't create tags that start with meta characters.

For instance, I have the following bookmarks in Safari:

[image: safari_bookmarks]

For the folder Github - Shaarli, Shaarli creates 3 tags:

[image: tags_shaarli]

The behavior seems to be the same regardless of the web browser used.

Here is the file used for the test (minus the .txt): safari.html.txt

virtualtam commented 7 years ago

Hi @Esak8!

This behaviour is due to how bookmarks are parsed, stored internally and displayed:

  1. netscape-bookmark-parser
    • reads the headers correctly: Github - Shaarli
    • converts the string to lowercase: github - shaarli
  2. Shaarli
    • uses this value as-is when importing bookmarks
    • considers there are 3 space-separated tags: github, -, shaarli
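The chain above can be reproduced with a minimal sketch. This is an illustration in Python, not Shaarli's or netscape-bookmark-parser's actual code (both are PHP); `folder_to_tags` is a hypothetical helper that only mimics the two steps described: lowercasing the folder name, then splitting it on whitespace.

```python
def folder_to_tags(folder_name: str) -> list[str]:
    """Hypothetical helper mimicking the behaviour described above."""
    # Step 1 (parser): the folder name is converted to lowercase.
    lowered = folder_name.lower()
    # Step 2 (import): the value is used as-is and treated as
    # space-separated tags, so a lone "-" becomes a tag of its own.
    return lowered.split()


print(folder_to_tags("Github - Shaarli"))  # ['github', '-', 'shaarli']
```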

We could add an option to the parser to allow any of the following:

  1. stripping spaces from folder names:
    • github-shaarli
    • autredivers, doc
  2. stripping punctuation from tag names:
    • github, shaarli
    • autre, divers, doc
  3. stripping spaces and punctuation, joining items with - (dash):
    • github-shaarli
    • autre-divers, doc

IMO option 2) would make a sensible default, with 1) and 3) available as alternatives.
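To make the three options concrete, here is a rough sketch in Python (again an illustration, not the parser's PHP code). The function names and the `_PUNCT` pattern are assumptions made for this example; in particular, "punctuation" is assumed to mean any character that is neither a word character nor whitespace, so underscores are kept.

```python
import re

# Assumption: punctuation = runs of characters that are neither
# word characters (letters, digits, underscore) nor whitespace.
_PUNCT = re.compile(r"[^\w\s]+")


def strip_spaces(folder_name: str) -> list[str]:
    """Option 1: drop all whitespace so the folder yields a single tag."""
    tag = "".join(folder_name.lower().split())
    return [tag] if tag else []


def strip_punctuation(folder_name: str) -> list[str]:
    """Option 2: replace punctuation with spaces, then split on whitespace."""
    return _PUNCT.sub(" ", folder_name.lower()).split()


def join_with_dash(folder_name: str) -> list[str]:
    """Option 3: strip spaces and punctuation, join the words with '-'."""
    words = strip_punctuation(folder_name)
    return ["-".join(words)] if words else []


name = "Github - Shaarli"
print(strip_spaces(name))       # ['github-shaarli']
print(strip_punctuation(name))  # ['github', 'shaarli']
print(join_with_dash(name))     # ['github-shaarli']
```

For this particular folder name, options 1) and 3) give the same tag; they would differ for names that contain spaces but no dash.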