shaarli / Shaarli

The personal, minimalist, super-fast, database free, bookmarking service - community repo
https://shaarli.readthedocs.io/

Can't import all links from my Delicious account #902

Closed jpyrat closed 6 years ago

jpyrat commented 7 years ago

I just wanted to switch from https://del.icio.us/jpyrat (RIP) to Shaarli. The import blocks on certain content, such as:

<DT><A HREF="http://wiki.kde.org/tiki-index.php?page=UserPagetoggg" ADD_DATE="1182269176" PRIVATE="0" TAGS="toggg">KDE Wiki : UserPagetoggg</A>
<DD>-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GE/CS/IT/O d-(pu) s: a+ C++$>+++$ UL++++ UA* P+ L+++ E?
W++ N++>- !o K? w---@ O? M+>++ V+$ PS+(++) PE-- Y-(+)
PGP>+ t 5? X-- R* !tv b+++ DI? !D G e++++ h---->++ r++++ y+
------END GEEK CODE BLOCK------
<DT><A HREF="http://p7app.geneve.ch/spip/article.php3?id_article=340" ADD_DATE="1182264859" PRIVATE="0" TAGS="spip,reference,formations">Une sélection de tutoriels pour maîtriser SPIP</A>
<DD>En complément au présent site, voici une liste non exhaustive d’excellents documents disponibles sur le Web pour maîtriser les volets Rédacteurs, Administrateurs et Webdéveloppeurs SPIP.

virtualtam commented 7 years ago

Hi @jpyrat !

It blocks on certain contents

Does this issue only concern bookmarks containing code blocks / special chars? Or does it prevent you from importing other bookmarks?

jpyrat commented 7 years ago

I think the problem is that < and > are not escaped. Another problem is that the Delicious export doesn't close tags:

<DT><A HREF="http://www.phpindex.com/index.php/2005/03/22/67-jdnet-ixarm-la-place-de-marche-publique-n1-en-europe" ADD_DATE="1182276040" PRIVATE="0" TAGS="SPIP,pro2spip">JDNet : Ixarm : la place de marché publique N°1 en Europe - PHP Index - La passerelle française des technologies PHP: Hypertext Preprocessor</A>
<DD>"La démarche de dématérialisation des marchés publics du ministère de la Défense (Mindef) est sans conteste l'une des plus abouties de l'administration française".
... B asés sur SPIP, APACHE, MySQL, PHP, nouvelle preuve de la reconnaissance de PH
<DT><A HREF="http://wiki.kde.org/tiki-index.php?page=UserPagetoggg" ADD_DATE="1182269176" PRIVATE="0" TAGS="toggg">KDE Wiki : UserPagetoggg</A>
<DD>-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GE/CS/IT/O d-(pu) s: a+ C++$>+++$ UL++++ UA* P+ L+++ E?
W++ N++>- !o K? w---@ O? M+>++ V+$ PS+(++) PE-- Y-(+)
PGP>+ t 5? X-- R* !tv b+++ DI? !D G e++++ h---->++ r++++ y+
------END GEEK CODE BLOCK------
<DT><A HREF="http://p7app.geneve.ch/spip/article.php3?id_article=340" ADD_DATE="1182264859" PRIVATE="0" TAGS="spip,reference,formations">Une sélection de tutoriels pour maîtriser SPIP</A>
<DD>En complément au présent site, voici une liste non exhaustive d’excellents documents disponibles sur le Web pour maîtriser les volets Rédacteurs, Administrateurs et Webdéveloppeurs SPIP.

Does this help ?

virtualtam commented 7 years ago

Yup, it's very likely to come from the < and > signs in the bookmark description being considered as closing tags by the NetscapeBookmarkParser utility.

Modifying the parser code to support such content is quite unlikely to be straightforward, hence my question:

Does this issue only concern bookmarks containing code blocks / special chars? Or does it prevent you from importing other bookmarks?

Unless you have a lot of links with similar code blocks, I'd recommend:

Anyway, I might have time next week to do some tests with https://github.com/shaarli/netscape-bookmark-parser , feel free to post other relevant exports :)
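
As a possible workaround for the problem described above, the export could be pre-processed so that bare < and > characters in descriptions are escaped before import. The following is only a minimal sketch, not part of Shaarli or netscape-bookmark-parser: the function name `escape_descriptions` is hypothetical, and it assumes that every description starts on a line beginning with <DD> and runs until the next <DT>, <DL> or </DL> line, as in the samples posted in this thread.

```python
import re

# Lines that start a new bookmark or open/close a list end the current description.
# Assumption: these markers always sit at the start of a line, as in the samples above.
STRUCTURAL = re.compile(r"^\s*(<DT>|</?DL>)", re.IGNORECASE)


def escape_angle_brackets(text: str) -> str:
    """Replace bare < and > so a parser cannot mistake them for tags."""
    return text.replace("<", "&lt;").replace(">", "&gt;")


def escape_descriptions(export_text: str) -> str:
    """Escape < and > inside <DD> descriptions, leaving all other lines untouched."""
    out = []
    in_description = False
    for line in export_text.splitlines():
        stripped = line.lstrip()
        if stripped.upper().startswith("<DD>"):
            # First line of a description: keep the <DD> marker, escape the rest.
            in_description = True
            out.append("<DD>" + escape_angle_brackets(stripped[4:]))
        elif STRUCTURAL.match(line):
            # A new entry or list boundary closes the description.
            in_description = False
            out.append(line)
        elif in_description:
            # Continuation line of a multi-line description.
            out.append(escape_angle_brackets(line))
        else:
            out.append(line)
    return "\n".join(out)
```

The output could then be written to a new file and imported in place of the original export. Descriptions that intentionally contain HTML markup would be escaped as well, so this only suits exports where descriptions are meant to be plain text.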

jpyrat commented 7 years ago

The Delicious export contains 12,000 links!

If needed, I can provide it to help with debugging Shaarli ;-)

virtualtam commented 7 years ago

O_o

12,000 sure counts as a lot of links... and will make for a nice parsing exercise ;-)

jpyrat commented 7 years ago

And here is the nice parsing exercise ;-) delicious_export_20170614 (original).zip

jpyrat commented 6 years ago

The import worked for me with 0.9.6. Just one regression compared to Delicious: tags containing spaces are imported as separate tags :( (because Shaarli doesn't handle tags with spaces)
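
One way to keep multi-word tags together might be to rewrite the TAGS attributes in the export before importing it. This is a rough sketch under assumptions, not a Shaarli feature: the helper name `join_spaced_tags` and the choice of "-" as a joiner are hypothetical, and it assumes the export stores tags comma-separated in TAGS="..." (as in the samples earlier in this thread), with spaces appearing only inside individual tags.

```python
import re

# Matches the TAGS attribute of a bookmark entry, e.g. TAGS="spip,reference,formations".
TAGS_ATTR = re.compile(r'TAGS="([^"]*)"')


def join_spaced_tags(export_text: str, joiner: str = "-") -> str:
    """Replace whitespace inside each comma-separated tag with `joiner`."""

    def fix(match: re.Match) -> str:
        tags = [tag.strip() for tag in match.group(1).split(",")]
        # Collapse any run of whitespace inside a tag into a single joiner.
        joined = [joiner.join(tag.split()) for tag in tags if tag]
        return 'TAGS="' + ",".join(joined) + '"'

    return TAGS_ATTR.sub(fix, export_text)
```

With this, a hypothetical entry tagged `TAGS="web design,php"` would become `TAGS="web-design,php"`, so the multi-word tag is imported as a single tag rather than split in two.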