shaarli / Shaarli

The personal, minimalist, super-fast, database free, bookmarking service - community repo
https://shaarli.readthedocs.io/

Tool to find existing duplicate #1303

Open jpyrat opened 5 years ago

jpyrat commented 5 years ago

Hi,

I have duplicates in my database (e.g. https://veille.pyrat.net/?searchterm=moving+to+https&searchtags=wordpress).

It would be useful to have a tool to find duplicates and delete them.

nodiscc commented 5 years ago

@jpyrat Hi, normally when posting a link, if the URL is already in the database, the Edit Shaare dialog is shown instead of the Add Shaare dialog. Nothing fundamentally prevents two shaares with the same URL from being present in the datastore, but posting links from the web interface should prevent such duplication (except in some edge cases, like two Add Shaare dialogs opened at the same time).

How did you post these links? API? Third-party tool?

Edit: If you try to submit https://movingtohttps.com/?platform=wordpress&hosting=apache&control=max again, the Edit Shaare dialog should open and no further duplicate will be created. But which one of the duplicates gets edited is inconsistent. I think the oldest modified one will open, but I'm not sure.

ArthurHoaro commented 5 years ago

@nodiscc Yes, it'll be the oldest one, as links are ordered by date, descending, and the URL array is loaded in order. However, in this example all links have the same date.
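The ordering argument can be illustrated with a minimal sketch. This is not Shaarli's actual code: the `build_url_index` helper and the sample records are hypothetical. It only shows that when a newest-first list is loaded into a URL-keyed map in order, each later (older) entry overwrites the earlier one, so a lookup by URL returns the oldest bookmark.

```python
def build_url_index(links):
    """Map each URL to the id of the last link seen with that URL."""
    index = {}
    for link in links:
        # A later entry with the same URL overwrites the earlier one.
        index[link["url"]] = link["id"]
    return index


# Hypothetical bookmarks, already sorted by date descending (newest first).
links = [
    {"id": 3, "url": "https://example.com/a", "created": "2019-03-01"},
    {"id": 2, "url": "https://example.com/b", "created": "2019-02-01"},
    {"id": 1, "url": "https://example.com/a", "created": "2019-01-01"},
]

index = build_url_index(links)
print(index["https://example.com/a"])  # 1, the oldest of the two duplicates
```

If all duplicates share the same date, as in the example above, the result depends on how ties are ordered in the list.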

This is an edge case which is not supposed to happen; also, your example is quite old. So, IMO, there shouldn't be a core feature trying to fix it. Either there is a bug somewhere that allowed this to happen (REST API maybe?) and needs to be fixed, or it's just something that you could fix manually or with a custom script.
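As a rough idea of what such a custom script could look like, here is a minimal sketch that groups bookmarks by exact URL. It assumes the bookmarks have already been loaded (for instance from an export) as dictionaries with `id` and `url` keys; that shape, the `find_exact_duplicates` name, and the sample data are assumptions for illustration, not part of Shaarli.

```python
from collections import defaultdict


def find_exact_duplicates(bookmarks):
    """Return {url: [ids]} for every URL that appears more than once."""
    ids_by_url = defaultdict(list)
    for bookmark in bookmarks:
        ids_by_url[bookmark["url"]].append(bookmark["id"])
    # Keep only the URLs shared by at least two bookmarks.
    return {url: ids for url, ids in ids_by_url.items() if len(ids) > 1}


# Hypothetical sample data standing in for an exported bookmark list.
bookmarks = [
    {"id": 10, "url": "https://example.com/article"},
    {"id": 11, "url": "https://example.com/other"},
    {"id": 12, "url": "https://example.com/article"},
]

duplicates = find_exact_duplicates(bookmarks)
for url, ids in duplicates.items():
    print(url, ids)
```

The script only reports duplicates; deciding which copy to keep (and deleting the others) is left to the user.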

kalvn commented 5 years ago

Another idea would be to have a way to find links which are very likely the same but not exactly identical. For example, a link that you previously saved in HTTP and that you save again in HTTPS. Or links whose query string parameters are close but different. But I guess this is not easy to do.
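One possible approach, sketched below under stated assumptions, is to compare bookmarks on a normalized key instead of the raw URL. The normalization rules chosen here (ignore the scheme, host case, a leading `www.`, a trailing slash, the fragment, and query parameter order) are arbitrary, and `normalize_url` and `find_near_duplicates` are hypothetical names. It covers the HTTP vs HTTPS case, but not URLs whose parameters actually differ.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit


def normalize_url(url):
    """Build a comparison key that ignores scheme, host case, a leading
    "www.", a trailing slash, the fragment, and query parameter order."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if parts.port:
        host = f"{host}:{parts.port}"
    path = parts.path.rstrip("/")
    # Sort the parameters so that ?a=1&b=2 and ?b=2&a=1 compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return f"{host}{path}?{query}" if query else f"{host}{path}"


def find_near_duplicates(urls):
    """Group URLs that share the same normalized key."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]


urls = [
    "http://www.Example.com/page/?b=2&a=1",
    "https://example.com/page?a=1&b=2#top",
    "https://example.com/other",
]
for group in find_near_duplicates(urls):
    print(group)
```

Looser rules catch more candidates but also produce more false positives, so the output is best treated as a list to review by hand.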

Beeblebrox-BSD commented 2 years ago

This is an important feature that is unfortunately missing. The alternative is to export from Shaarli, import into a browser (like Firefox), use a bookmark duplicate-checker add-on, then export from Firefox and re-import into Shaarli.

That's ridiculous. This feature is very important on first import from multiple sources (different devices, different browsers, backups, etc.) unless the devs don't think that Shaarli is for this purpose!!!

Also needed: Dead links checking tool

Nice to have: Automatically save Dead Links to Wallabag from archive.org

EDIT: I'm assuming this is about duplicate bookmark URLs in Shaarli. I'll open a separate thread if not. Also, HTTP vs HTTPS is a trivial problem, not even worth going into.

dsalo commented 6 months ago

This can also be an artifact of importing from another bookmarking tool into Shaarli. (Speaking from experience!) If it's not too much hassle to build a duplicate-detector, I'd definitely find utility in it.