janeczku / calibre-web

:books: Web app for browsing, reading and downloading eBooks stored in a Calibre database
GNU General Public License v3.0
12.53k stars 1.32k forks

To find all duplicate books to remove them / select and delete #2530

Open Ahormigo opened 2 years ago

Ahormigo commented 2 years ago

Is your feature request related to a problem? Please describe. I'd like to find and identify all duplicate books. Currently, I cannot do this in Calibre-Web.

Describe the solution you'd like A process similar to the Calibre add-on: a button that finds duplicate books, so that I can select and delete them.


lfmunoz commented 2 years ago

trying to use the plugin here

https://plugins.calibre-ebook.com/

No idea how to install it though, I guess the plugins only work on the desktop versions?

lfmunoz commented 2 years ago

Here is a simple script that reads metadata.db and checks for duplicates. The next step is to call the API and delete the book for each id.

import sqlite3
from contextlib import closing
import hashlib
import os

# ________________________________________________________________________________
# Global Constants 
# ________________________________________________________________________________
db_path = "metadata.db"
md5_path = "books_md5.txt"

# ________________________________________________________________________________
# DB helper functions
# ________________________________________________________________________________
def get_tables(cursor):
    """
('authors',)
('books',)
('sqlite_sequence',)
('books_authors_link',)
('books_languages_link',)
('books_publishers_link',)
('preferences',)
('publishers',)
('ratings',)
('series',)
('tags',)
('last_read_positions',)
('annotations',)
    """
    sql_query = """SELECT name FROM sqlite_master WHERE type='table';"""
    rows = cursor.execute(sql_query).fetchall()
    for r in rows:
        print(r)

def get_table_info(cursor, tableName):
    pragmas = cursor.execute(f"PRAGMA table_info({tableName});")
    columns = [n for _, n, *_ in pragmas.fetchall()]
    print(columns)

def get_books(cursor):
    """
    ['id', 'title', 'sort', 'timestamp', 'pubdate', 'series_index', 'author_sort', 'isbn', 'lccn', 'path', 'flags', 'uuid', 'has_cover', 'last_modified']
    """
    sql_query = """SELECT id, path FROM books;"""
    rows = cursor.execute(sql_query).fetchall()
    return rows

# ________________________________________________________________________________
# MD5 Helpers
# ________________________________________________________________________________
def md5_of_string(name):
    # Hash the argument itself; the original read the global `filename` by accident.
    md5 = hashlib.md5(name.encode('utf-8')).hexdigest()  # nosec
    return md5

def md5_of_file(filename):
    with open(filename, 'rb') as file_to_check:
        # read contents of the file
        data = file_to_check.read()    
        # pipe contents of the file through
        md5 = hashlib.md5(data).hexdigest()
    return md5

def read_md5_list():
    with open(md5_path, 'r') as f:
        lines = [line.strip() for line in f.readlines()]
    return lines

def write_md5_list(lines):
    with open(md5_path, 'w') as f:
        f.write('\n'.join(lines))

# ________________________________________________________________________________
# Main
# ________________________________________________________________________________
md5_current = []
id_delete_list = []

with closing(sqlite3.connect(db_path)) as connection:
    with closing(connection.cursor()) as cursor:
        books = get_books(cursor)

# Hash each book's path string; a hash seen before marks that row as a duplicate.
for book_id, filename in books:
    md5 = md5_of_string(filename)
    if md5 in md5_current:
        print(f"[md5 found] - id={book_id} md5={md5}")
        id_delete_list.append(book_id)
    else:
        print(f"[md5 add] - id={book_id} md5={md5}")
        md5_current.append(md5)

print()
write_md5_list(md5_current)
print(f" {len(id_delete_list)} duplicates found:")
for book_id in id_delete_list:
    print(book_id)
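One caveat about the script above: it hashes the path string from the books table, not the file contents, and md5_of_file is never called. As far as I know, Calibre normally stores each book under a folder like "Author/Title (id)", so the path hashes will rarely collide. A rough sketch of a content-based check could look like the following. The extension list, the function name find_identical_files and the library_root argument are my own assumptions, not anything calibre-web provides.

```python
import hashlib
import os
from collections import defaultdict

# Assumption: these are the formats worth comparing; adjust to your library.
BOOK_EXTENSIONS = {".epub", ".mobi", ".azw3", ".pdf"}


def md5_of_file(filename, chunk_size=1 << 20):
    """Hash a file in chunks so large books need not fit in memory."""
    digest = hashlib.md5()  # nosec - used for deduplication, not security
    with open(filename, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_files(library_root):
    """Return {md5: [paths]} for every digest shared by more than one book file."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(library_root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in BOOK_EXTENSIONS:
                full_path = os.path.join(dirpath, name)
                by_digest[md5_of_file(full_path)].append(full_path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```

This only catches byte-identical files. Two copies of the same book with different embedded metadata or formats will hash differently, so it complements a title/author comparison rather than replacing it.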

Ahormigo commented 2 years ago

Hi lfmunoz, do you speak Spanish? That would be easier for me. Thanks for the information, but I am looking for something integrated directly into calibre-web [janeczku], in the left menu (see the attached picture).

[Image: Captura de Pantalla 2022-09-13 a las 12 23 27]

DarrenPIngram commented 1 year ago

If a "user friendly" solution could be made, rather than hoping to edit the database, that would be excellent. Then you can compare versions of duplicates and decide which is to be removed. How to program that, however, ...

OzzieIsaacs commented 1 year ago

You can use the book table view to search for books with the same name, for example, and then merge or delete them.

DarrenPIngram commented 1 year ago

Unless I am missing the obvious, doesn't that mean I need to already know that "Dog Stories" is duplicated and then spot it manually in the list?

Visually scanning the list is not great for a few reasons (number of books, visually handicapped).

I was sort of hoping for a "find duplicates" function and then I could manually process, a bit like within the desktop version of Calibre.

I cannot pretend to state how it is achieved programmatically by a volunteer though.
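For what it's worth, a "find duplicates" pass could be sketched outside calibre-web as a query on metadata.db that groups books by title and author. This is only an illustration, not how calibre-web or the Calibre desktop plugin works. It assumes the books table has the id, title and author_sort columns shown in the script earlier in this thread, and the function name find_duplicate_candidates is made up.

```python
import sqlite3


def find_duplicate_candidates(connection):
    """Group books sharing a title and author_sort, ignoring case and outer spaces.

    Returns a list of (title, author_sort, [book ids]) tuples, one per group
    that contains more than one book.
    """
    # Note: SQLite's built-in lower() only folds ASCII letters.
    sql = """
        SELECT lower(trim(title)) AS norm_title,
               lower(trim(coalesce(author_sort, ''))) AS norm_author,
               group_concat(id) AS ids
        FROM books
        GROUP BY lower(trim(title)), lower(trim(coalesce(author_sort, '')))
        HAVING count(*) > 1
        ORDER BY norm_title, norm_author;
    """
    rows = connection.execute(sql).fetchall()
    return [
        (title, author, sorted(int(i) for i in ids.split(",")))
        for title, author, ids in rows
    ]


# Possible usage, opening the database read-only so nothing can be modified:
#   conn = sqlite3.connect("file:metadata.db?mode=ro", uri=True)
#   for title, author, ids in find_duplicate_candidates(conn):
#       print(title, author, ids)
```

The output is a list of candidates to review by hand; deciding which copy to keep, and actually deleting it, would still happen in the book table view.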


Ahormigo commented 1 year ago

Exactly! I agree with you. It's strange that it doesn't have this.
