mkucej / i-librarian-free

I, Librarian - open-source version of a PDF managing SaaS.
https://i-librarian.net
GNU General Public License v3.0
237 stars 28 forks source link

Error when importing bibtex file with authors with same initials #87

Closed cristobaltapia closed 1 year ago

cristobaltapia commented 1 year ago

Hi, this is a rather edge case, but I thought that it will be good to write it here. I was importing a large bibtex file with the import wizard and was getting the following error:

500 Internal Server Error
Database error: SQLSTATE[23000]: Integrity constraint violation: 19 UNIQUE constraint failed: items_authors.item_id, 
items_authors.author_id #0 /app/classes/storage/database.php(237): PDOStatement->execute() 
#1 /app/app/models/item.php(526): Librarian\Storage\Database->run('INSERT INTO ite...', Array) 
#2 /app/app/models/item.php(1461): LibrarianApp\ItemModel->_save(Array) 
#3 [internal function]: LibrarianApp\ItemModel->_importText(Array)
#4 /app/classes/mvc/model.php(126): call_user_func_array(Array, Array) 
#5 /app/app/controllers/import.php(365): Librarian\Mvc\Model->__call('importText', Array) 
#6 /app/classes/router.php(114): LibrarianApp\ImportController->textAction() 
#7 /app/classes/application.php(103): Librarian\Router->dispatch() 
#8 /app/public/index.php(156): Librarian\Application->handle() 
#9 {main}.

After some debugging I realized that the error was due to one entry containing two authors with the same initials and the same last name. Something like this:

@article{test,
 author = {A. Test and A. Test and B. Other},
 title = {test title},
 year = {2023}
}

Now, obviously such cases should be handled by the user, who should provide the full names (not only initials) in order to disambiguate. However, in the case of importing a large collection it would be useful to handle this in some way, so that no error is raised and the import process can be completed. The user could disambiguate this entry later.

What do you think?

mkucej commented 1 year ago

We have to do something about this, 300 million people in China have just a few last names. This case might be more common than we thought.

mkucej commented 1 year ago

Change:

/* Items authors many-to-many table. */
CREATE TABLE IF NOT EXISTS items_authors (
    item_id INTEGER NOT NULL,
    author_id INTEGER NOT NULL,
    position INTEGER NOT NULL, -- a position of the author in an item's list of authors
    PRIMARY KEY (item_id, author_id),
    FOREIGN KEY (item_id) REFERENCES items(id) ON DELETE CASCADE,
    FOREIGN KEY (author_id) REFERENCES authors(id) ON DELETE RESTRICT
);

To:

/* Items authors many-to-many table. */
CREATE TABLE IF NOT EXISTS items_authors (
    item_id INTEGER NOT NULL,
    author_id INTEGER NOT NULL,
    position INTEGER NOT NULL, -- a position of the author in an item's list of authors
    PRIMARY KEY (item_id, author_id, position),
    FOREIGN KEY (item_id) REFERENCES items(id) ON DELETE CASCADE,
    FOREIGN KEY (author_id) REFERENCES authors(id) ON DELETE RESTRICT
);
cristobaltapia commented 1 year ago

Hi @mkucej, I think that this is a good solution. It should at least prevent the error while importing and the author name can be disambiguated later (if possible/required). And I also learned that you can use lists as primary keys in SQL... pretty cool :P