gutenbergtools / autocat3

CherryPy App that serves dynamic content for Project Gutenberg
GNU General Public License v3.0

Ordering of creators #124

Closed: gbnewby closed this issue 5 months ago

gbnewby commented 5 months ago

We have heard from submitters, and confirmed, that the order of authors and other creators is not preserved in the display of books at www.gutenberg.org.

It seems that the catalog database does not record creator order for any of the various roles. Here are the roles:

gutenberg=> SELECT role, COUNT(DISTINCT author) AS unique_authors FROM v_books WHERE fk_books BETWEEN 60000 AND 69999 GROUP BY role;

I confirmed that the JSON file transmitted by dopush lists creators in the same order as submitted; that is, the submission database for clearances retains order.

It seems the need is to update the catalog database to track the order of creators. Let's figure out how to approach this. I'm flagging this issue for autocat3 since that's where the landing page display happens. Loading the JSON into the catalog database is handled by another component, and ebookmaker also consumes this metadata to place it in the headers of generated files.

eshellman commented 5 months ago

Please provide an example of the alleged problem.

Order is in fact preserved by a many-to-many table. Also remember that the cataloguer has the last word. It's possible that the cataloguer changed the order, perhaps by relinking an author in the case of a duplicate author entry. Where the order has changed, the cataloguer may need to reset the list by emptying it and then remaking it.
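To illustrate how relinking could change the order, here is a minimal Python model. It assumes, hypothetically, that a book's creators are displayed in the order their link rows were inserted, with an ever-increasing `row_id` standing in for that order; `Link`, `link`, `relink`, and `display_order` are illustrative names, not catalog code. Under that assumption, replacing a duplicate author moves the replacement to the end of the list, while emptying the list and remaking it restores the intended order:

```python
from dataclasses import dataclass
from itertools import count

# HYPOTHETICAL model, not autocat3/libgutenberg code: assume display order
# follows the order in which link rows were inserted, tracked by an
# ever-increasing row_id (loosely analogous to a PostgreSQL oid).
_row_ids = count(1)


@dataclass
class Link:
    row_id: int
    fk_authors: int


def link(links: list[Link], author_id: int) -> None:
    """Insert a new link row; it always receives the next row_id."""
    links.append(Link(next(_row_ids), author_id))


def relink(links: list[Link], old_id: int, new_id: int) -> None:
    """Replace a duplicate author: delete the old row, insert a new one."""
    links[:] = [row for row in links if row.fk_authors != old_id]
    link(links, new_id)


def display_order(links: list[Link]) -> list[int]:
    """Author ids sorted by insertion order (row_id)."""
    return [row.fk_authors for row in sorted(links, key=lambda row: row.row_id)]


links: list[Link] = []
for author in (1, 2):
    link(links, author)
relink(links, 1, 3)          # author 1 turns out to duplicate author 3
print(display_order(links))  # [2, 3]: the first author now shows up last

# Reset: empty the list, then relink in the intended order.
links.clear()
for author in (3, 2):
    link(links, author)
print(display_order(links))  # [3, 2]
```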

gbnewby commented 5 months ago

https://www.gutenberg.org/ebooks/73356

From the JSON pushed on April 7: "CONTRIBUTOR": [ { "name": "Bryce Walton", "role": "author" }, { "name": "Al Reynolds", "role": "author" },
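A JSON array is ordered, so the submitted order can be recovered from each entry's position. The following is a minimal Python sketch, assuming an importer that records that position (numbered from 1) instead of re-sorting names. The payload contains only the two entries quoted above, closed off by hand, and `ordered_creators` is a hypothetical helper, not part of dopush, libgutenberg, or autocat3:

```python
import json

# Trimmed, hand-closed payload: only the two CONTRIBUTOR entries quoted in
# the thread; the rest of the pushed JSON is not reproduced here.
payload = json.loads("""
{"CONTRIBUTOR": [
    {"name": "Bryce Walton", "role": "author"},
    {"name": "Al Reynolds", "role": "author"}
]}
""")


def ordered_creators(doc: dict) -> list[tuple[int, str, str]]:
    """Hypothetical helper: return (position, name, role), numbered from 1.

    The position is simply the index in the JSON array, so the submission
    order survives as long as nothing re-sorts the names afterwards.
    """
    return [
        (position, entry["name"], entry["role"])
        for position, entry in enumerate(doc["CONTRIBUTOR"], start=1)
    ]


for row in ordered_creators(payload):
    print(row)
# (1, 'Bryce Walton', 'author')
# (2, 'Al Reynolds', 'author')
```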

This problem was reported to me before the cataloger updated the bibrec data, which seems to be firm evidence that once the JSON is processed on ibiblio, creator order is not preserved.

In this situation, subject cataloging has now been completed but the creators are still in the wrong order. We can check with the cataloging team in case they made an error in linking authors.

I will leave this as-is for now so you can see what's there. Once you have reported back, one thing I can do is confirm that relinking in a different order is a catalog-side fix.

The desire, though, is for the order to be preserved from the JSON. It might take a little more effort to set up an experiment confirming whether the behavior you describe happens at posting time.


eshellman commented 5 months ago

I'll look at it in the morning.

eshellman commented 5 months ago

It turns out that libgutenberg has always sorted authors alphabetically. The order is preserved in the database. I will open an issue in the libgutenberg repo. Here is what's in the db:

gutenberg=> SELECT oid, * FROM public.mn_books_authors
WHERE fk_books=73356;
    oid    | fk_books | fk_authors | fk_roles | heading 
-----------+----------+------------+----------+---------
 294603745 |    73356 |      33100 | aut      |       1
 294603747 |    73356 |      57042 | aut      |       2
(2 rows)

The heading column in the database denotes the "main" creator with heading=1. I've not yet figured out how the second author got a '2' for this one; that's what it should be! In any case, our book-author lists ignore both the heading value and the oid. autocat3 views do render the data correctly, for example https://gutenberg.org/ebooks/author/57042 (that search is defined in autocat3, not libgutenberg). Library catalogs often list authors other than the "main" author alphabetically; that's probably the origin of the sorting done by libgutenberg.
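A minimal Python sketch of the difference, using the two rows from the query above. It assumes, as the JSON order suggests, that author 33100 is Bryce Walton and 57042 is Al Reynolds (the thread does not state this mapping). `alphabetical` and `stored_order` are hypothetical helpers, not libgutenberg functions. Because `heading` may not always be sequential, the sketch breaks ties with `oid` and treats it only as a rough approximation of insertion order:

```python
# Rows from the mn_books_authors query for ebook 73356, listed in arbitrary
# order because SQL guarantees no row order without ORDER BY.
rows = [
    {"oid": 294603747, "fk_authors": 57042, "heading": 2},
    {"oid": 294603745, "fk_authors": 33100, "heading": 1},
]

# ASSUMPTION: id-to-name mapping inferred from the JSON order (Walton first).
names = {33100: "Walton, Bryce", 57042: "Reynolds, Al"}


def alphabetical(rows: list[dict]) -> list[str]:
    """Sort creators by name, as an alphabetical listing would."""
    return sorted(names[r["fk_authors"]] for r in rows)


def stored_order(rows: list[dict]) -> list[str]:
    """Sort by heading, then by oid as a tie-breaker."""
    ordered = sorted(rows, key=lambda r: (r["heading"], r["oid"]))
    return [names[r["fk_authors"]] for r in ordered]


print(alphabetical(rows))  # ['Reynolds, Al', 'Walton, Bryce']
print(stored_order(rows))  # ['Walton, Bryce', 'Reynolds, Al']
```

With these two rows the alphabetical sort reverses the submitted order, which matches the symptom reported for ebook 73356.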

eshellman commented 5 months ago

Closing in favor of https://github.com/gutenbergtools/libgutenberg/issues/40