internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Cleanup of /config/edition #725

Closed: LeadSongDog closed this issue 6 years ago

LeadSongDog commented 6 years ago

As suggested at #142 by @tfmorris, I have run through the list of linked edition identifiers at https://openlibrary.org/config/edition to find those that need revision. Many of them turn out to be overdue for change. Below is my best cut at cleaning up the identifiers section, but it will need someone with a better understanding and admin rights to change the files.

If I understand correctly, the source is at: https://github.com/internetarchive/openlibrary/openlibrary/plugins/openlibrary/pages/config_edition.page

My amended list is:

identifiers:

url: https://alkindi.ideo-cairo.org/controller.php?action=SearchNotice&noticeId=@@@ website: https://www.ideo-cairo.org/ notes: name: dominican_institute_for_oriental_studies_library label: Al Kindi

url: https://www.alibris.com/booksearch?qwork=@@@ notes: name: alibris_id label: Alibris ID

url: https://www.amazon.com/gp/product/@@@ name: amazon label: Amazon.com

url: https://www.amazon.ca/gp/product/@@@ notes: name: amazon.ca_asin label: Amazon.ca

url: https://www.amazon.de/gp/product/@@@ notes: name: amazon.de_asin label: Amazon.de

url: https://www.amazon.it/gp/product/@@@ notes: name: amazon.it_asin label: Amazon.it

url: https://www.amazon.co.uk/gp/product/@@@ notes: name: amazon.co.uk_asin label: Amazon UK

website: http://www.guidedogswa.org/library/openbiblio/shared/biblio_view.php?bibid=@@@&tab=opac notes: Still http name: abwa_bibliographic_number label: Association for the Blind of Western Australia

website: http://www.bne.es/en/Catalogos/index.html notes: name: depósito_legal label: Biblioteca Nacional de España Depósito Legal

website: http://catalogue.bnf.fr/ notes: Still http name: bibliothèque_nationale_de_france label: Bibliothèque Nationale de France

url: https://bibsys-almaprimo.hosted.exlibrisgroup.com/primo_library/libweb/action/dlDisplay.do?vid=BIBSYS&docId=BIBSYS_ILS@@@ website: https://bibsys-almaprimo.hosted.exlibrisgroup.com/ notes: name: bibsys label: Bibsys ID

url: https://www.biodiversitylibrary.org/bibliography/@@@ website: https://www.biodiversitylibrary.org notes: name: bhl label: Biodiversity Heritage Library

url: http://solo.bodleian.ox.ac.uk/OXVU1:LSCOP_OX:oxfaleph@@@ website: https://www.bodleian.ox.ac.uk/ notes: name: bodleian,_oxford_university label: Oxford University Bodleian Library Aleph System Number

url: https://www.bookcrossing.com/journal/@@@ website: https://www.bookcrossing.com notes: name: bcid label: Book Crossing ID (BCID)

url: http://booklocker.com/books/@@@.html website: http://booklocker.com/ notes: Still http name: booklocker.com label: BookLocker.com

url: http://www.bookmooch.com/detail/@@@ notes: Still http name: bookmooch label: Book Mooch

website: http://www.bookwire.com/ notes: name: bookwire label: Bowker BookWire

url: http://www.booksforyou.co.in/@@@ website: http://www.booksforyou.co.in notes: Still http name: booksforyou label: Books For You

url: https://bostonpl.bibliocommons.com/item/show/@@@ website: https://bostonpl.bibliocommons.com notes: name: boston_public_library label: Boston Public Library

website: https://www.bl.uk/ notes: name: british_library label: British Library

website: https://ecommons.cornell.edu/handle/1813/11665 notes: name: cornell_university_ecommons label: Cornell University ecommons

website: https://newcatalog.library.cornell.edu/catalog/@@@ notes: name: cornell_university_library label: Cornell University Library

website: notes: Session-based IDs name: canadian_national_library_archive label: Canadian National Library Archive

url: http://d-nb.info/@@@ website: http://www.d-nb.de/eng/index.htm notes: Still http name: dnb label: Deutsche National Bibliothek

url: http://zbc.ksiaznica.szczecin.pl/dlibra/docmetadata?id=@@@ website: http://zbc.ksiaznica.szczecin.pl notes: name: digital_library_pomerania label: Digital Library of Pomerania

url: http://www.discovereads.com/books/@@@ website: http://www.discovereads.com notes: name: discovereads label: Discovereads

url: http://www.freebase.com/view/en/@@@ website: http://freebase.com/ notes: Defunct name: freebase label: Freebase

url: https://www.goodreads.com/book/show/@@@ name: goodreads label: Goodreads

url: https://books.google.com/books?id=@@@ name: google label: Google

url: https://catalog.hathitrust.org/Record/@@@ website: https://hathitrust.org/ name: hathi_trust label: Hathi Trust

url: https://hollis.harvard.edu/primo_library/libweb/action/display.do?doc=HVD_ALEPH@@@ website: https://library.harvard.edu name: harvard label: Harvard University Library

url: https://ilmiolibro.kataweb.it/schedalibro.asp?id=@@@ website: https://ilmiolibro.kataweb.it notes: name: ilmiolibro label: Ilmiolibro

url: https://archive.org/details/@@@ name: ocaid label: Internet Archive

url: http://www.isfdb.org/cgi-bin/pl.cgi?@@@ website: http://www.isfdb.org notes: Still http name: isfdb label: Internet Speculative Fiction Database

url: http://estc.bl.uk/@@@ name: etsc label: English Title Short Catalogue Citation Number

name: isbn_10 label: ISBN 10

name: isbn_13 label: ISBN 13

website: http://www.issn.org/ notes: Still http name: issn label: ISSN

url: https://lccn.loc.gov/@@@ name: lccn label: LC Control Number

url: https://www.librarything.com/work/@@@ name: librarything label: Library Thing

url: https://www.lulu.com/shop/product-@@@.html website: https://www.lulu.com notes: Self-publishing platform name: lulu label: Lulu

url: http://www.magcloud.com/browse/Issue/@@@ website: http://www.magcloud.com notes: Self-publishing platform name: magcloud label: Magcloud

url: https://id.ndl.go.jp/bib/@@@ website: https://ndlonline.ndl.go.jp notes: name: NDL Bibliographic ID label: National Diet Library, Japan

url: https://catalogue.nla.gov.au/Record/@@@ website: https://www.nla.gov.au/ name: nla label: National Library of Australia

url: https://libris.kb.se/bib/@@@ website: https://libris.kb.se notes: name: libris label: National Library of Sweden (Libris)

url: https://www.worldcat.org/oclc/@@@?tab=details website: https://www.worldcat.org name: oclc_numbers label: OCLC/WorldCat

url: https://www.overdrive.com/media/@@@ website: https://www.overdrive.com name: overdrive label: OverDrive

url: http://www.paperbackswap.com/book/details/@@@ website: http://www.paperbackswap.com notes: Still http name: paperback_swap label: Paperback Swap

url: https://www.gutenberg.org/etext/@@@ website: https://www.gutenberg.org name: project_gutenberg label: Project Gutenberg

url: https://www.scribd.com/doc/@@@/ website: https://www.scribd.com name: scribd label: Scribd

url: http://www.shelfari.com/books/@@@/ website: http://www.shelfari.com/ notes: Merged to goodreads.com name: shelfari label: Shelfari

url: https://www.smashwords.com/books/view/@@@ website: https://www.smashwords.com notes: Commission self-publishing platform name: smashwords_book_download label: Smashwords Book Download

url: https://catalogue.libraries.london.ac.uk/record=@@@ website: https://catalogue.libraries.london.ac.uk/ notes: name: ulrls label: University of London

url: http://books.wwnorton.com/books/detail.aspx?id=@@@ website: http://wwnorton.com notes: Still http; name: w._w._norton label: W. W. Norton

url: http://zdb-katalog.de/title.xhtml?ZDB-ID=@@@ website: http://zdb-katalog.de?lang=EN notes: The Zeitschriftendatenbank is the world’s largest specialized database for serial titles (journals, annuals, newspapers etc., incl. e-journals). name: zdb-id label: ZDB-ID

url: https://kansalliskirjasto.finna.fi/Record/vaari.@@@ website: https://www.kansalliskirjasto.fi/ notes: The National Repository Library of Finland name: vaari label: Vaari

url: https://fennica.linneanet.fi/vwebv/holdingsInfo?bibId=@@@ website: https://www.kansalliskirjasto.fi/ notes: The National Bibliography of Finland name: fennica label: Fennica

url: https://opacplus.bsb-muenchen.de/metaopac/search?id=7670775 website: https://www.bsb-muenchen.de notes: name: bayerische_staatsbibliothek label: Bayerische Staatsbibliothek BSB-ID

url: https://www.abebooks.de/servlet/BookDetailsPL?bi=@@@ website: https://www.abebooks.de notes: Bookseller network identifier for a specific copy name: abebooks.de label: Abebooks.de

url: http://catalogo.bne.es/uhtbin/cgisirsi/x/0/0/57/5/3?searchdata1=@@@{CKEY}&user_id=WEBSERVER website: http://www.bne.es/en/Inicio/index.html notes: Still http name: depósito_legal label: Depósito Legal. Biblioteca Nacional de España

url: http://search.bl.uk/primo_library/libweb/action/display.do?doc=BLL01@@@ website: http://www.bl.uk/bibliographic/natbib.html notes: Still http name: british_national_bibliography label: British National Bibliography system number

url: http://catalogue.bnf.fr/rechercher.do?motRecherche=@@@ website: http://www.bnf.fr notes: still http, query for (e.g.) 39160616 returns MARC field 001 value FRBNF391606160000007 name: bibliothèque_nationale_defrance(bnf) label: Bibliothèque nationale de France (BnF)
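For readers unfamiliar with the format, each `url:` value above is a template in which `@@@` stands for the identifier value. The following is a minimal sketch of that substitution, not Open Library's actual code: the function name `expand_identifier_url` and the identifier values are made up for illustration, and plain string replacement is an assumption about how the placeholder is filled.

```python
PLACEHOLDER = "@@@"


def expand_identifier_url(template: str, value: str) -> str:
    """Substitute an identifier value for the @@@ placeholder in a URL template.

    Hypothetical helper for illustration; not taken from the Open Library codebase.
    """
    if PLACEHOLDER not in template:
        # Entries that only carry a `website:` have nothing to substitute into.
        raise ValueError(f"template has no {PLACEHOLDER} placeholder: {template!r}")
    return template.replace(PLACEHOLDER, value)


# The identifier values below are invented examples.
print(expand_identifier_url("https://lccn.loc.gov/@@@", "2001012345"))
print(expand_identifier_url("https://www.worldcat.org/oclc/@@@?tab=details", "123456"))
```

Entries in the list that have a `website:` but no `url:` (for example British Library) cannot be linked per record this way, which is why the sketch rejects templates without the placeholder.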

tfmorris commented 6 years ago

Thanks for all the testing and updates, but I'm having a hard time figuring out what to change based on the current contents. A diff/PR would be best, but failing that a list of editing instructions would be easier (e.g. change http to https for the following 6 entries).

Am I correct that this is a mix of changes and additions or am I looking at the wrong reference file? For example, it looks like Overdrive was explicitly removed back in 2016 and some others like Swedish Libris don't appear in the file I'm looking at either.
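One way to produce the kind of editing instructions asked for here is to compare the proposed entries with the current file by `name`. This is only a rough sketch under assumptions: `classify_entries` is a hypothetical helper, both lists are assumed to be already parsed into dictionaries, and the `current` data below is invented sample data, not the real contents of the config file.

```python
from typing import Dict, List


def classify_entries(current: List[dict], proposed: List[dict]) -> Dict[str, str]:
    """Label each proposed entry as not found, unchanged, or changed (with field names)."""
    current_by_name = {entry["name"]: entry for entry in current}
    report = {}
    for entry in proposed:
        existing = current_by_name.get(entry["name"])
        if existing is None:
            report[entry["name"]] = "not found"
            continue
        # Collect the fields whose proposed value differs from the current one.
        changed = sorted(k for k, v in entry.items() if existing.get(k) != v)
        report[entry["name"]] = "changed: " + ", ".join(changed) if changed else "unchanged"
    return report


# Invented sample of the current file (assumption, for illustration only).
current = [
    {"name": "nla", "label": "National Library of Australia",
     "url": "http://catalogue.nla.gov.au/Record/@@@", "website": "https://www.nla.gov.au/"},
    {"name": "lccn", "label": "LC Control Number", "url": "https://lccn.loc.gov/@@@"},
]
# Three entries from the proposed list, with empty notes omitted.
proposed = [
    {"name": "nla", "label": "National Library of Australia",
     "url": "https://catalogue.nla.gov.au/Record/@@@", "website": "https://www.nla.gov.au/"},
    {"name": "lccn", "label": "LC Control Number", "url": "https://lccn.loc.gov/@@@"},
    {"name": "libris", "label": "National Library of Sweden (Libris)",
     "url": "https://libris.kb.se/bib/@@@", "website": "https://libris.kb.se"},
]

for name, status in classify_entries(current, proposed).items():
    print(f"{name}: {status}")
```

Matching on `name` assumes names are stable keys. An entry whose name itself was changed would show up as "not found" and need a manual check.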

tfmorris commented 6 years ago

OK, I finished slogging through all these and found most of them were either new entries or unchanged. I've created a PR for the few actual updates as outlined below.

url: https://alkindi.ideo-cairo.org/controller.php?action=SearchNotice&noticeId=@@@ website: https://www.ideo-cairo.org/ notes: name: dominican_institute_for_oriental_studies_library label: Al Kindi

Not found in file.

url: https://www.alibris.com/booksearch?qwork=@@@

name: alibris_id label: Alibris ID

Why do these have "ID" added?

url: https://www.amazon.com/gp/product/@@@ name: amazon label: Amazon.com

Left unchanged. URL matches. Label is Amazon ASIN

Amazon Canada, Germany, Italy, UK not added.

website: http://www.guidedogswa.org/library/openbiblio/shared/biblio_view.php?bibid=@@@&tab=opac notes: Still http name: abwa_bibliographic_number label: Association for the Blind of Western Australia

Not found.

website: http://www.bne.es/en/Catalogos/index.html notes: name: depósito_legal label: Biblioteca Nacional de España Depósito Legal

Not found.

website: http://catalogue.bnf.fr/ notes: Still http name: bibliothèque_nationale_de_france label: Bibliothèque Nationale de France

Not found.

url: https://bibsys-almaprimo.hosted.exlibrisgroup.com/primo_library/libweb/action/dlDisplay.do?vid=BIBSYS&docId=BIBSYS_ILS@@@ website: https://bibsys-almaprimo.hosted.exlibrisgroup.com/ notes: name: bibsys label: Bibsys ID

Not found.

label: Biodiversity Heritage Library

Switched to https

url: http://solo.bodleian.ox.ac.uk/OXVU1:LSCOP_OX:oxfaleph@@@ website: https://www.bodleian.ox.ac.uk/ notes: name: bodleian,_oxford_university label: Oxford University Bodleian Library Aleph System Number

Not found.

url: https://www.bookcrossing.com/journal/@@@ website: https://www.bookcrossing.com notes: name: bcid label: Book Crossing ID (BCID)

Not found.

url: http://booklocker.com/books/@@@.html website: http://booklocker.com/ notes: Still http name: booklocker.com label: BookLocker.com

Not found.

url: http://www.bookmooch.com/detail/@@@ notes: Still http name: bookmooch label: Book Mooch

Unchanged.

website: http://www.bookwire.com/ notes: name: bookwire label: Bowker BookWire

Not found.

url: http://www.booksforyou.co.in/@@@ website: http://www.booksforyou.co.in notes: Still http name: booksforyou label: Books For You

Not found.

url: https://bostonpl.bibliocommons.com/item/show/@@@ website: https://bostonpl.bibliocommons.com notes: name: boston_public_library label: Boston Public Library

Not found.

website: https://www.bl.uk/ notes: name: british_library label: British Library

Not found.

website: https://ecommons.cornell.edu/handle/1813/11665 notes: name: cornell_university_ecommons label: Cornell University ecommons

Not found.

website: https://newcatalog.library.cornell.edu/catalog/@@@ notes: name: cornell_university_library label: Cornell University Library

Not found.

website: notes: Session-based IDs name: canadian_national_library_archive label: Canadian National Library Archive

Not found.

url: http://d-nb.info/@@@ website: http://www.d-nb.de/eng/index.htm notes: Still http name: dnb label: Deutsche National Bibliothek

Unchanged.

url: http://zbc.ksiaznica.szczecin.pl/dlibra/docmetadata?id=@@@ website: http://zbc.ksiaznica.szczecin.pl notes: name: digital_library_pomerania label: Digital Library of Pomerania

Not found

url: http://www.discovereads.com/books/@@@ website: http://www.discovereads.com notes: name: discovereads label: Discovereads

Not found.

url: http://www.freebase.com/view/en/@@@ website: http://freebase.com/ notes: Defunct name: freebase label: Freebase

Unchanged.

url: https://www.goodreads.com/book/show/@@@ name: goodreads label: Goodreads

Unchanged.

url: https://books.google.com/books?id=@@@ name: google label: Google

Unchanged.

url: https://catalog.hathitrust.org/Record/@@@ website: https://hathitrust.org/ name: hathi_trust label: Hathi Trust

Unchanged.

url: https://hollis.harvard.edu/primo_library/libweb/action/display.do?doc=HVD_ALEPH@@@ website: https://library.harvard.edu name: harvard label: Harvard University Library

Updated.

url: https://ilmiolibro.kataweb.it/schedalibro.asp?id=@@@ website: https://ilmiolibro.kataweb.it notes: name: ilmiolibro label: Ilmiolibro

Not found.

url: https://archive.org/details/@@@ name: ocaid label: Internet Archive

Unchanged

url: http://www.isfdb.org/cgi-bin/pl.cgi?@@@ website: http://www.isfdb.org notes: Still http name: isfdb label: Internet Speculative Fiction Database

URL added. Website & label updated.

url: http://estc.bl.uk/@@@ name: etsc label: English Title Short Catalogue Citation Number

Not found.

name: isbn_10 label: ISBN 10

Unchanged.

name: isbn_13 label: ISBN 13

Unchanged.

website: http://www.issn.org/ notes: Still http name: issn label: ISSN

Unchanged.

url: https://lccn.loc.gov/@@@ name: lccn label: LC Control Number

Unchanged.

url: https://www.librarything.com/work/@@@ name: librarything label: Library Thing

Unchanged.

website: https://www.lulu.com

Updated.

url: http://www.magcloud.com/browse/Issue/@@@ website: http://www.magcloud.com notes: Self-publishing platform name: magcloud label: Magcloud

Not found.

url: https://id.ndl.go.jp/bib/@@@ website: https://ndlonline.ndl.go.jp notes: name: NDL Bibliographic ID label: National Diet Library, Japan

Not found.

url: https://catalogue.nla.gov.au/Record/@@@ website: https://www.nla.gov.au/ name: nla label: National Library of Australia

Switched to https.

url: https://libris.kb.se/bib/@@@ website: https://libris.kb.se notes: name: libris label: National Library of Sweden (Libris)

Not found.

url: https://www.worldcat.org/oclc/@@@?tab=details website: https://www.worldcat.org name: oclc_numbers label: OCLC/WorldCat

Website added.

url: https://www.overdrive.com/media/@@@ website: https://www.overdrive.com name: overdrive label: OverDrive

Not found (and explicitly removed in 2016).

url: http://www.paperbackswap.com/book/details/@@@ website: http://www.paperbackswap.com notes: Still http name: paperback_swap label: Paperback Swap

Website added.

website: https://www.gutenberg.org

Updated

website: https://www.scribd.com

Updated

website: http://www.shelfari.com/ notes: Merged to goodreads.com

Note added.

url: https://www.smashwords.com/books/view/@@@ website: https://www.smashwords.com notes: Commission self-publishing platform name: smashwords_book_download label: Smashwords Book Download

Not found.

url: https://catalogue.libraries.london.ac.uk/record=@@@ website: https://catalogue.libraries.london.ac.uk/ notes: name: ulrls label: University of London

Not found

url: http://books.wwnorton.com/books/detail.aspx?id=@@@ website: http://wwnorton.com notes: Still http; name: w._w._norton label: W. W. Norton

Not found.

url: http://zdb-katalog.de/title.xhtml?ZDB-ID=@@@ website: http://zdb-katalog.de?lang=EN notes: The Zeitschriftendatenbank is the world’s largest specialized database for serial titles (journals, annuals, newspapers etc., incl. e-journals). name: zdb-id label: ZDB-ID

Not found.

url: https://kansalliskirjasto.finna.fi/Record/vaari.@@@ website: https://www.kansalliskirjasto.fi/ notes: The National Repository Library of Finland name: vaari label: Vaari

Not found.

url: https://fennica.linneanet.fi/vwebv/holdingsInfo?bibId=@@@ website: https://www.kansalliskirjasto.fi/ notes: The National Bibliography of Finland name: fennica label: Fennica

Not found.

url: https://opacplus.bsb-muenchen.de/metaopac/search?id=7670775 website: https://www.bsb-muenchen.de notes: name: bayerische_staatsbibliothek label: Bayerische Staatsbibliothek BSB-ID

Not found.

url: https://www.abebooks.de/servlet/BookDetailsPL?bi=@@@ website: https://www.abebooks.de notes: Bookseller network identifier for a specific copy name: abebooks.de label: Abebooks.de

Not found.

url: http://catalogo.bne.es/uhtbin/cgisirsi/x/0/0/57/5/3?searchdata1=@@@{CKEY}&user_id=WEBSERVER website: http://www.bne.es/en/Inicio/index.html notes: Still http name: depósito_legal label: Depósito Legal. Biblioteca Nacional de España

Not found.

url: http://search.bl.uk/primo_library/libweb/action/display.do?doc=BLL01@@@ website: http://www.bl.uk/bibliographic/natbib.html notes: Still http name: british_national_bibliography label: British National Bibliography system number

Not found.

url: http://catalogue.bnf.fr/rechercher.do?motRecherche=@@@ website: http://www.bnf.fr notes: still http, query for (e.g.) 39160616 returns MARC field 001 value FRBNF391606160000007 name: bibliothèque_nationale_defrance(bnf) label: Bibliothèque nationale de France (BnF)

Not found.

LeadSongDog commented 6 years ago

Boston Public Library appears to be https://bostonpl.bibliocommons.com/item/show/@@@075 for whatever reason.

tfmorris commented 6 years ago

Hmmm, looks like the version in git isn't actually the live version. That's kind of annoying. I guess someone else will have to deal with this.

LeadSongDog commented 6 years ago

Oh wait, is this live data gathered from contribs at the following link? https://github.com/internetarchive/openlibrary/blame/c4d877ee6410df6f70ab45718baebe52fdf366ba/openlibrary/templates/books/edit/addfield.html#L29

tfmorris commented 6 years ago

No, it's done by editing the page at https://openlibrary.org/config/edition -- except that editing is apparently currently broken.

We're going to move things back to git though. I'm going to delete the PR and start from scratch.

tfmorris commented 6 years ago

Oops! I'll leave this issue open, but delete the associated PR.

LeadSongDog commented 2 years ago

@mekarpeles I am pretty sure that /config/edition is not yet fixed. It still does not seem to accept corrective edits.

Something else weird is happening: Administrator has made a ridiculous number of edits to /config/edition, each appending one blank line. See the page’s edit history for the diffs.