locdb / locdb-frend

Fr(ont-)end for the Linked Open Citation Database.
https://locdb.github.io
GNU General Public License v3.0

Contributors in Bibliographic Resource #58

Closed by LauraErhard 7 years ago

LauraErhard commented 7 years ago

  1. In the data for the individual entries (“Bibliographic Resource”), the "contributors" label is not bold like the other categories.
  2. For data re-use, contributor names should be standardized. Should the name be split into family name and given name? Should there be a hint on how to enter the name (family name, given name)? Should there be a link to the Integrated Authority File (GND = Gemeinsame Normdatei)?
  3. Sometimes the PDFs contain the family name fully capitalized, so the OCR copies this spelling and it has to be corrected manually. Is it possible to convert the names to regular spelling (first letter capitalized, rest lowercase)?

lgalke commented 7 years ago

Thanks for reporting the issue. Making the field label bold should not be a problem. Splitting into family name and given name is indeed a good idea. Linking to authority files would also be a great feature. For now, we only store raw text values without any links. Implementing the linking is probably out of scope for the workshop milestone, since it would also require searching all authors already stored in the LOCDB system, in the same way we do now with resources.

lgalke commented 7 years ago

Regarding proper capitalization, it would probably be cleaner if this happened before the extracted data is passed to the front-end: either directly in the OCR component or in the back-end. What do you think, @anlausch?

zuphilip commented 7 years ago

Just a note that this is probably more complicated. For example, an author such as DENYS DE LA PATELLIÈRE, or Spanish authors, can have several first names and several last names. On the other hand, LIGO Scientific Collaboration appears as the author of several papers.

Thus, I suggest aiming for a good automatic heuristic for lower/uppercase. The remaining errors can be corrected manually. But I wouldn't do anything about splitting into first and last name. Hopefully we can link to the correct publications (via CrossRef, OLC-Contents, our own DB, OpenCitations, WikiCite), where the metadata is hopefully entered correctly.
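As a rough illustration of what such a heuristic could look like (a sketch only, not code from the LOCDB project): the function name `normalize_name` and the `PARTICLES` list are hypothetical. It assumes that only names written entirely in capitals need fixing, so mixed-case strings such as collaboration names are left untouched.

```python
import re
import unicodedata

# Hypothetical list of particles that stay lowercase; extend as needed.
PARTICLES = {"de", "la", "le", "du", "van", "von", "der", "den", "di", "da", "y"}

# Matches runs of letters, so hyphens, apostrophes and periods act as separators.
_WORD = re.compile(r"[^\W\d_]+")


def normalize_name(raw: str) -> str:
    """Return an all-caps name in regular capitalization.

    Names that already contain lowercase letters are returned unchanged.
    """
    if not raw.isupper():
        return raw
    # Compose accents so that a letter plus combining mark counts as one letter.
    text = unicodedata.normalize("NFC", raw)
    words = []
    for token in text.split():
        if token.lower() in PARTICLES:
            words.append(token.lower())
        else:
            words.append(_WORD.sub(lambda m: m.group().capitalize(), token))
    return " ".join(words)


print(normalize_name("DENYS DE LA PATELLIÈRE"))         # Denys de la Patellière
print(normalize_name("JEAN-PIERRE O'NEIL"))             # Jean-Pierre O'Neil
print(normalize_name("LIGO Scientific Collaboration"))  # unchanged
```

This will still get some cases wrong: an acronym that appears on its own in capitals would be turned into a regular word, and the particle list is language-dependent. Those are the remaining errors that would need manual correction.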

lgalke commented 7 years ago

That's a good point, @zuphilip. Plus, searching in the internal/external data sources is typically not case-sensitive. Thus, when there is a matching resource, capitalization should not be a problem. When we create a new resource from OCR data, it is probably okay to let the librarian adjust the spelling/capitalization. If heuristics are still desired, the OCR component is probably the right spot.
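To illustrate why capitalization does not matter for matching, here is a minimal sketch of a case-insensitive comparison. It is not how the LOCDB back-end or the external sources actually search; `match_key` is a hypothetical helper used only for this example.

```python
import unicodedata


def match_key(name: str) -> str:
    """Build a case-insensitive comparison key for a contributor name."""
    # Compose accents first so that equivalent spellings compare equal.
    composed = unicodedata.normalize("NFC", name)
    # casefold() is a more aggressive lower(); split/join collapses whitespace.
    return " ".join(composed.casefold().split())


ocr_name = "DENYS DE LA PATELLIÈRE"
stored_name = "Denys de La Patellière"
print(match_key(ocr_name) == match_key(stored_name))  # True
```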

lgalke commented 7 years ago

Author disambiguation is another big topic, which might be out of the project's scope.