psu-libraries / scholarsphere-3

A web application for ingest, curation, search, and display of digital assets. Powered by Hydra technologies (Rails, Hydra-head, Blacklight, Solr, Fedora Commons, etc.)
Apache License 2.0

Additional or Quality Metadata for Google Scholar (aka, The Scholar) #1602

Open mtribone opened 5 years ago

mtribone commented 5 years ago

We need to rework the metadata to follow Google's recommendations for scholarly literature.

Example work that we would want to be indexed: https://scholarsphere.psu.edu/concern/generic_works/xwd375x69k

Datasets should be found in Google Dataset Search. https://developers.google.com/search/docs/data-types/dataset
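For reference, Google Dataset Search reads schema.org `Dataset` markup embedded in the page, typically as JSON-LD. The snippet below is only a rough sketch of what generating that markup could look like. The method name `dataset_json_ld` and the sample values are hypothetical and do not come from the ScholarSphere codebase. `name` and `description` are the properties Google lists as required.

```ruby
require "json"

# Hypothetical helper (not ScholarSphere code): builds a schema.org Dataset
# description as a Hash, leaving out any property that has no value.
def dataset_json_ld(name:, description:, url: nil)
  {
    "@context" => "https://schema.org/",
    "@type" => "Dataset",
    "name" => name,
    "description" => description,
    "url" => url
  }.compact
end

# Wraps the JSON-LD in the script tag that would go into the page <head>.
def dataset_script_tag(**attrs)
  json = JSON.pretty_generate(dataset_json_ld(**attrs))
  %(<script type="application/ld+json">\n#{json}\n</script>)
end

# Sample values are placeholders for illustration only.
puts dataset_script_tag(
  name: "Example survey data",
  description: "Placeholder description of an example dataset."
)
```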

DanCoughlin commented 5 years ago

There are a couple of things that would be very helpful for indexing if possible:

A file name ending in .csv is a red flag for the indexing system; such an item would not be considered a publication.

It's better not to add empty meta tags. If you don't have the publication date of an item, for example, it's best not to include the citation_publication_date tag.

Additionally, journal-specific information such as volume/issue and page numbers can be left off for repository items. The really crucial items for ScholarSphere would be:
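A minimal sketch of that rule, assuming the metadata arrives as a plain Hash of tag names to values. The helper name `scholar_meta_tags` is hypothetical and is not the code in the existing view partial. Blank or missing values produce no tag at all, and multi-valued fields such as authors produce one tag per value.

```ruby
require "cgi"

# Hypothetical helper: builds Google Scholar <meta> tags from a Hash and
# skips any value that is nil, empty, or whitespace-only.
def scholar_meta_tags(fields)
  fields.flat_map do |name, value|
    Array(value).map { |v| v.to_s.strip }.reject(&:empty?).map do |v|
      %(<meta name="#{name}" content="#{CGI.escapeHTML(v)}">)
    end
  end
end

# Sample values are placeholders for illustration only.
puts scholar_meta_tags(
  "citation_title" => "An Example Article",
  "citation_author" => ["Doe, Jane", ""],
  "citation_publication_date" => nil
)
```

With the sample input, only the title and the one non-blank author are emitted; no citation_publication_date tag appears.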

mtribone commented 5 years ago

Placeholder for questions to help solve the indexing issues

  1. Does each article require the abstract to be in a separate HTML or PDF file?
  2. What file formats are accepted? It seems that preservation formats are not the required file formats?

mtribone commented 5 years ago

Google Scholar view partial: https://github.com/psu-stewardship/scholarsphere/blob/2735de0c3994d2f0e32ac2ef5f5219af528f83d8/app/views/shared/_gscholar.html.erb

mtribone commented 5 years ago

Perhaps we should require the publication date if a work has the resource type of article, book, journal, part of book, or research paper? This would require a change to the new/edit work form to pull the publication date out of the additional metadata. Alternatively, we could omit the citation_publication_date meta tag for Google Scholar when the date is blank. It might be better to get a date.
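The first option could be sketched roughly as follows. This is plain Ruby rather than an actual model validation. The constant, the method names, and the Hash keys are all hypothetical, and the exact resource type strings stored by the application are an assumption.

```ruby
# Resource types taken from the comment above; the exact stored strings
# are an assumption, so the comparison ignores case and whitespace.
DATE_REQUIRED_TYPES = [
  "Article", "Book", "Journal", "Part of Book", "Research Paper"
].freeze

# True when at least one of the work's resource types needs a date.
def publication_date_required?(resource_types)
  normalized = Array(resource_types).map { |t| t.to_s.strip.downcase }
  (normalized & DATE_REQUIRED_TYPES.map(&:downcase)).any?
end

# Returns a list of error messages (empty when the work is acceptable).
# `work` is a hypothetical Hash standing in for the real work object.
def publication_date_errors(work)
  return [] unless publication_date_required?(work[:resource_type])
  return [] unless work[:publication_date].to_s.strip.empty?

  ["Publication date is required for this resource type"]
end

p publication_date_errors(resource_type: ["Article"], publication_date: nil)
p publication_date_errors(resource_type: ["Dataset"], publication_date: nil)
```

A dataset without a date passes, while an article without a date returns one error message.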

mtribone commented 5 years ago

We will also need to redo Batch Create because the process uses the filenames as the title of the work. It does not remove the file extension from the title.
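One possible way to derive a title without the extension, shown as a standalone sketch. The method name `title_from_filename` is hypothetical and the real Batch Create code may be structured differently. `File.basename` with the `".*"` suffix drops only the final extension.

```ruby
# Hypothetical helper: derives a work title from an uploaded file name by
# dropping any directory part and the final extension.
def title_from_filename(filename)
  File.basename(filename.to_s, ".*")
end

puts title_from_filename("thesis.pdf")         # prints "thesis"
puts title_from_filename("uploads/data.csv")   # prints "data"
puts title_from_filename("archive.tar.gz")     # prints "archive.tar"
```

Only the last extension is removed, so multi-part extensions such as `.tar.gz` would need extra handling if they matter.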