dev: Missing and wrong information in Agenda view

zuphilip commented 6 years ago

I added a new resource by DOI 10.1177/1468796807080237 and switched to the resolve tab. This new resource appears in the Agenda view (currently at the bottom) as a Journal Article and Journal Issue:

locdb-agenda-missing-info

But the information about the journal name, volume number, and issue number is missing. Moreover, the journal issue has today's date, which is clearly wrong. The author and publisher information is missing as well. BTW, the URLs shown here seem unnecessary to me, no?

LauraErhard commented 6 years ago

I just wrote an issue on this as well #276

lgalke commented 6 years ago

We already have the URL/URI identifiers on our radar. We might hide them and only show specific identifiers, or incorporate them in some link element instead. Regarding the date, this is hopefully a minor bug that initializes the date with today, and the value is not already wrong in the database. We will track that down.

Displaying the Journal metadata in the Journal Issue is more challenging. I assume that the data of the journal is indeed present, yet the mapping from any "in collection" type to its collection type is not functional. We can make a lucky guess, which would solve it for Journal Issues specifically, but not in the general case.

Here is the mapping from child to parent resource types which we intend to use (provided by @kleinann)

      switch (type) {
          // Stacked case labels fall through to a shared return;
          // `case a || b:` would only ever match the first truthy operand.
          case resourceType.monograph:
          case resourceType.editedBook:
          case resourceType.book:
          case resourceType.referenceBook:
              return [resourceType.bookSet, resourceType.bookSeries];
          case resourceType.bookSet:
              return [resourceType.bookSeries];
          case resourceType.bookChapter:
          case resourceType.bookSection:
          case resourceType.bookPart:
          case resourceType.bookTrack:
          case resourceType.component:
              return [resourceType.editedBook, resourceType.book, resourceType.monograph];
          case resourceType.proceedingsArticle:
              return [resourceType.proceedings];
          case resourceType.journalArticle:
              return [resourceType.journalIssue, resourceType.journalVolume, resourceType.journal];
          case resourceType.journalIssue:
              return [resourceType.journalVolume, resourceType.journal];
          case resourceType.journalVolume:
              return [resourceType.journal];
          case resourceType.report:
              return [resourceType.reportSeries];
          case resourceType.referenceEntry:
              return [resourceType.referenceBook];
          case resourceType.standard:
              return [resourceType.standardSeries];
          case resourceType.dataset:
              return [resourceType.dataset];
          default:
              // return as default just book -- or []
              return [];
      }

Until now we hesitated to implement it since we lacked a good heuristic. As you can see, even for Journal Issues it is not clear whether we need to show the metadata of the Journal Volume or of the Journal itself. Developing such a heuristic will be among the next steps.

lgalke commented 6 years ago

For instance, we could traverse the list of valid container types left to right (from preferred to least preferred) and signal a hit, as soon as the title property is present. Would be helpful to have your thoughts on this.
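
A minimal sketch of that left-to-right traversal, assuming the `<type>_title` property naming mentioned in this thread. The names `CONTAINER_ORDER` and `firstTitledType`, the `journalVolume_title` property, and the sample resource are hypothetical and not taken from the locdb code base.

```javascript
// Hypothetical preference order for the container of a journal article,
// from preferred to least preferred.
const CONTAINER_ORDER = ["journalIssue", "journalVolume", "journal"];

// Returns the first type whose `<type>_title` property is a non-empty
// string, or null when none of the candidate types carries a title.
function firstTitledType(resource, order) {
  for (const type of order) {
    const title = resource[`${type}_title`];
    if (typeof title === "string" && title.trim() !== "") {
      return type; // signal a hit as soon as a title is present
    }
  }
  return null;
}

// Illustrative data only: the issue title is empty, the journal title is set.
const sampleContainer = { journalIssue_title: "", journal_title: "Gender & Society" };
console.log(firstTitledType(sampleContainer, CONTAINER_ORDER)); // "journal"
```

The sketch only decides which level to display; it says nothing about which of the remaining properties of that level should be shown.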

kleinann commented 6 years ago

I'm not sure if I get the problem. I thought that in the data model discussion, we decided to implement only 2 levels of hierarchy; for journal articles, this would be journal article and journal issue; for collections, book-chapter and edited book. Logically, all the information that applies to hierarchical levels above the journal article or book chapter itself should be in journal issue or edited book. Only for the export to the OpenCitations Model, the data that we keep in our upper hierarchical level must be split up into several elements. Did I get this wrong, Anne? Or is there a problem that I'm missing here?

lgalke commented 6 years ago

@kleinann it is quite a technical problem, and not easy to solve. It is true that we have only the two levels of hierarchy, but on both levels we have about 150 different properties (such as journalArticle_title, journalIssue_title, journal_title). This raises a problem when the container metadata should be displayed.

For example: we have a JOURNAL_ARTICLE on the lower level and hence display the property subset that belongs to JOURNAL_ARTICLE.

On the upper level, we have JOURNAL_ISSUE, for which we also display the associated properties. Now the problem is that the subset of properties associated with JOURNAL_ISSUE may be empty, since the correct metadata, e.g. for JOURNAL_VOLUME or JOURNAL, is stored in different fields. That's the point where we now have to make a guess (see above).

We now traverse the hierarchy of properties upwards until we find a non-empty title property. Currently, we only have 2 levels of resources, but the container resource still contains a hierarchy of properties, so to say, which we need to traverse until we find something... (just implemented)

zuphilip commented 6 years ago

I don't understand what you have to guess here, because we are speaking about our ingested data in our data model, which we fully control ourselves. As for the practical implementation, I think you have to add all information from every hierarchical level, i.e. the container resource in the frontend should contain the journal title from the JOURNAL, the volume number from the JOURNAL_VOLUME and the issue number from the JOURNAL_ISSUE (maybe more).
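
One way to picture this exact mapping is a per-container-type table that names which prefixed property feeds which displayed field. The sketch below assumes the properties are called `journal_title`, `journalVolume_number` and `journalIssue_number`; only `journal_title` appears in this thread, and the other names, `JOURNAL_ISSUE_FIELDS`, and `flattenContainer` are hypothetical.

```javascript
// Assumed mapping for a journal issue container:
// displayed field -> prefixed property on the container resource.
const JOURNAL_ISSUE_FIELDS = {
  journal: "journal_title",
  volume: "journalVolume_number",
  issue: "journalIssue_number",
};

// Collects the mapped properties into one flat object and skips
// properties that are missing or empty.
function flattenContainer(resource, fieldMap) {
  const flat = {};
  for (const [label, property] of Object.entries(fieldMap)) {
    const value = resource[property];
    if (value !== undefined && value !== null && value !== "") {
      flat[label] = value;
    }
  }
  return flat;
}

// Illustrative data only.
const sampleIssue = {
  journal_title: "Gender & Society",
  journalVolume_number: "25",
  journalIssue_number: "1",
  journalIssue_title: "",
};
console.log(flattenContainer(sampleIssue, JOURNAL_ISSUE_FIELDS));
```

With such a table per container type, no guessing is involved: the frontend shows exactly the listed fields and nothing else.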

lgalke commented 6 years ago

Good point, we actually need such a concise list of relevant properties for each container type. Maybe we could have a dedicated session regarding this mapping at some point?

zuphilip commented 6 years ago

I guess that we already have these properties implemented in the backend. @anlausch Is there a pointer to code where one can see the properties which are implemented for each container type?

kleinann commented 6 years ago

There was once a Hackpad with a draft of the properties for the new data model that Anne shared with me. @anlausch - if you implemented it like that, this would maybe still do the job? Like Philipp, I think that it should be possible to do an exact mapping.

anlausch commented 6 years ago

In the back end we implement more or less all properties for each type, since we discussed the specific hierarchy only for journal_article (journal_issue, journal_volume, and journal, respectively). For other types you told me that a) the hierarchy is not clear and b) no specific properties were mentioned. Having only parent and child resources does not mean that we do not need to curate the whole hierarchy. Most of the properties are needed for everything anyway, e.g. contributors, identifiers, title, embodiment etc. If you think that this is wrong, we can of course just delete those. The model is here: https://github.com/locdb/loc-db/blob/datamodel/api/schema/bibliographicResource.js . Let me know what you think.

anlausch commented 6 years ago

Regarding the wrong information, I will of course check what's going on.

kleinann commented 6 years ago

@anlausch - I couldn't find the "pages" (or firstpage / lastpage) property in journal article, book chapter, book section, book part, book track, component, proceedings article, dataset and reference entry. Is it missing, or did I just look in the wrong place?

anlausch commented 6 years ago

The properties for the pages are part of the resource embodiment. Therefore they can be found in the type-specific embodiment. The definition looks like this:

const resourceEmbodimentSchema = new Schema({ // Resource Embodiment
    identifiers: [identifiersSchema],
    type: {type: String}, // digital or print
    format: String, // IANA media type
    firstPage: Number,
    lastPage: Number,
    url: String,
    scans:[scanSchema]
});
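
As a rough illustration of how a client could read the pages from such embodiments, here is a small sketch. It assumes the embodiments arrive as a plain array of objects shaped like the schema above; the helper name `pageRange` is hypothetical.

```javascript
// Returns "first-last" for the first embodiment that has a firstPage,
// just the first page when lastPage is absent or identical,
// or null when no embodiment carries page information.
function pageRange(embodiments) {
  for (const embodiment of embodiments || []) {
    if (Number.isInteger(embodiment.firstPage)) {
      const hasDistinctLast =
        Number.isInteger(embodiment.lastPage) &&
        embodiment.lastPage !== embodiment.firstPage;
      return hasDistinctLast
        ? `${embodiment.firstPage}-${embodiment.lastPage}`
        : String(embodiment.firstPage);
    }
  }
  return null;
}

// Illustrative data only: the first embodiment has no pages, the second does.
console.log(pageRange([{ type: "digital" }, { type: "print", firstPage: 3, lastPage: 27 }])); // "3-27"
```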

lgalke commented 6 years ago

They are primarily used for our self-ingested scans, right? Or do you also fill these values from external metadata sources?

anlausch commented 6 years ago

I also try to fill those in case the data is available.

anlausch commented 6 years ago

Regarding the wrong date: I checked the raw data again (hope I found the right entry) and there is no date given. Maybe the problem is then in the front end?

lgalke commented 6 years ago

Will check in the front end. The date is then most likely initialized with "today" when no information is given. We will fix that.
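
A minimal sketch of the kind of guard that avoids this, assuming the raw date arrives as a string or is missing altogether. The helper name `parseOptionalDate` is hypothetical, and this is not the actual front-end code.

```javascript
// Returns a Date for a parseable value and null otherwise.
// A missing value stays "unknown" instead of silently becoming today,
// which is what calling `new Date()` without an argument would yield.
function parseOptionalDate(raw) {
  if (raw === undefined || raw === null || raw === "") {
    return null; // no information given
  }
  const parsed = new Date(raw);
  return Number.isNaN(parsed.getTime()) ? null : parsed;
}

console.log(parseOptionalDate(undefined)); // null
console.log(parseOptionalDate("2018-05-28").getUTCFullYear()); // 2018
```

The view can then render an empty date field whenever the helper returns null.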

lgalke commented 6 years ago

Schedule:

[x] Dates
[x] Author, editor, and publisher names
[x] Flattened information of containers (as specified in hackmd)

LauraErhard commented 6 years ago

I just uploaded a Sage journal article and the metadata is not what I expected:

2018-05-28 doi_upload

Shouldn't there be more information in the journal issue part?

The journal article has the information "In: Gender & Societ; 1, SAGE Publications; 2". But the journal is called Gender & SocietY. Is there a limit on characters there? Also, I am not sure what the 1 and the 2 stand for. The journal is in Volume 25, Issue 1. Is the 5 from 25 just cut?!

anlausch commented 6 years ago

As can be seen here https://locdb.bib.uni-mannheim.de/locdb-dev/bibliographicResources/5b0bd07093d5536341f88224 the journal issue has number and volume as expected. Additionally, it has two ISSNs for the journal. My guess would also be a character limit on the fields in the front end.

lgalke commented 6 years ago

We fixed the remaining problem of missing characters in some of the fields. Everything should be finished now. If not, please raise a new, specific issue.

locdb / locdb-frend

dev: Missing and wrong information in Agenda view #275