Issues with input and display of keywords

Ph-We commented 8 years ago

Hi!

Here are the issues found for keywords functionality:

The keyword fields are not labeled according to their locales (in multilingual journals) (#2499 and #2510)
Creation of a new keyword is triggered by ENTER or by the comma (“,”), but the comma is not bound to the same key in every keyboard layout. For example, in the Russian layout that key produces “б” instead of a comma, so every time I type “б”, a new keyword is created.
Keywords entered are not included as metadata (DC/GS metatags) on article pages. (https://github.com/pkp/ojs/pull/1518)
There is no option to display keywords on the article page. (https://github.com/pkp/ojs/pull/1518)
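
To illustrate the comma problem from the list above, here is a minimal sketch. It assumes (this is not confirmed in the thread) that the input widget compares the legacy key code of the physical key rather than the character that was typed. `KeyPress`, `commits_by_key_code` and `commits_by_character` are hypothetical names used only for this example.

```python
from dataclasses import dataclass

# Legacy key code of the physical comma key on a US layout.
# Assumption: the widget matches on this code, not on the typed character.
COMMA_KEY_CODE = 188


@dataclass(frozen=True)
class KeyPress:
    key: str       # character produced in the active layout, or "Enter"
    key_code: int  # legacy code tied to the physical key


def commits_by_key_code(press: KeyPress) -> bool:
    """Layout-blind check: fires for whatever character sits on that key."""
    return press.key == "Enter" or press.key_code == COMMA_KEY_CODE


def commits_by_character(press: KeyPress) -> bool:
    """Layout-aware check: fires only when a comma was actually typed."""
    return press.key in ("Enter", ",")


# The same physical key in two layouts.
russian_b = KeyPress(key="б", key_code=COMMA_KEY_CODE)
us_comma = KeyPress(key=",", key_code=COMMA_KEY_CODE)
```

With the layout-blind check, `russian_b` ends the keyword; with the character-based check only `us_comma` and ENTER do.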

http://forum.pkp.sfu.ca/t/ojs3-keywords-are-not-displaying-anywhere/20125/3

ajnyga commented 7 years ago

Thanks!

After realizing that the OJS3 subject field is (or should be) the same as the subjectClass field, I would vote for having two separate fields; in my opinion you can still combine them in OAI-PMH. If someone needs a more refined output there, it is easy to add a new custom oaimetadataformat.

However, I want to stress the difference between a library classification and a thesaurus. Both are controlled vocabularies, but they are inherently different. The latter is something you would want to integrate into the keywords field, but at the same time it would be a good idea to permit adding keywords from outside the thesaurus, and to distinguish between the words coming from the thesaurus and the ones from outside it.

This leads me to the idea of saving keyword/subject URIs/IDs. What do you think, @asmecher: how would the current controlled vocabulary function in OJS3 handle this?

Ph-We commented 7 years ago

I would agree with @ajnyga. Roughly speaking, 'subjects' are related to 'keywords' as 'taxonomies' to 'folksonomies'. So managers might want to control both, but using different methods. (E.g. controlled vocabularies for subjects and (usually more complex) thesauri for keywords, allowing free input).

P.S. Again, AFAIK 'subjects' and 'keywords' are mostly confused in DC/GS metatags (there are no 'keywords' in DC and 'subjects' in GS).
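
A minimal sketch of how one keyword list could feed both tag families, given that DC has no 'keywords' element. The function name `keyword_meta_tags` is hypothetical, and `citation_keywords` is assumed here as the Highwire-style (GS) counterpart; neither is taken from the thread.

```python
from html import escape


def keyword_meta_tags(keywords: list[str], locale: str = "en") -> list[str]:
    """Build DC and Highwire-style meta tags from one list of keywords."""
    tags = []
    for keyword in keywords:
        # Dublin Core has no "keywords" element, so each keyword
        # is emitted as a DC.Subject.
        tags.append(
            f'<meta name="DC.Subject" xml:lang="{escape(locale)}" '
            f'content="{escape(keyword)}"/>'
        )
    if keywords:
        # Assumption: a single citation_keywords tag with a
        # semicolon-separated list on the Highwire/GS side.
        joined = "; ".join(keywords)
        tags.append(f'<meta name="citation_keywords" content="{escape(joined)}"/>')
    return tags
```

Calling `keyword_meta_tags(["open access", "R&D"])` yields two `DC.Subject` tags followed by one `citation_keywords` tag, with the ampersand escaped.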

ajnyga commented 7 years ago

With saving the URIs (or other identifiers) with the keyword, I was thinking something like this: screenshot_26

asmecher commented 7 years ago

Correct me if I'm wrong but I think we're all agreed on an approach -- keep both fields, and provide optional tools to curate them, particularly Subjects.

@ajnyga, on storage: would you be using a recognized third-party standard, or something home-grown? If it's a third-party standard, it might be better to have the controlled vocabulary maintained in an XML file or similar, rather than backing it in the database. Then the submissions could refer to an identifier only (e.g. URL) and have the translation fetched from the XML.
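
As a rough sketch of that idea (submissions store only an identifier, and the label is looked up in an XML file), assuming an invented file layout: the `vocabulary`, `term` and `label` element names, the example URI, and the helper names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary file; element and attribute names are invented.
VOCABULARY_XML = """
<vocabulary>
  <term uri="http://example.org/vocab/t1">
    <label locale="en_US">cats</label>
    <label locale="fi_FI">kissat</label>
  </term>
</vocabulary>
"""


def load_labels(xml_text: str) -> dict[str, dict[str, str]]:
    """Index the labels as {uri: {locale: label}}."""
    root = ET.fromstring(xml_text)
    return {
        term.get("uri"): {
            label.get("locale"): label.text for label in term.findall("label")
        }
        for term in root.findall("term")
    }


def label_for(labels, uri, locale, fallback_locale="en_US"):
    """Return the label in the requested locale, else the fallback, else the URI."""
    by_locale = labels.get(uri, {})
    return by_locale.get(locale) or by_locale.get(fallback_locale) or uri
```

A submission would then carry only `http://example.org/vocab/t1`, and `label_for(labels, uri, "fi_FI")` supplies the translation at display time.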

ajnyga commented 7 years ago

Hi,

At least in my opinion we should keep both, and provide the optional tools for both fields as well (and maybe also for Coverage etc.). I am also still voting for an easy REST API integration.

@asmecher, the vocabulary I am planning to use is https://finto.fi/yso/en/, an ontology that the National Library hopes we will use in our journals. The XML format of that ontology is around 40 megabytes, so that would probably be a problem. It would also probably mean a lot of hacking in the code that shows the keywords in the frontend, backend and APIs.

Saving a URI together with a keyword would basically mean saving a unique identifier for that word, which I think is good practice. But it of course doubles the data, and I want to stress that I am not a library expert; maybe someone from PKP with this expertise would have a better view.

I have a backup plan to store the URI like this: Keyword [http://uri.com/path/to/keyword]. I then parse that with a regexp in our own OAImetadataformat plugin to find out whether the keyword comes from the vocabulary. If it does, it will get a different value in MARC and the URI will be included in a separate field. The problem with that solution is of course that the URI is shown in every other output. In OJS2, which used the submission_settings table to store these, I had a solution for this, but it does not work with the controlled vocabularies model.
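
A minimal sketch of that parsing step, assuming the stored form is always a label followed by one bracketed http(s) URI at the end of the string. The pattern and the names `Keyword` and `parse_keyword` are illustrative, not the plugin's actual code.

```python
import re
from typing import NamedTuple, Optional

# "Label [http://...]": a label followed by a bracketed http(s) URI at the end.
KEYWORD_WITH_URI = re.compile(
    r"^(?P<label>.+?)\s*\[(?P<uri>https?://[^\]\s]+)\]\s*$"
)


class Keyword(NamedTuple):
    label: str
    uri: Optional[str]  # None when the keyword is free text


def parse_keyword(raw: str) -> Keyword:
    """Split a stored keyword into its label and optional vocabulary URI."""
    text = raw.strip()
    match = KEYWORD_WITH_URI.match(text)
    if match:
        return Keyword(match.group("label"), match.group("uri"))
    return Keyword(text, None)
```

Keywords that come back with a URI can be treated as vocabulary terms, and the rest as free-text keywords; bracketed text that is not a URI is left untouched.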

Ph-We commented 7 years ago

As to the standards, AFAIK, SKOS is the only standard recommended by W3C for maintaining vocabularies/thesauri. YSO is based on SKOS too. Functionality to save appropriate URIs along with keywords is the least one could do.

marcbria commented 7 years ago

it might be better to have the controlled vocabulary maintained in an XML file or similar, rather than backing it in the database.

What if you have multiple journals with different thesauri? If it is based on an XML file, the files would need to be stored in the journal's registry folder or similar, wouldn't they?

asmecher commented 7 years ago

My worry with backing a vocabulary in a database is that any 3rd-party standard is going to evolve -- loading it into the database (possibly in translations) will mean having to maintain changes there, and more often than not, deal with obsolete lists. The country/language/currency lists are a good parallel -- we've always kept those in XML and it's made maintenance way easier.

Thanks for the heads-up on SKOS -- I wasn't aware of it!

ajnyga commented 7 years ago

It is likely that a thesaurus or ontology will evolve. However, I do think that the URIs in a thesaurus like YSO are always persistent identifiers, meaning that they will resolve somewhere; keywords/URIs are not just dropped. Also, OJS users are backing vocabularies into the database all the time; they just do not have a technical integration between a vocabulary and OJS.

The XML solution is a good one in many cases, but do you think that it will work with 40-megabyte files? One solution would be to store just the URI and then fetch the label/keyword through the API. But that would result in a lot of API calls, which could be a problem as well.

asmecher commented 7 years ago

@ajnyga, I don't know that standard at all, but at a glance it looks like it contains the meat of the dictionary plus a lot of tangentials (deprecated concepts, various metadata, etc). This reminds me a little of the ONIX schema we include with OMP -- in that case we used a few different tools to make an unwieldy set of XML into something usable: IIRC an XSL to strip out some unwanted elements, and some PHP code to cache specific sub-elements of the larger set.

Generally nothing involving ONIX is free of tears, but this approach allowed us to ship with the "official" ONIX code list, without causing performance problems in parsing. Caching is handled via file caches, so if the ONIX code list ever gets updated, the file caches will be automatically invalidated. And no need to sync external standards against a set managed in the database, which is always painful.
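
The file-cache idea can be sketched roughly as follows. This is not the OMP implementation: it is a Python illustration that assumes an invented XML layout (`item` elements with a `code` attribute and a `description` child) and uses file modification times to decide when the cache is stale.

```python
import json
import os
import xml.etree.ElementTree as ET


def load_codes(xml_path: str, cache_path: str) -> dict[str, str]:
    """Return {code: description}, re-parsing the XML only when it is newer than the cache."""
    cache_is_fresh = (
        os.path.exists(cache_path)
        and os.path.getmtime(cache_path) >= os.path.getmtime(xml_path)
    )
    if cache_is_fresh:
        with open(cache_path, encoding="utf-8") as cache_file:
            return json.load(cache_file)

    # Keep only the sub-elements we need; the rest of the file is ignored.
    root = ET.parse(xml_path).getroot()
    codes = {
        item.get("code"): item.findtext("description", default="")
        for item in root.iter("item")
    }

    # Rewrite the cache so the next call can skip the XML parse.
    with open(cache_path, "w", encoding="utf-8") as cache_file:
        json.dump(codes, cache_file)
    return codes
```

Replacing the XML file with a newer copy is enough to invalidate the cache on the next call, so nothing has to be synced by hand.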

ajnyga commented 7 years ago

Thanks. Yes, it is an ontology, so it is not just about the words but more about the relationships between the words.

I have to think about your solution with ONIX. This would require a lot of other changes as well, I mean fetching words from XML in the frontend, OAI etc. It would also require regular updates of the local XML file.

For a quick solution I will probably just go with saving the keywords together with the URI. I can always convert these later with a custom script if another solution is available, because they include the URI part.

asmecher commented 7 years ago

Sounds good, @ajnyga. There may well be elements that'll be difficult to implement purely by plugin -- I haven't considered this in detail yet -- and if that's the case, changes to the core codebase may be needed. If so, I think these would be well worth integrating because this is a worthwhile feature. That said, I do think the current behavior is likely to remain as the default, because of the difficulty in identifying a good, free, broadly-applicable, widely-translated vocabulary.

bozana commented 7 years ago

@asmecher, the keywords migration from OJS 2.4.x should then be fixed -- they should be saved as keywords and not as subjects, right? What about the subject classification migration? -- In OJS 2.4.x we had the name and URL of subject classification, thus should this be somehow migrated too or should just the entries be migrated to subjects as they are?

asmecher commented 7 years ago

@bozana, there's some confusion in terminology because the OJS2 "subjects" (per the code) field is the "keywords" field (per the UI/locales). So the OJS2 to OJS3 upgrade should migrate "subjects" (per the code) to OJS3's "keywords", and "subjectClass" (per the code) to "subjects".
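
The mapping described above can be written down as a small table. This is only an illustration of the renaming, not the actual upgrade script; the dictionary-based settings format and the function name `migrate_settings` are assumptions.

```python
# OJS2 field name (per the code) -> OJS3 field it should be migrated to.
OJS2_TO_OJS3_FIELD = {
    "subjects": "keywords",
    "subjectClass": "subjects",
}


def migrate_settings(ojs2_settings: dict[str, list[str]]) -> dict[str, list[str]]:
    """Rename the mapped fields; anything not in the mapping is left out."""
    migrated: dict[str, list[str]] = {}
    for old_name, values in ojs2_settings.items():
        new_name = OJS2_TO_OJS3_FIELD.get(old_name)
        if new_name is not None:
            migrated[new_name] = list(values)
    return migrated
```

Building a new dictionary, instead of renaming keys in place, avoids the trap where the old "subjects" values are overwritten by the incoming "subjectClass" values.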

bozana commented 7 years ago

OK, that's what I meant :-) I would open a new issue just for that (migration issue) ok?

asmecher commented 7 years ago

Sure, please do.

asmecher commented 7 years ago

@bozana, are you still working on this one?

bozana commented 7 years ago

@asmecher, I never did -- I just fixed the migration issue. From the original post: 1) is solved, 2) I am not sure we can do anything about, 3) I could do, 4) I believe @NateWr wanted to work with the display. The further requirements like use of dictionaries etc I would defer and put in a new issue. What do you think?

asmecher commented 7 years ago

@NateWr, does @bozana's suggestion work for you? Would you like to put some effort into keyword display for 3.1?

bozana commented 7 years ago

@NateWr, I can also do 4) if you would tell me where and how to display them best :-)

NateWr commented 7 years ago

@bozana That'd be great. I'd think they'd be best just below or above the abstract. You should be able to do it with the following HTML, requiring no additional styling.

<div class="item abstract">
  <p><strong>Keywords:</strong> <a href="#">first</a>, <a href="#">second</a>, <a href="#">third</a></p>
</div>

Ph-We commented 7 years ago

I would vote either for the 'above the abstract' option, or for putting keywords somewhere between the galley and 'published' blocks. The abstract may be too long, and keywords may usually describe the article 'in a few words'. So it would be better for them to be visible at first glance.

marcbria commented 7 years ago

I arrive late to this thread... so sorry in advance if I'm only adding noise to the discussion.

@NateWr I have doubts here:

Is "strong" the right tag for the label? https://www.w3schools.com/tags/tag_strong.asp
Would it be better (for theming) to build a ul-li list with one item per keyword?
Why not add "rel=tag"?

I found this discussion... but unfortunately it's really old: https://stackoverflow.com/questions/12866008/html5-semantic-markup-for-blog-post-tags-and-categories

Did you search for "best practices" (or "patterns" or "components" or whatever they like to call them) to see how others are dealing with this? It's very close to a blog's tags, so from an HTML perspective we can adopt the most widespread solution, don't you think?

Vitaliy-1 commented 7 years ago

Google is not indexing keywords at all. So they are needed mostly for authors for better navigation inside the journal.

NateWr commented 7 years ago

Is "strong" the right tag for the label? https://www.w3schools.com/tags/tag_strong.asp

I suppose it probably would be good to follow the .item, .label, .value pattern we've already established:

<div class="item doi">
    <span class="label">
        Keywords:
    </span>
    <span class="value">
        first, second, third
    </span>
</div>

But we'll need to update the CSS for the default theme and defaultManuscript to place them inline (like the DOIs). (@bozana, you can give me a nod when you want me to do that.)

Would it be better (for theming) to build a ul-li list with one item per keyword?

The keyword data should be available for themers to build their own HTML output.
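
As an illustration of what a themer could do once the raw keyword list is available, here is a small sketch that builds a ul-li variant of the .item/.label/.value pattern. The function name `render_keywords` and the `keywords` class are assumptions made for this example, not part of any existing theme.

```python
from html import escape


def render_keywords(keywords: list[str], label: str = "Keywords:") -> str:
    """Render keywords as a labelled list; return "" when there are none."""
    if not keywords:
        return ""
    # One <li> per keyword, with the text escaped for safe output.
    items = "".join(f"<li>{escape(keyword)}</li>" for keyword in keywords)
    return (
        '<div class="item keywords">'
        f'<span class="label">{escape(label)}</span>'
        f'<ul class="value">{items}</ul>'
        "</div>"
    )
```

Because the function receives plain data, a theme that prefers the inline comma-separated form only has to change the markup, not the data source.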

Why not add "rel=tag"?

I jumped the gun above. We won't actually have a page for each keyword to view articles in that keyword in 3.1. So there won't be any links there.

It's worth considering when we do have keyword-based browsing, but we'll need to consider other grouping strategies (sections, subjects) before settling on a rel for any particular browsing approach.

NateWr commented 7 years ago

Did you search for "best practices" (or "patterns" or "components" or whatever they like to call them) to see how others are dealing with this

I did look at eLife, PlosOne and UbiquityPress. Only Ubiquity was even displaying keywords, so they don't seem to be a high priority, and Ubiquity is probably just displaying them because they were part of OJS 2. Still, they're doing <label>: <list of keywords>.

Ph-We commented 7 years ago

@Vitaliy-1

Google is not indexing keywords at all. So they are needed mostly for authors for better navigation inside the journal.

I think Google indexes keywords but does not use them for ranking the indexed pages. Google Scholar should index both Dublin Core and HighWire (GS) tags. At least, Arlitsch & OBrien recommend adding the appropriate metatags, and they did a lot of research in this field :) https://books.google.ru/books/about/Improving_the_Visibility_and_Use_of_Digi.html?id=KxKSAwAAQBAJ&redir_esc=y

UPD: I double-checked the document Google sent us, 'ADDING MACHINE READABLE BIBLIOGRAPHIC METADATA TO SCHOLARLY ARTICLES'. It contains recommendations to add keywords as metadata:

marcbria commented 7 years ago

About "strong", I'm happy with @NateWr's new proposal (without "strong" ;-) )
About the HTML structure based on ul-li, it's for making theming easier... and for progressively moving to something more standard (BAM?).
About rel="tag", I just forgot that our keywords aren't linked to a keyword page, so microformats didn't actually make much sense (ToDo: link keywords :-) )

BTW, taking a look at Plos and UbiquityPress (the latter is tricky because, as you said, it is an OJS, so it is like looking at ourselves) is good, but beyond our small academic universe we also need to keep an eye on cutting-edge CMSs (like WordPress, Drupal, Joomla...) to see what is becoming the new standard.

@NateWr I know you are very familiar with WordPress, so if you think it is not the right moment to change our OJS list-items pattern, I'm OK with your decision.

bozana commented 7 years ago

PR for 3) and 4) from the original issue: pkp-lib master: https://github.com/pkp/pkp-lib/pull/2764 ojs master: https://github.com/pkp/ojs/pull/1518

bozana commented 7 years ago

@NateWr, I displayed the keywords before the abstract, in the same way and size as the DOIs. If I should change that, just tell me... Could you review the PR above? It also solves the other issue mentioned here: including keywords in the DC and Google Scholar metadata.

bozana commented 7 years ago

I think I can close this issue. If something from the discussion should be addressed later, maybe we can open a new issue?

pkp / pkp-lib

Issues with input and display of keywords #1828