w3c / publishingcg

Repository of the Publishing Community Group
https://www.w3.org/community/publishingcg/

Adding Metadata in Content Document (xHTML files) #54

Open vsingb3 opened 1 year ago

vsingb3 commented 1 year ago

I have an application which runs in the browser and is similar to an ePUB reader. There is a requirement to add metadata tags to the content, e.g. to a paragraph in the HTML files.

I have reviewed the ePUB specification, which talks about adding metadata in the <metadata> node of the Package file (*.opf file). However, it doesn't talk about how to attach metadata to a specific content element (say, a <p> tag) in the content documents, i.e. the xHTML files.

Is there any specification or standard for adding metadata in the content documents (.xHTML files)?

gauravchaddha commented 1 year ago

+1

apoorva-1199 commented 1 year ago

+1

jasdeep-compro commented 1 year ago

+1

chauhanvipul25 commented 1 year ago

+1

Nimit012 commented 1 year ago

+1

tina-compro commented 1 year ago

+1

mattgarrish commented 1 year ago

There are many ways to add metadata to HTML files, but as they're not specific to EPUB they are not detailed in the EPUB specification. Have a look at RDFa and microdata, in particular, if you need to annotate content in the body, and there's also JSON-LD that can be embedded in a script tag. One of those should meet your needs and all are valid in EPUBs.
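To illustrate the JSON-LD option mentioned above, here is a minimal sketch of reading JSON-LD back out of a content document. It is written in Python only to keep it self-contained. The sample document, the `Book` description in it, and the helper name `extract_jsonld` are assumptions made for illustration, not something taken from the EPUB specification.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # JSON-LD is embedded as <script type="application/ld+json">.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(markup):
    """Return a list with one parsed object per JSON-LD block (hypothetical helper)."""
    parser = JsonLdExtractor()
    parser.feed(markup)
    parser.close()
    return parser.documents


# Illustrative document; the metadata values are made up for this sketch.
sample = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Book", "name": "Example Title"}
</script>
</head><body><p>Write about your school day.</p></body></html>
"""

for document in extract_jsonld(sample):
    print(document["@type"], document["name"])
```

Note that a JSON-LD block is not attached to the element it sits next to, so tying it to one specific paragraph would need an identifier scheme of your own.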

vsingb3 commented 1 year ago

Hi @mattgarrish

Thank you for sharing the information. That was very helpful. I have a follow up query which I have stated below.

Let's say that I have the following <p> tag in my HTML and I want to tag it with a skill "RL.2.1" as defined here: https://casenetwork.imsglobal.org/uri/6b33f0bc-d7cc-11e8-824f-0242ac160002

<p>
    Write about your school day.
</p>

The key consideration here is that the skill value "RL.2.1" should not be rendered/shown in the browser.

Microdata

As per Microdata, I believe I need to add a child meta element as shown below:

 <p itemscope>
     Write about your school day.
     <meta itemprop="commoncore-skill" content="RL.2.1" >
 </p>

RDFa

As per RDFa, I believe I need to add a child meta element as shown below:

 <p>
     Write about your school day.
     <meta property="commoncore-skill" content="RL.2.1" >
 </p>

In case multiple skills are to be tagged to this <p> element, I believe multiple meta elements can be added as shown below:

<!-- Microdata -->
 <p itemscope>
     Write about your school day.
     <meta itemprop="commoncore-skill" content="RL.2.1" >
     <meta itemprop="commoncore-skill" content="RL.2.2" >
 </p>
<!-- RDFa -->
 <p itemscope>
     Write about your school day.
     <meta property="commoncore-skill" content="RL.2.1" >
     <meta property="commoncore-skill" content="RL.2.2" >
 </p>

Is the above understanding correct, or is there any other standard way of achieving this? Please advise.

As per HTML spec, there is a "data-*" custom attribute which can be used to add custom metadata on any HTML element.

<!-- data attribute -->
 <p data-commoncore-skill="RL.2.1, RL.2.2">
    Listen and sing.
 </p>

Which approach would you recommend moving forward with?

mattgarrish commented 1 year ago

Which approach you use depends on what you need the metadata for.

If you're just adding the information for processing by another script you've written, I probably wouldn't use the formal metadata frameworks like RDFa or microdata. They can be more difficult to process.

If you want other applications to be able to harvest the information from the document (e.g., google search), using a standard framework will make that simpler.

vsingb3 commented 1 year ago

Hi @mattgarrish

Thank you for your recommendation. Could you please confirm following 2 points:

  1. The way I implemented the RDFa & Microdata standards in my previous comment, was it correct?
  2. In our case, we will be writing our own scripts to implement a Search & Faceted Filter feature in our application. This metadata is not required to be read by any other application (e.g. Google Search). So what I understand from your comment is that I should go with data-* attributes. Please confirm.

vsingb3 commented 1 year ago

Hi @mattgarrish

Let me know your thoughts on the above comment.

mattgarrish commented 1 year ago

  1. The way I implemented RDFa & Microdata standards in my previous comment, was it correct ?

On a quick glance, you're not saying what the metadata is about.

You've only set an itemscope for the microdata, for example. You need either an itemtype or an itemref to say what it is about (the subject of the triples). RDFa doesn't have itemscope. For it, you need to set typeof and/or resource.
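As a rough illustration of the point above, the following Python sketch flags microdata items that have an itemscope but no itemtype, and RDFa property attributes with no typeof, resource, or about in scope. It is a simple heuristic, not a validator. The vocabulary URL `https://example.org/vocab/`, the `Exercise` type, the `skill` property, and the `#ex1` identifier in the revised sample are placeholders invented for this sketch.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
RDFA_SUBJECT_ATTRS = {"typeof", "resource", "about"}


class SubjectChecker(HTMLParser):
    """Heuristic check that metadata markup says what it is about."""

    def __init__(self):
        super().__init__()
        self.stack = []      # per open element: does it set an RDFa subject?
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        if "itemscope" in names and "itemtype" not in names:
            self.warnings.append(f"<{tag}>: itemscope without itemtype")
        if "property" in names and "about" not in names and not any(self.stack):
            self.warnings.append(
                f"<{tag}>: property without typeof/resource/about in scope")
        if tag not in VOID:
            self.stack.append(bool(names & RDFA_SUBJECT_ATTRS))

    def handle_endtag(self, tag):
        # Assumes well-formed markup, as XHTML content documents should be.
        if tag not in VOID and self.stack:
            self.stack.pop()


def check(markup):
    parser = SubjectChecker()
    parser.feed(markup)
    parser.close()
    return parser.warnings


# Markup as posted earlier in the thread.
original = """
<p itemscope>
    Write about your school day.
    <meta itemprop="commoncore-skill" content="RL.2.1">
</p>
<p>
    Write about your school day.
    <meta property="commoncore-skill" content="RL.2.1">
</p>
"""

# One possible revision, using placeholder vocabulary names.
revised = """
<p itemscope itemtype="https://example.org/vocab/Exercise">
    <meta itemprop="skill" content="RL.2.1">
</p>
<p vocab="https://example.org/vocab/" typeof="Exercise" resource="#ex1">
    <meta property="skill" content="RL.2.1">
</p>
"""

print(check(original))
print(check(revised))
```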

You can test your markup using Google's tools: https://developers.google.com/search/docs/appearance/structured-data

So what I understand from your comment is that I should go with data-* attributes. Please confirm.

No, I was only saying that data- attributes can be simpler to author and process than RDFa and microdata. There is a lot of expressive power in those frameworks and I don't know if you need it all. If you're not interested in creating graphs and triples from the data, it might be simpler to create a slimmed down markup model using data- attributes that you can parse more easily. Only you can answer which approach provides the information you need to best suit your application.
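For a sense of how little code a slimmed down data- model needs, here is a minimal Python sketch that builds a skill index of the kind a faceted filter could use. The attribute name `data-commoncore-skill` comes from the example earlier in the thread. The helper `build_facets`, the sample markup, and the choice to accept both comma- and space-separated values are assumptions. The same logic should carry over to JavaScript running in the browser.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Attribute name taken from the example earlier in the thread.
SKILL_ATTR = "data-commoncore-skill"


class SkillIndexer(HTMLParser):
    """Collects the skills tagged on each element via a data- attribute."""

    def __init__(self):
        super().__init__()
        self.tagged = []  # one (tag name, [skills]) pair per tagged element

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get(SKILL_ATTR)
        if value:
            # Accept comma- or space-separated values: "RL.2.1, RL.2.2"
            skills = value.replace(",", " ").split()
            self.tagged.append((tag, skills))


def build_facets(markup):
    """Map each skill code to the positions of the elements tagged with it."""
    parser = SkillIndexer()
    parser.feed(markup)
    parser.close()
    facets = defaultdict(list)
    for position, (_tag, skills) in enumerate(parser.tagged):
        for skill in skills:
            facets[skill].append(position)
    return dict(facets)


sample = """
<p data-commoncore-skill="RL.2.1, RL.2.2">Listen and sing.</p>
<p data-commoncore-skill="RL.2.1">Write about your school day.</p>
<p>No skills tagged here.</p>
"""

facets = build_facets(sample)
print(facets)  # {'RL.2.1': [0, 1], 'RL.2.2': [0]}
```

Positions here count only the tagged elements. A real application would more likely key the index on element ids or file paths.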