postman-open-technologies / schema-org-openapi-catalog

Schema.Org OpenAPI Catalog (GSoC 2023)
Apache License 2.0

Identify your chosen source of input data #1

Open MikeRalphson opened 1 year ago

MikeRalphson commented 1 year ago

From https://github.com/schemaorg/schemaorg/tree/main/data/releases/15.0

Examine the various formats in which the schema.org Types are described.

Which format would be best to work from programmatically? Are the various formats equivalent? What leads you to those conclusions?

Can you identify the data which describes the Book Type?

Himanshu-Dedha commented 1 year ago

In https://github.com/schemaorg/schemaorg/tree/main/data/releases/15.0, there are various formats in which the schema.org Types are described, such as JSON-LD, RDFa, Microdata, Turtle, N-Triples and Quads.

The best format to work from programmatically may depend on the specific use case and the tooling available. In my opinion, however, JSON-LD is the best suited from a programming perspective: its key-value syntax is clear and easy to read, and because it expresses semantic information in plain JSON, it is easy to parse and deserialize with standard tooling. Microdata and RDFa, on the other hand, are markup syntaxes for embedding semantic information in HTML documents. They can represent the same data, but they require more complex parsing and are less intuitive to work with than JSON-LD, which makes deserialization more difficult.

The various formats are equivalent in terms of the information they contain, but they differ in how that information is represented. We can use whichever format suits our needs.

Also, I'm a little confused by "the data which describes the Book Type". Does this mean the expected data type for the properties of the Book? Assuming yes, the data types of the various properties of the Book Type are shown in the table below:

| Property | Data Type | Description |
|---|---|---|
| abridged | Boolean | Indicates whether the book is an abridged edition or not |
| bookEdition | Text | The edition of the book |
| bookFormat | BookFormatType | The format of the book |
| illustrator | Person | The illustrator of the book |
| isbn | Text | The ISBN of the book |
| numberOfPages | Integer | The number of pages in the book |

Note: The data types of the properties inherited from CreativeWork and Thing have not been included in the table above.
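To make "identify the data which describes the Book Type" more concrete, here is a minimal Python sketch that looks up a Type and its directly declared properties in a JSON-LD graph. The `SAMPLE` document is a hand-written, hypothetical excerpt; the key names (`@graph`, `rdfs:subClassOf`, `schema:domainIncludes`, `schema:rangeIncludes`) are my assumption about how the release's JSON-LD file is shaped and should be checked against the real file. `as_ids` and `describe_type` are illustrative helpers, not part of any library.

```python
import json

# Hypothetical excerpt shaped like a schema.org JSON-LD vocabulary file.
# The real release file is far larger; the keys used here are an assumption.
SAMPLE = json.loads("""
{
  "@graph": [
    {"@id": "schema:Book", "@type": "rdfs:Class",
     "rdfs:label": "Book",
     "rdfs:subClassOf": {"@id": "schema:CreativeWork"}},
    {"@id": "schema:isbn", "@type": "rdf:Property",
     "schema:domainIncludes": {"@id": "schema:Book"},
     "schema:rangeIncludes": {"@id": "schema:Text"}},
    {"@id": "schema:numberOfPages", "@type": "rdf:Property",
     "schema:domainIncludes": {"@id": "schema:Book"},
     "schema:rangeIncludes": {"@id": "schema:Integer"}},
    {"@id": "schema:name", "@type": "rdf:Property",
     "schema:domainIncludes": {"@id": "schema:Thing"},
     "schema:rangeIncludes": {"@id": "schema:Text"}}
  ]
}
""")


def as_ids(value):
    """Normalise a node reference (dict, list of dicts, or None) to a list of @id strings."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [item["@id"] for item in items]


def describe_type(graph, type_id):
    """Return the class node for type_id and the properties whose domain includes it."""
    node = next(n for n in graph if n.get("@id") == type_id)
    props = {
        n["@id"]: as_ids(n.get("schema:rangeIncludes"))
        for n in graph
        if n.get("@type") == "rdf:Property"
        and type_id in as_ids(n.get("schema:domainIncludes"))
    }
    return node, props


book, book_props = describe_type(SAMPLE["@graph"], "schema:Book")
print(book["rdfs:subClassOf"]["@id"])  # parent Type, where inherited properties live
print(book_props)
```

Inherited properties (from CreativeWork and Thing) would need a second pass that follows `rdfs:subClassOf` upwards; the sketch deliberately stops at the properties declared on Book itself.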

MikeRalphson commented 1 year ago

@Himanshu-Dedha thank you for your detailed response! I would encourage you to formally apply to GSoC for this project. https://summerofcode.withgoogle.com/

Himanshu-Dedha commented 1 year ago

Hello @MikeRalphson! I have a question. Some Schema.org types have mandatory fields, for example name and image for the Organization type. So when we map the schema.org properties, we'll have to map the required fields as well, right?

MikeRalphson commented 1 year ago

If you identify required fields in schema.org Types, yes the required array should be populated in the output.
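As an illustration of populating the `required` array, here is a small Python sketch that builds a JSON Schema object and only emits `required` when some fields were identified as such. The helper name `object_schema` is hypothetical, and the choice of `name` and `image` for Organization is taken from the question above purely as an example, not from anything the vocabulary itself declares.

```python
import json


def object_schema(properties, required=()):
    """Build a JSON Schema object; add `required` only when there is something to require."""
    unknown = [name for name in required if name not in properties]
    if unknown:
        raise ValueError(f"required names not in properties: {unknown}")
    schema = {"type": "object", "properties": dict(properties)}
    if required:
        schema["required"] = list(required)
    return schema


# Example input: the required names are an assumption for illustration.
organization = object_schema(
    {
        "name": {"type": "string"},
        "image": {"type": "string", "format": "uri"},
        "url": {"type": "string", "format": "uri"},
    },
    required=["name", "image"],
)
print(json.dumps(organization, indent=2))
```

Rejecting a required name that has no matching property is a design choice in this sketch: it catches mapping mistakes early instead of producing a schema no instance can usefully satisfy.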

Himanshu-Dedha commented 1 year ago

Also, one more thing I had to ask you, in this project, when we convert schema.org Types in JSON-LD format to OpenAPI specification... we're losing all the semantic information, right? So would it be right to interpret this as filtering out the semantics i.e. converting the JSON-LD to JSON, then developing a JSON schema for validation and then converting this JSON schema to an OpenAPI specification?

MikeRalphson commented 1 year ago

What do you mean by all the semantic information?

Himanshu-Dedha commented 1 year ago

By semantic information, I mean the context that is provided in JSON-LD which is used to map terms used in a JSON-LD document to a vocabulary of terms, i.e. schema.org here.

Himanshu-Dedha commented 1 year ago

So what I meant by converting JSON-LD to JSON was removing the context from JSON-LD. Is that right? Or am I making a mistake?

MikeRalphson commented 1 year ago

Can you provide an example of a property you think will be lost?

Himanshu-Dedha commented 1 year ago

Sure. Here's an example of the Book type in JSON-LD format: { "@context": "http://schema.org", "@type": "Book", "name": "Random Book", "author": { "@type": "Person", "name": "Random Person" }, "publisher": { "@type": "Organization", "name": "The Random Books" }, "isbn": "9780330508567537" }

JSON-LD is a superset of JSON, i.e. it contains some extra information that plain JSON doesn't, which here are the @context and @type fields. If we removed the @context field from the JSON-LD, the @type field would lose the information about which vocabulary its value refers to. So the resulting JSON from the above JSON-LD would be: { "name": "Random Book", "author": { "name": "Random Person" }, "publisher": { "name": "The Random Books" }, "isbn": "9780330508567537" } The @context and @type fields have been removed because they are specific to the JSON-LD format. However, I do think that @type could be kept as a separate field to support modularity in the JSON Schema and the OpenAPI specification.
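The stripping step described above can be sketched in a few lines of Python. `strip_keywords` is a hypothetical helper written for this comment; it simply drops every key that starts with `@`, which is a simplification of what "removing the JSON-LD parts" could mean.

```python
def strip_keywords(value):
    """Recursively drop JSON-LD keywords (keys starting with '@') from a decoded document."""
    if isinstance(value, dict):
        return {
            key: strip_keywords(item)
            for key, item in value.items()
            if not key.startswith("@")
        }
    if isinstance(value, list):
        return [strip_keywords(item) for item in value]
    return value


book = {
    "@context": "http://schema.org",
    "@type": "Book",
    "name": "Random Book",
    "author": {"@type": "Person", "name": "Random Person"},
    "publisher": {"@type": "Organization", "name": "The Random Books"},
    "isbn": "9780330508567537",
}

plain = strip_keywords(book)
print(plain)
```

After this step nothing in `plain` says that `author` is a Person or that the document is a Book, which is exactly the information loss being discussed.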

MikeRalphson commented 1 year ago

There are ways to express @context and @type in JSON schema native concepts. Please continue to think on these points!

Himanshu-Dedha commented 1 year ago

I've been going through the JSON Schema documentation for hours now, and I think I understand the mistake I was making before. JSON-LD is a superset of JSON, and since JSON Schema is for validating a JSON object, I assumed that validating JSON-LD with JSON Schema would result in loss of data, i.e. @context or the semantics of the data. But I've just realized that @context can be validated just like any other key. We can describe @context in the following way: "@context": { "type": "string", "format": "regex", "pattern": "http://schema.org" } Note: this uses the schema for Schema.org that is available on Schema Store.
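Here is a minimal Python sketch of treating `@context` and `@type` as ordinary keys in a schema. `BOOK_SCHEMA` is my own illustrative schema (not the Schema Store one), using `pattern` for `@context` and `const` for `@type`. The `check` function is a tiny hand-written checker that covers only the keywords used here; it is not a JSON Schema implementation, and a real project would use a proper validator library instead.

```python
import re

# Illustrative schema: the anchored pattern and the use of `const` are assumptions.
BOOK_SCHEMA = {
    "type": "object",
    "required": ["@context", "@type"],
    "properties": {
        "@context": {"type": "string", "pattern": r"^https?://schema\.org/?$"},
        "@type": {"const": "Book"},
        "name": {"type": "string"},
    },
}


def check(instance, schema):
    """Check only `required`, string `type`, `pattern` and `const`; return a list of errors."""
    if not isinstance(instance, dict):
        return ["instance is not an object"]
    errors = [f"missing {key}" for key in schema.get("required", []) if key not in instance]
    for key, rule in schema.get("properties", {}).items():
        if key not in instance:
            continue
        value = instance[key]
        if rule.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key} is not a string")
            continue
        # JSON Schema patterns are unanchored, hence re.search rather than re.fullmatch.
        if "pattern" in rule and isinstance(value, str) and not re.search(rule["pattern"], value):
            errors.append(f"{key} does not match {rule['pattern']}")
        if "const" in rule and value != rule["const"]:
            errors.append(f"{key} must equal {rule['const']!r}")
    return errors


good = {"@context": "http://schema.org", "@type": "Book", "name": "Random Book"}
bad = {"@context": "http://example.org", "@type": "Movie"}
print(check(good, BOOK_SCHEMA))
print(check(bad, BOOK_SCHEMA))
```

One thing the sketch makes visible: an unanchored pattern such as `http://schema.org` would also accept longer strings that merely contain it, and the `.` would match any character, so anchoring and escaping the pattern may be worth considering.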

pragya-20 commented 1 year ago

Hello @MikeRalphson, you can find my inputs for this issue below. I have gone through https://github.com/schemaorg/schemaorg/tree/main/data/releases/15.0 and found 5 file formats used to define the schema markup:

On a broader view, there are 3 formats for writing schema markup: JSON-LD, Microdata, and RDFa. Frankly, when I was exploring schema.org I found JSON-LD easier to understand and implement than other formats like Turtle and RDF/XML. After reading a lot of documentation, I still think JSON-LD is the better option to use programmatically, for the following reasons:

  1. It does not affect the performance of the page, because it can be loaded asynchronously.
  2. Flexible: JSON-LD can be used in various places, such as APIs and data exchange platforms, while Microdata and RDFa are tied to HTML markup, which makes them complex to use and limits their use in other contexts.
  3. Widely used: many developers use it, so one can easily find resources and tools to work with.
  4. Interoperable: it provides a way to link different data sources by allowing you to define relationships between entities.

The file formats are not exactly equivalent. While all of them represent structured data using the Schema.org vocabulary, they differ in syntax, structure, and characteristics, and each suits specific use cases. JSON-LD is well suited to web applications that need to exchange structured data over the internet, as it is easy to parse and generate using JavaScript. Microdata and RDFa are more closely tied to HTML and are often used to add information to the HTML code of a web page, giving its content more context and meaning.

For the data which describes the Book type, the properties are the ones that describe the data of the book, which are below:

| Property | Data Type | Description |
|---|---|---|
| bookEdition | Text | The edition of the book |
| bookFormat | BookFormatType | The format of the book |
| abridged | Boolean | Indicates whether the book is an abridged edition |
| illustrator | Person | The illustrator of the book |
| isbn | Text | The ISBN of the book |
| numberOfPages | Integer | The number of pages in the book |
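As a rough sketch of where a table like this could lead, the Python below turns the property list into an OpenAPI-style `components.schemas` entry. The `PRIMITIVES` mapping (Text to string, Boolean to boolean, Integer to integer) and the decision to reference non-primitive Types such as Person via `$ref` are my assumptions for illustration; `property_schema` and `component` are hypothetical helpers.

```python
import json

# Assumed mapping from schema.org data types to JSON Schema / OpenAPI types.
PRIMITIVES = {
    "Text": {"type": "string"},
    "Boolean": {"type": "boolean"},
    "Integer": {"type": "integer"},
}

# Property names and data types copied from the table above.
BOOK_PROPERTIES = {
    "bookEdition": "Text",
    "bookFormat": "BookFormatType",
    "abridged": "Boolean",
    "illustrator": "Person",
    "isbn": "Text",
    "numberOfPages": "Integer",
}


def property_schema(range_type):
    """Map a primitive to its type; point everything else at another component."""
    if range_type in PRIMITIVES:
        return dict(PRIMITIVES[range_type])
    return {"$ref": f"#/components/schemas/{range_type}"}


def component(name, properties):
    """Build a single named object schema from a {property: data type} mapping."""
    return {
        name: {
            "type": "object",
            "properties": {prop: property_schema(kind) for prop, kind in properties.items()},
        }
    }


components = {"schemas": component("Book", BOOK_PROPERTIES)}
print(json.dumps(components, indent=2))
```

Properties inherited from CreativeWork and Thing are not covered; one option would be composing the Book component with its parent using `allOf`, but that is left out of this sketch.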