Open MikeRalphson opened 1 year ago
In https://github.com/schemaorg/schemaorg/tree/main/data/releases/15.0, there are various formats in which the schema.org Types are described, such as JSON-LD, RDFa, Microdata, Turtle, N-Triples and Quads.
The best format to work from programmatically depends on the specific use case and the tooling available. In my opinion, though, JSON-LD is the best suited from a programming perspective: its key-value syntax is clear and easy to read, and because it expresses semantic information in standard JSON, any JSON parser can deserialize it. Microdata and RDFa, on the other hand, are markup syntaxes used to add semantic information to HTML documents. They can represent the same data, but they require more complex parsing and are less intuitive to work with than JSON-LD, which makes deserialization more difficult.
The various formats are equivalent in terms of the information they contain, but they differ in the way they are represented. We can use whatever type suits our needs.
Also, I'm a little confused by "the data which describes the Book Type". Does this mean the expected data type for each property of Book? Assuming yes, the data types of the properties of the Book Type are shown in the table below:

| Property | Data Type | Description |
|---|---|---|
| abridged | Boolean | Indicates whether the book is an abridged edition or not |
| bookEdition | Text | The edition of the book |
| bookFormat | BookFormatType | The format of the book |
| illustrator | Person | The illustrator of the book |
| isbn | Text | The ISBN of the book |
| numberOfPages | Integer | The number of pages in the book |

Note: The data types of the properties inherited from CreativeWork and Thing have not been included in the table above.
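For anyone who wants to derive a table like the one above programmatically, here is a minimal Python sketch. It assumes the JSON-LD vocabulary file has a top-level `@graph` array whose property nodes carry `schema:domainIncludes` and `schema:rangeIncludes`. The `SAMPLE` string is a hand-written miniature of that assumed shape, not a copy of the release file, and the helper names `as_ids` and `properties_of` are made up for illustration.

```python
import json

# Hand-written miniature of the ASSUMED vocabulary shape; not copied from the release.
SAMPLE = """
{
  "@graph": [
    {"@id": "schema:Book", "@type": "rdfs:Class", "rdfs:label": "Book"},
    {"@id": "schema:isbn", "@type": "rdf:Property",
     "schema:domainIncludes": {"@id": "schema:Book"},
     "schema:rangeIncludes": {"@id": "schema:Text"}},
    {"@id": "schema:illustrator", "@type": "rdf:Property",
     "schema:domainIncludes": {"@id": "schema:Book"},
     "schema:rangeIncludes": {"@id": "schema:Person"}},
    {"@id": "schema:name", "@type": "rdf:Property",
     "schema:domainIncludes": {"@id": "schema:Thing"},
     "schema:rangeIncludes": {"@id": "schema:Text"}}
  ]
}
"""


def as_ids(value):
    """Normalize a missing value, a single node, or a list of nodes to a list of @id strings."""
    if value is None:
        return []
    nodes = value if isinstance(value, list) else [value]
    return [node["@id"] for node in nodes]


def properties_of(graph, type_id):
    """Map each property whose domain includes type_id to its list of range types."""
    result = {}
    for node in graph:
        if type_id in as_ids(node.get("schema:domainIncludes")):
            result[node["@id"]] = as_ids(node.get("schema:rangeIncludes"))
    return result


book_properties = properties_of(json.loads(SAMPLE)["@graph"], "schema:Book")
print(book_properties)
```

Like the table, this only finds properties declared directly on Book. Picking up inherited ones would mean also walking up to CreativeWork and Thing.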
@Himanshu-Dedha thank you for your detailed response! I would encourage you to formally apply to GSoC for this project. https://summerofcode.withgoogle.com/
Hello @MikeRalphson ! I had a doubt... So for Schema.org types, there are some mandatory fields, like we have name and image for Organization type, so when we map the schema.org properties, we'll have to map the required fields as well, right?
If you identify required fields in schema.org Types, yes, the required array should be populated in the output.
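A rough Python sketch of what populating the required array could look like. The helper `build_object_schema` is hypothetical, and treating name and image as required for Organization is taken from the question above as an assumption, not something the vocabulary files are known to state.

```python
import json


def build_object_schema(properties, required):
    """Build a JSON Schema object definition with an optional 'required' array."""
    unknown = [name for name in required if name not in properties]
    if unknown:
        raise ValueError(f"required names without a property definition: {unknown}")
    schema = {"type": "object", "properties": properties}
    # Omit the keyword when nothing is required, since some schema dialects
    # reject an empty 'required' array.
    if required:
        schema["required"] = sorted(required)
    return schema


# ASSUMPTION: name and image are the required fields, as suggested in the question.
organization_schema = build_object_schema(
    {
        "name": {"type": "string"},
        "image": {"type": "string", "format": "uri"},
        "url": {"type": "string", "format": "uri"},
    },
    ["name", "image"],
)
print(json.dumps(organization_schema, indent=2))
```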
Also, one more thing I had to ask you, in this project, when we convert schema.org Types in JSON-LD format to OpenAPI specification... we're losing all the semantic information, right? So would it be right to interpret this as filtering out the semantics i.e. converting the JSON-LD to JSON, then developing a JSON schema for validation and then converting this JSON schema to an OpenAPI specification?
What do you mean by all the semantic information?
By semantic information, I mean the context that is provided in JSON-LD which is used to map terms used in a JSON-LD document to a vocabulary of terms, i.e. schema.org here.
So what I meant by converting JSON-LD to JSON was removing the context from JSON-LD. Is that right? Or am I making a mistake?
Can you provide an example of a property you think will be lost?
Sure. Here's an example of the Book type in JSON-LD format: { "@context": "http://schema.org", "@type": "Book", "name": "Random Book", "author": { "@type": "Person", "name": "Random Person" }, "publisher": { "@type": "Organization", "name": "The Random Books" }, "isbn": "9780330508567537" }
So JSON-LD is a superset of JSON, i.e. it contains some extra information that plain JSON doesn't, which here are the @context and @type fields. If we removed the @context field from the JSON-LD, the @type field would lose the information about which type it actually refers to. So the resulting JSON from the above JSON-LD would be: { "name": "Random Book", "author": { "name": "Random Person" }, "publisher": { "name": "The Random Books" }, "isbn": "9780330508567537" } The @context and @type fields have been removed because they are specific to the JSON-LD format. However, I do think that @type could be kept as a separate field to implement modularity in the JSON Schema and the OpenAPI specification.
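To make the "JSON-LD to plain JSON" step concrete, here is a small Python sketch of the stripping described above. It simply drops every key that starts with "@" at any nesting depth. The function name `strip_jsonld_keywords` is made up for illustration, and this is only one possible reading of "removing the context", not a standard JSON-LD algorithm.

```python
import json


def strip_jsonld_keywords(value):
    """Recursively drop keys beginning with '@' (such as @context and @type)."""
    if isinstance(value, dict):
        return {
            key: strip_jsonld_keywords(item)
            for key, item in value.items()
            if not key.startswith("@")
        }
    if isinstance(value, list):
        return [strip_jsonld_keywords(item) for item in value]
    return value


book_jsonld = json.loads("""
{
  "@context": "http://schema.org",
  "@type": "Book",
  "name": "Random Book",
  "author": {"@type": "Person", "name": "Random Person"},
  "publisher": {"@type": "Organization", "name": "The Random Books"},
  "isbn": "9780330508567537"
}
""")

book_json = strip_jsonld_keywords(book_jsonld)
print(json.dumps(book_json, indent=2))
```

After stripping, author and publisher have the same shape, which is exactly the type information that gets lost.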
There are ways to express @context and @type in JSON schema native concepts. Please continue to think on these points!
So I've been going through the JSON Schema documentation for hours now, and I think I understand the mistake I was making before. JSON-LD is a superset of JSON, and since JSON Schema validates JSON objects, I assumed that validating JSON-LD with JSON Schema would result in loss of data, i.e. @context or the semantics of the data. But I've just realized that @context can be validated just like any other key. We can describe @context in the following way: "@context": { "type": "string", "format": "regex", "pattern": "http://schema.org" } Note: I used the schema for Schema.org available on Schema Store.
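As a sketch of treating @context and @type as ordinary keys, the Python snippet below defines a small schema and a deliberately tiny checker. The `check` function is hypothetical and supports only `required`, string `type`, `const`, and `pattern`, so it is no substitute for a real JSON Schema validator. I have left out `format: regex` here because, as far as I understand, that keyword describes strings that are themselves regular expressions, and the anchored pattern is my own assumption about which @context values should be accepted.

```python
import re

# ASSUMED schema fragment for illustration; not taken from Schema Store.
BOOK_SCHEMA = {
    "type": "object",
    "properties": {
        "@context": {"type": "string", "pattern": r"^https?://schema\.org/?$"},
        "@type": {"const": "Book"},
        "name": {"type": "string"},
    },
    "required": ["@context", "@type"],
}


def check(instance, schema):
    """Return a list of error messages for the small keyword subset supported here."""
    errors = []
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property {key!r}")
    for key, rule in schema.get("properties", {}).items():
        if key not in instance:
            continue
        value = instance[key]
        if rule.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key!r} must be a string")
            continue
        if "const" in rule and value != rule["const"]:
            errors.append(f"{key!r} must equal {rule['const']!r}")
        # JSON Schema patterns are unanchored, so re.search is the matching call.
        if "pattern" in rule and re.search(rule["pattern"], value) is None:
            errors.append(f"{key!r} does not match {rule['pattern']!r}")
    return errors


good = {"@context": "http://schema.org", "@type": "Book", "name": "Random Book"}
bad = {"@context": "http://example.com", "@type": "Book"}
print(check(good, BOOK_SCHEMA))
print(check(bad, BOOK_SCHEMA))
```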
Hello @MikeRalphson, you can find my inputs for this issue below: I have gone through https://github.com/schemaorg/schemaorg/tree/main/data/releases/15.0 and found 5 file formats used to define the schema markup, which are:
On a broader view, there are 3 formats for writing schema markup: JSON-LD, Microdata, and RDFa. Frankly, when I was exploring schema.org I found JSON-LD easier to understand and implement than other formats like Turtle and RDF/XML. After reading a lot of documentation as well, JSON-LD still looks like the better option to use programmatically, for the following reasons:
The file formats are not exactly equivalent. While all of them represent structured data using the Schema.org vocabulary, they differ in syntax, structure, and characteristics, and each suits specific use cases: JSON-LD is well-suited for web applications that need to exchange structured data over the internet, as it is easy to parse and generate using JavaScript, while Microdata and RDFa are more closely tied to HTML and are often used to add information to the HTML code of a web page, giving the content more context and meaning.
For the data which describes the Book type, the properties below are the ones that describe the data of the book:

| Property | Data Type | Description |
|---|---|---|
| bookEdition | Text | The edition of the book |
| bookFormat | BookFormatType | The format of the book |
| abridged | Boolean | Indicates whether the book is an abridged edition |
| illustrator | Person | The illustrator of the book |
| isbn | Text | The ISBN of the book |
| numberOfPages | Integer | The number of pages in the book |

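Since the project is about turning these Types into OpenAPI schemas, here is a minimal Python sketch of how the table above might translate. The `PRIMITIVES` mapping and the choice to reference non-primitive types under `#/components/schemas/` are assumptions for illustration, not an agreed design, and `to_schema_property` is a made-up helper name.

```python
import json

# ASSUMED mapping from schema.org primitive data types to JSON Schema types.
PRIMITIVES = {
    "Text": {"type": "string"},
    "Boolean": {"type": "boolean"},
    "Integer": {"type": "integer"},
}

# Property names and data types as listed in the table above.
BOOK_TABLE = [
    ("bookEdition", "Text"),
    ("bookFormat", "BookFormatType"),
    ("abridged", "Boolean"),
    ("illustrator", "Person"),
    ("isbn", "Text"),
    ("numberOfPages", "Integer"),
]


def to_schema_property(data_type):
    """Return an inline type for primitives, otherwise a reference to a named schema."""
    if data_type in PRIMITIVES:
        return dict(PRIMITIVES[data_type])
    return {"$ref": f"#/components/schemas/{data_type}"}


book_schema = {
    "type": "object",
    "properties": {name: to_schema_property(data_type) for name, data_type in BOOK_TABLE},
}
print(json.dumps(book_schema, indent=2))
```

Referencing Person and BookFormatType by name keeps each Type as its own reusable schema instead of inlining it.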
From https://github.com/schemaorg/schemaorg/tree/main/data/releases/15.0
Examine the various formats in which the schema.org Types are described.
Which format would be best to work from programmatically? Are the various formats equivalent? What leads you to those conclusions?
Can you identify the data which describes the Book Type?