opensupplyhub / supplychaindata.exchange.schema

Do we need relationships in the core schema between location/organization? #3

Open wonderchook opened 3 weeks ago

wonderchook commented 3 weeks ago

Originally we were thinking about locations and then the ownership of those locations. My thinking has since evolved toward "organization", because there are many more relationships beyond ownership. How should we show those relationships?

A few possible types of organizations:

fredsourcing commented 3 weeks ago

I think it definitely makes sense to treat all locations as organizations and then map them with relationship types.

A few relationship ideas:

vasgat commented 1 week ago

I agree that establishing relationships between the core objects is essential for creating a flexible schema and representing the dynamics of a supply chain ecosystem.

I am sharing how we currently capture these dynamics on Wikirate, in case it helps further the discussion on how to develop the concept of relationships in this standard. As background, the data on Wikirate is organised under metrics, and one type of metric we support is the relationship metric. Each relationship metric also has an inverse relationship metric, which is populated automatically.

One of our most populated relationship metrics currently on the platform is Supplied by, where each company can be supplied by several facilities each year. The Inverse Relationship Metric is Supplier of which is automatically populated.

The schema to express the relationship between two entities looks like the following:

[{
      "id": 15630744,
      "name": "Commons+Supplied By+Hugo Boss AG+2022+Square Fashions LiTD.",
      "type": "Relationship Answer",
      "metric":"Commons+Supplied By",
      "subject_company": "Hugo Boss AG",
      "object_company": "Square Fashions LiTD.",
      "value": "Tier 1 Supplier",
      "year": 2022,
      "url": "https://wikirate.org/Commons+Supplied_By+Hugo_Boss_AG+2022+Square_Fashions_LiTD.json",
      "metric_id": 2929009,
      "inverse_metric_id": 2929015,
      "subject_company_id": 42714,
      "object_company_id": 6298030
    },
    {
      "id": 15630749,
      "name": "Commons+Supplied By+Hugo Boss AG+2022+Navy Fashion Textile EOOD",
      "type": "Relationship Answer",
      "metric":"Commons+Supplied By",
      "subject_company": "Hugo Boss AG",
      "object_company": "Navy Fashion Textile EOOD",
      "value": "Tier 1 Supplier",
      "year": 2022,
      "url": "https://wikirate.org/Commons+Supplied_By+Hugo_Boss_AG+2022+Navy_Fashion_Textile_EOOD.json",
      "metric_id": 2929009,
      "inverse_metric_id": 2929015,
      "subject_company_id": 42714,
      "object_company_id": 13567589
    }]
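
The automatic population of the inverse metric described above could be sketched roughly as follows. This is a minimal illustration, not Wikirate's implementation: the `invert` helper and the `Commons+Supplier Of` metric name are assumptions, and only a subset of the fields shown above is carried over (how `value` should be treated in the inverse record is not stated in the thread, so it is left out).

```python
# Assumed pairing of a relationship metric with its inverse. The
# "Commons+Supplier Of" name is a guess based on the metric named above.
INVERSE_METRIC = {
    "Commons+Supplied By": "Commons+Supplier Of",
    "Commons+Supplier Of": "Commons+Supplied By",
}


def invert(answer: dict) -> dict:
    """Build the inverse answer by swapping subject and object."""
    return {
        "metric": INVERSE_METRIC[answer["metric"]],
        "subject_company": answer["object_company"],
        "object_company": answer["subject_company"],
        "year": answer["year"],
    }


answers = [
    {"metric": "Commons+Supplied By", "subject_company": "Hugo Boss AG",
     "object_company": "Square Fashions LiTD.", "year": 2022},
    {"metric": "Commons+Supplied By", "subject_company": "Hugo Boss AG",
     "object_company": "Navy Fashion Textile EOOD", "year": 2022},
]

inverse_answers = [invert(a) for a in answers]
for a in inverse_answers:
    print(a["subject_company"], "|", a["metric"], "|", a["object_company"])
```

Because the inverse is derived rather than stored by hand, only one direction of each relationship needs to be supplied by whoever contributes the data.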

Based on the discussions around relationships, I was wondering whether it would make more sense to use JSON-LD to develop the standard. It would allow us to create these links between our current core objects, Location and Organization, and I think it would provide an extra layer of interoperability, since it would be compatible with the wider web of linked data. If I quickly try to express the relationships above using JSON-LD, I would do something like the following:

{
  "@context": {
    "id": "@id",
    "type": "@type",
    "name": "https://schema.org/name",
    "location-type": "https://schema.org/locationCategory",
    "address": "https://schema.org/address",
    "suppliedBy": "https://schema.org/supplier",
    "supplierOf": "https://schema.org/supplier",
    "organization": "https://schema.org/Organization",
    "place": "https://schema.org/Place",
    "inverse_metric": "https://schema.org/relatedTo",
    "url": "https://schema.org/url"
  },
  "@graph": [
    {
      "@type": "Organization",
      "id": "https://wikirate.org/Organization/Hugo_Boss_AG",
      "name": "Hugo Boss AG",
      "suppliedBy": [
        {
          "@id": "https://wikirate.org/Location/Navy_Fashion_Textile_EOOD",
          "name": "Navy Fashion Textile EOOD"
        },
        {
          "@id": "https://wikirate.org/Location/Square_Fashions_LiTD",
          "name": "Square Fashions LiTD"
        }
      ]
    },
    {
      "@type": "Place",
      "id": "https://wikirate.org/Location/Navy_Fashion_Textile_EOOD",
      "name": "Navy Fashion Textile EOOD",
      "location-type": "Factory",
      "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Industry Road",
        "addressLocality": "Sofia",
        "addressRegion": "Sofia",
        "postalCode": "1000",
        "addressCountry": "BG"
      },
      "supplierOf": {
        "@id": "https://wikirate.org/Organization/Hugo_Boss_AG",
        "name": "Hugo Boss AG"
      }
    },
    {
      "@type": "Place",
      "id": "https://wikirate.org/Location/Square_Fashions_LiTD",
      "name": "Square Fashions LiTD",
      "location-type": "Factory",
      "address": {
        "@type": "PostalAddress",
        "streetAddress": "456 Garment Ave",
        "addressLocality": "Dhaka",
        "addressRegion": "Dhaka",
        "postalCode": "1205",
        "addressCountry": "BD"
      },
      "supplierOf": {
        "@id": "https://wikirate.org/Organization/Hugo_Boss_AG",
        "name": "Hugo Boss AG"
      }
    }
  ]
}

wonderchook commented 1 week ago

Thanks for that example, @vasgat. Do you think relationships should live within the two core objects, or are they another type of object?

I tend to think graph concepts are hard for a layperson, so I like to hide the complexity a bit.

vasgat commented 1 week ago

I am not sure what the right approach is. In a discussion we had with Ethan, he proposed a separate object named Affiliation. That would let us define different types of relationships between entities more flexibly.

{
    "subject_guid": "d9f233d8-e306-45e7-92e5-db604e4ad79a",
    "object_guid": "8ac0fe52-5194-48bd-8cc8-78cf7606e50d",
    "relationship": "supplied by",
    "status": "active"
}

The relationship field here could also be renamed to affiliation-type.
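
As a rough sketch of how a standalone Affiliation object like the one above might be handled in code, the snippet below models it as a small Python dataclass. The class itself, the default `status` of "active", and the UUID check are illustrative assumptions; the thread does not define which values `relationship` or `status` may take.

```python
import json
from dataclasses import asdict, dataclass
from uuid import UUID


@dataclass(frozen=True)
class Affiliation:
    subject_guid: str
    object_guid: str
    relationship: str
    status: str = "active"  # assumed default, not specified in the thread

    def __post_init__(self):
        # UUID() raises ValueError if an identifier is not a well-formed UUID.
        UUID(self.subject_guid)
        UUID(self.object_guid)


aff = Affiliation(
    subject_guid="d9f233d8-e306-45e7-92e5-db604e4ad79a",
    object_guid="8ac0fe52-5194-48bd-8cc8-78cf7606e50d",
    relationship="supplied by",
)

# Serialise back to the JSON shape shown above.
print(json.dumps(asdict(aff), indent=4))
```

Keeping the affiliation separate from Location and Organization means new relationship types only change the allowed values of one field, not the shape of the core objects.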

fredsourcing commented 6 days ago

Just chiming in, I'm glad you raised this topic. From a database perspective, my original thought is that objects (or foreign keys) would be useful for most fields, following Kimball design principles.

That said, I don't know if this is needed in the context of the schema. I imagine it would be possible to keep the schema simple for the sake of wider adoption, and then do the object matching in whatever system processes the data. In other words: is the burden of "deduplicating" and selecting the right object on the user side or on the processor side?

Or it could be that the schema highlights all objects and connections, but an interface given to users simplifies it.

I'm not as close to the details when it comes to how the schema will be made available to the users so apologies if I'm off topic.

wonderchook commented 5 days ago

@fredsourcing, when you say objects for most fields, are you saying we should treat each field as an object? I was thinking the main core data objects would benefit from unique identifiers, but I wasn't thinking that we would require them, depending on the use case.

I was hoping to make the schema simple enough that we could create tools to easily translate between spreadsheets and other systems.

So, for example, I would think of location and organization as having linkages, and if the data were being modeled in a spreadsheet, the object linking them would be embedded in each of them instead of sitting in a separate table used to look the keys up.

fredsourcing commented 5 days ago

@wonderchook yes, similar to the example posted by Vasgat (where we have metric_id, inverse_metric_id, etc.): essentially each field would have a link to an object id from another table. For example, countries would be a separate table, with United Kingdom being id 123, and so on. If the schema ends up using SIC codes or similar, that would make sense too. I would argue that you can do that even for things like the address field. This standardises the data and makes querying faster when the data gets big.

BUT: this is for database design, and it is not necessarily relevant to the schema (although it's good to think about it). I absolutely agree with you that the schema should be simple to understand and use. The most important thing, I believe, is standardisation.

I think one way to achieve this standardisation in an open schema for data exchange might be to rely as much as possible on existing standards, as Vasiliki said on Slack. This would basically force users to use the given standard, which would be the virtual equivalent of having a separate table for each value. For example, with the country field, requiring an ISO 3166-1 alpha-2 or alpha-3 country code would ensure standardisation. This could be done for the company category, the product category, and perhaps the language too.

For fields that would be difficult to standardise, like the "processing-types" field (if this is kept), another approach would be needed. I like your suggestion; that makes sense. I guess a list of possible options would be made available in the schema description for users to choose from?

To clarify: in the context of spreadsheets, all the data would be in clear text. For example, the country would be "GB", the category using a SIC code would be "46420", the processing type would be one of the options, like "spinning", etc.
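
To make the clear-text idea concrete, here is a minimal sketch of how a processor might check one spreadsheet row against such standards. Everything specific in it is an assumption for illustration: the column names, the five-digit shape for the category code (taken only from the "46420" example), and the option list for processing types, of which only "spinning" comes from this thread. The country check validates the shape of an alpha-2 code only, not whether the code is actually assigned.

```python
import re

# Hypothetical option list; only "spinning" is mentioned in the thread.
PROCESSING_TYPES = {"spinning", "weaving", "dyeing"}


def validate_row(row: dict) -> list[str]:
    """Return a list of human-readable problems found in one row."""
    errors = []
    # Shape check only: two upper-case letters, as in alpha-2 codes.
    if not re.fullmatch(r"[A-Z]{2}", row.get("country", "")):
        errors.append("country must be a two-letter upper-case code")
    # Assumed shape: five digits, matching the "46420" example.
    if not re.fullmatch(r"\d{5}", row.get("category", "")):
        errors.append("category must be a five-digit code")
    if row.get("processing-type") not in PROCESSING_TYPES:
        errors.append("processing-type must be one of the listed options")
    return errors


good = {"country": "GB", "category": "46420", "processing-type": "spinning"}
bad = {"country": "United Kingdom", "category": "46420", "processing-type": "spinning"}

print(validate_row(good))
print(validate_row(bad))
```

A check like this keeps the spreadsheet readable for users while still giving the processing side the consistency that separate lookup tables would provide in a database.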

This is how I currently imagine it, but I might be overcomplicating things :)

shuyag commented 4 days ago

@fredsourcing I'd agree with leveraging existing standards as much as possible: wherever there is a pre-existing standard that's open for use, this schema pulls it in, and where the category isn't as simple and broadly defined as country (using alpha-2/alpha-3), we can end up referencing OSIDs or Global Field IDs. I wonder what standard(s) might already exist for "processing-types".

Coming back to the question of "is relationship type a field within location, or organization, or standalone", a use case that @ethn and I chatted through was what would happen if you had:

In this case, you might have the relationships:

So you'd need the option of attaching the affiliation/relationship type to both location and organization (because there's location to location and, presumably, you could add the ownership affiliation to any of the factories as well, so organization to organization would matter too). Would describing/appending this be cleaner if affiliation is a separate field? Or, as @wonderchook suggests in going the embedding route, would that add an extra lookup step?
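
One way to weigh the lookup concern is to sketch what the separate-record route might look like in practice. The example below is hypothetical throughout: the identifiers, the relationship labels, and the `build_index` helper are invented for illustration and do not come from the schema. It shows standalone affiliation records covering location-to-organization, location-to-location, and organization-to-organization links, with a single index built once so that finding everything attached to an entity is one dictionary access.

```python
from collections import defaultdict

# Hypothetical identifiers and relationship labels, for illustration only.
affiliations = [
    {"subject": "org-brand", "object": "org-parent", "relationship": "owned by"},
    {"subject": "loc-factory-a", "object": "org-brand", "relationship": "supplier of"},
    {"subject": "loc-factory-b", "object": "loc-factory-a", "relationship": "supplier of"},
]


def build_index(records):
    """Map each entity id to every affiliation it takes part in."""
    index = defaultdict(list)
    for rec in records:
        index[rec["subject"]].append(rec)
        index[rec["object"]].append(rec)
    return index


index = build_index(affiliations)

# All relationships touching one factory, regardless of the other side's type.
for rec in index["loc-factory-a"]:
    print(rec["subject"], "|", rec["relationship"], "|", rec["object"])
```

Under these assumptions the lookup step exists but can be hidden by tooling, which may matter for the spreadsheet use case discussed earlier in the thread.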