openeduhub / metalookup

Provide metadata about domains w.r.t. accessibility, licencing, ads, etc.
GNU General Public License v3.0

Align frontend data model with Metadata names of edusharing service. #94

Open MRuecklCC opened 2 years ago

MRuecklCC commented 2 years ago

Currently, the data model uses its own names for the different extractors. Should we eventually align the extractor names with the edusharing naming conventions?

MRuecklCC commented 2 years ago

As part of this, it may also make sense to simplify the API input and output models. After some discussion with @RMeissnerCC, we decided to:

MRuecklCC commented 2 years ago

The simplification of the API data model was done as part of #100.

RobertMeissner commented 2 years ago

> The simplification of the API data model was done as part of #100.

Is there now anything left in this issue or can it be closed?

MRuecklCC commented 2 years ago

The main issue is still unresolved: https://issues.edu-sharing.net/jira/browse/KBMBF-475

MRuecklCC commented 2 years ago

To make some progress on this front, I spent a while going through the current metadata fields defined by the edusharing service and checking them out in Elasticsearch. A couple of those fields are:

Misc attributes

Quality attributes

Available Extractors

On the other side, we have the current extractor implementations:

MRuecklCC commented 2 years ago

Mapping between extractors and metadata fields

As a first step, the following relations come to mind:

MRuecklCC commented 2 years ago

Given the example of ccm:oeh_quality_protection_of_minors, it also becomes clear that the current response data model may be inadequate.

Consider the following two scenarios, in which the service receives a request to extract meta information for a website that contains adult advertisements.

  1. The advertisement is detected by the EasylistAdult extractor, which immediately makes clear that the content is not suitable as OER. The service could then respond with a 0-star rating for ccm:oeh_quality_protection_of_minors.
  2. The EasylistAdult extractor does not detect the ad (because it is not part of the respective blacklist). If the service then responded with a 5-star rating (because it didn't detect anything), that would be misleading. A more conservative approach would be to omit the ccm:oeh_quality_protection_of_minors assessment (better safe than sorry).
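The asymmetry between the two scenarios can be sketched as a small decision function. This is a minimal sketch, assuming the EasylistAdult extractor reports a simple boolean hit; the function name `rate_protection_of_minors` is hypothetical and not part of metalookup. A hit yields 0 stars, while a miss yields no rating at all (`None`) instead of 5 stars.

```python
from typing import Optional


def rate_protection_of_minors(adult_ads_detected: bool) -> Optional[int]:
    """Hypothetical conservative rating for ccm:oeh_quality_protection_of_minors.

    A blacklist hit is positive evidence and justifies a definite 0-star rating.
    A miss only means the ad is not on the list, so no rating is returned.
    """
    if adult_ads_detected:
        return 0  # scenario 1: content is clearly unsuitable as OER
    return None  # scenario 2: omit the assessment instead of claiming 5 stars


print(rate_protection_of_minors(True))   # 0
print(rate_protection_of_minors(False))  # None
```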

Similar arguments can be made for other attributes. In those cases, the response data model could be either:

In abstract terms:

In both cases, we could refrain from responding with an assessment, or at least wrap it in a "maybe"/"potentially" qualifier.
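The second option, wrapping the assessment in a qualifier, could look like the following sketch. The `Assessment` dataclass, its `certainty` field and the values "definite" and "potential" are assumptions made for illustration; they are not part of the current response model.

```python
from dataclasses import asdict, dataclass
from typing import Literal, Optional

Certainty = Literal["definite", "potential"]


@dataclass
class Assessment:
    """Hypothetical response entry that also states how certain the rating is."""

    stars: Optional[int]  # 0-5, or None if no assessment is given
    certainty: Certainty
    explanation: str


def assess_from_blacklist(hit: bool) -> Assessment:
    # A blacklist hit is a definite finding; a miss is at best a "potential" pass,
    # because the list may simply not contain the offending entry.
    if hit:
        return Assessment(stars=0, certainty="definite",
                          explanation="Content matched the blacklist.")
    return Assessment(stars=5, certainty="potential",
                      explanation="No blacklist match; the list may be incomplete.")


print(asdict(assess_from_blacklist(False)))
```

A client could then decide for itself whether to display a "potential" rating or ignore it.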

RobertMeissner commented 2 years ago

Regarding your latest comment: so basically, there is no safe way of using black-/whitelists to make a solid statement. Everything we say relies on the lists being "complete", whatever that means.

MRuecklCC commented 2 years ago

I read about accessibility ratings and Lighthouse.

lummerland commented 2 years ago

> Given the example of ccm:oeh_quality_protection_of_minors, it also becomes clear that the current response data model may be inadequate.

It looks as if we need to discuss and decide on more or less every field and mapping individually, because each has its own special characteristics. I think it would be helpful to have more detailed information on top of the "simple" mapping of fields, e.g.:

MRuecklCC commented 2 years ago

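As a first shot, I will provide a new API endpoint that returns the following four attributes:

  1. ccm:oeh_quality_protection_of_minors
  2. ccm:oeh_quality_login
  3. ccm:oeh_quality_data_privacy
  4. ccm:accessibilitySummary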

The structure will follow that of the /extract endpoint. The mapping from extractor to LRMI metadata field will be implemented in the most trivial way, based on the extractors listed above.
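A trivial one-extractor-per-field mapping could be sketched as follows. Only the EasylistAdult extractor and the four field names come from this thread; the other extractor names (`LoginDetector`, `PrivacyTracker`, `AccessibilityScore`) and the assumption that each extractor is a callable returning `(stars, explanation, extra)` are hypothetical. A failing or missing extractor produces an entry that contains only an `error` key, matching the response layout proposed for the endpoint.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical extractor signature: takes the page HTML and
# returns (stars, explanation, extra).
Extractor = Callable[[str], Tuple[int, str, Dict[str, Any]]]

# Trivial one-to-one mapping from LRMI field to extractor.
# Only "EasylistAdult" is a real extractor name; the rest are placeholders.
FIELD_TO_EXTRACTOR: Dict[str, str] = {
    "ccm:oeh_quality_protection_of_minors": "EasylistAdult",
    "ccm:oeh_quality_login": "LoginDetector",
    "ccm:oeh_quality_data_privacy": "PrivacyTracker",
    "ccm:accessibilitySummary": "AccessibilityScore",
}


def build_suggestions(html: str, extractors: Dict[str, Extractor]) -> Dict[str, Dict[str, Any]]:
    """Run one extractor per field and shape the result like the proposed response."""
    response: Dict[str, Dict[str, Any]] = {}
    for lrmi_field, name in FIELD_TO_EXTRACTOR.items():
        try:
            stars, explanation, extra = extractors[name](html)
        except Exception as exc:  # extractor missing (KeyError) or extraction failed
            # On failure only "error" is present, as in the proposed response body.
            response[lrmi_field] = {"error": f"{type(exc).__name__}: {exc}"}
            continue
        response[lrmi_field] = {"stars": stars, "explanation": explanation, "extra": extra}
    return response


def fake_easylist_adult(html: str) -> Tuple[int, str, Dict[str, Any]]:
    # Stand-in for a real extractor run, for demonstration only.
    return 0, "Adult advertisement detected.", {"matches": 1}


result = build_suggestions("<html></html>", {"EasylistAdult": fake_easylist_adult})
print(result["ccm:oeh_quality_protection_of_minors"]["stars"])  # 0
print(result["ccm:oeh_quality_login"])
```

Keeping the mapping in a single table makes it easy to swap in edusharing-aligned names later without touching the extraction loop.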

The endpoint will be POST {base-uri}/lrmi-suggestions. It will take a JSON body with the following structure:

{
    "url":"https://some-domain.de/path/to/content.html"
}
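A client call to the proposed endpoint could be sketched with Python's standard library. The base URI `http://localhost:8000` is a placeholder assumption and the helper name `build_lrmi_request` is hypothetical; the request is only built here, not sent.

```python
import json
import urllib.request


def build_lrmi_request(base_uri: str, url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to {base-uri}/lrmi-suggestions."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_uri.rstrip('/')}/lrmi-suggestions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_lrmi_request("http://localhost:8000", "https://some-domain.de/path/to/content.html")
print(req.get_method(), req.full_url)  # POST http://localhost:8000/lrmi-suggestions
# To actually send it against a running service: urllib.request.urlopen(req)
```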

For now, the endpoint will only provide results for HTML content; responses for non-HTML content are unspecified. The response body will look as follows:

{
  "ccm:oeh_quality_protection_of_minors": {
    "stars": 3,  # integer from 0 to 5; may be missing, in which case "error" holds the exception message
    "explanation": "some human readable string",
    "error": "",  # only present if extraction failed; then neither stars, explanation nor extra is available
    "extra": {
      # attribute-specific extra information; the structure depends on the attribute
    }
  },
  "ccm:oeh_quality_login": {
    # same as above
  },
  "ccm:oeh_quality_data_privacy": {
    # same as above
  },
  "ccm:accessibilitySummary": {
    # same as above
  }
}
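On the consumer side, the rule that `error` excludes `stars`, `explanation` and `extra` could be enforced while parsing. The sketch below assumes the response has already been decoded from JSON into a dict; the `FieldResult` class and the `parse_field`/`parse_lrmi_response` functions are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class FieldResult:
    """Hypothetical client-side view of one attribute in the response."""

    stars: Optional[int] = None
    explanation: Optional[str] = None
    error: Optional[str] = None
    extra: Dict[str, Any] = field(default_factory=dict)


def parse_field(entry: Dict[str, Any]) -> FieldResult:
    if "error" in entry:
        # Failed extraction: stars, explanation and extra must be absent.
        clashing = entry.keys() & {"stars", "explanation", "extra"}
        if clashing:
            raise ValueError(f"'error' must not be combined with {sorted(clashing)}")
        return FieldResult(error=entry["error"])
    stars = entry.get("stars")
    # bool is a subclass of int, so reject it explicitly; None also fails here.
    if isinstance(stars, bool) or not isinstance(stars, int) or not 0 <= stars <= 5:
        raise ValueError(f"stars must be an integer from 0 to 5, got {stars!r}")
    return FieldResult(stars=stars, explanation=entry.get("explanation"),
                       extra=entry.get("extra", {}))


def parse_lrmi_response(body: Dict[str, Any]) -> Dict[str, FieldResult]:
    """Parse an already JSON-decoded response body, attribute by attribute."""
    return {name: parse_field(entry) for name, entry in body.items()}


parsed = parse_lrmi_response({
    "ccm:oeh_quality_login": {"stars": 4, "explanation": "No login required.", "extra": {}},
    "ccm:accessibilitySummary": {"error": "Extraction timed out."},
})
print(parsed["ccm:oeh_quality_login"].stars)  # 4
print(parsed["ccm:accessibilitySummary"].error)  # Extraction timed out.
```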