Open MRuecklCC opened 2 years ago
As part of this it may also make sense to simplify the API input and output models. After some discussion with @RMeissnerCC we decided to:
The simplification of the API data model was done as part of #100.
Is there now anything left in this issue or can it be closed?
The main issue is still unresolved: https://issues.edu-sharing.net/jira/browse/KBMBF-475
To make some progress on this front, I spent a while going through the current metadata fields defined by the edusharing service and checking them out in Elasticsearch. A couple of those fields are:
ccm:oeh_accessibility_security: "IT-Sicherheit" (WIP)
ccm:accessibilitySummary: "Barrierefreiheit" (A, AA, AAA, BITV, WCAG)
ccm:oeh_quality_personal_law: "Persönlichkeitsrechte"
ccm:oeh_quality_protection_of_minors: "Jugendschutz"
ccm:oeh_quality_copyright_law: "Urheberrecht"
ccm:oeh_quality_criminal_law: "Strafrecht"
ccm:oeh_quality_login: "Login notwendig"
ccm:oeh_quality_relevancy_for_education: "geeignet für Bildung (WLO-Suche)"
ccm:oeh_quality_transparentness: "Anbieter Renommee"
ccm:oeh_quality_didactics: "Didaktik/Methodik"
ccm:oeh_quality_medial: "Medial passend"
ccm:oeh_quality_language: "Sprachlich"
ccm:oeh_quality_neutralness: "Neutralität"
ccm:oeh_quality_currentness: "Aktualität"
ccm:oeh_quality_data_privacy: "Datenschutz"
ccm:oeh_quality_correctness: "Sachrichtigkeit"
On the other side, we have the current extractor implementations:
Advertisement
EasyPrivacy
MaliciousExtensions
ExtractFromFiles
FanboyAnnoyance
FanboyNotification
FanboySocialMedia
AntiAdBlock
EasylistGermany
EasylistAdult
Paywalls
Security
IFrameEmbeddable
PopUp
RegWall
LogInOut
Cookies
GDPR
Javascript
Accessibility
LicenceExtractor
As a first step, the following relations come to mind:
ccm:oeh_quality_protection_of_minors (Jugendschutz): EasylistAdult (would need to be modified to the binary output schema)
ccm:oeh_quality_login (Login notwendig): LogInOut, RegWall, Paywalls
ccm:oeh_quality_data_privacy (Datenschutz): EasyPrivacy, GDPR, Cookies
ccm:accessibilitySummary: Accessibility (the extractor's output score would need to be mapped to the A, AA, AAA scale)
Given the example of ccm:oeh_quality_protection_of_minors, it also becomes clear that the current response data model may be inadequate.
Consider the following two scenarios, in which the service receives a request to extract meta information for a website that contains adult advertisement:
1. The ad is detected by the EasylistAdult extractor, which immediately makes clear that the content is not suited as OER. The service could respond with a 0-star rating for ccm:oeh_quality_protection_of_minors.
2. The EasylistAdult extractor does not detect the ad (because it is not part of the respective blacklist). If the service responded with a 5-star rating (because it didn't detect anything), that would be bad. A more conservative approach would be to omit the ccm:oeh_quality_protection_of_minors assessment (better safe than sorry).
Similar arguments can be made for other attributes. In those cases, the response data model could be either
In abstract terms:
In both cases we could refrain from responding with an assessment, or at least wrap it into a "maybe"/"potentially".
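The two scenarios above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the service's implementation: the function name `rate_protection_of_minors` and its boolean input are hypothetical, and the 0-star / omitted outcomes are the ones discussed above.

```python
from typing import Optional


def rate_protection_of_minors(adult_content_detected: bool) -> Optional[int]:
    """Conservative star rating for ccm:oeh_quality_protection_of_minors.

    Hypothetical helper: a blacklist hit is strong evidence, a miss is not.
    """
    if adult_content_detected:
        # Scenario 1: the blacklist matched, so the content is clearly unsuitable.
        return 0
    # Scenario 2: no match only means "not on the list", so no assessment is given.
    return None
```

Returning `None` instead of 5 keeps "nothing detected" distinct from "verified as safe", which is the asymmetry the blacklist approach cannot resolve on its own.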
Regarding your latest comment: so basically, there is no safe way of using black-/whitelists to make a solid statement. Everything we say relies on the lists being "complete", whatever that means.
I read about accessibility ratings and Lighthouse.
Given the example of the ccm:oeh_quality_protection_of_minors it also becomes clear, that the current response data model may be inadequate.
It looks as if we need to discuss and decide more or less every field and mapping individually because of their special characteristics. I think it would be helpful to have more detailed information on top of the "simple" mapping of fields. E.g.
As a first shot I will provide a new API endpoint providing the following 4 attributes:
ccm:oeh_quality_protection_of_minors
ccm:oeh_quality_login
ccm:oeh_quality_data_privacy
ccm:accessibilitySummary
The structure will follow what is available on the /extract endpoint. The mapping from extractor to LRMI metadata field will be implemented in the most trivial way from the extractors listed above.
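As a rough sketch, the "most trivial" mapping could be a plain lookup table. The field and extractor names below are taken from the relations listed earlier in this issue; the names `FIELD_TO_EXTRACTORS` and `extractors_for` are hypothetical and not part of the existing code base.

```python
from typing import Dict, List

# Hypothetical lookup table: metadata field -> extractors that feed it.
FIELD_TO_EXTRACTORS: Dict[str, List[str]] = {
    "ccm:oeh_quality_protection_of_minors": ["EasylistAdult"],
    "ccm:oeh_quality_login": ["LogInOut", "RegWall", "Paywalls"],
    "ccm:oeh_quality_data_privacy": ["EasyPrivacy", "GDPR", "Cookies"],
    "ccm:accessibilitySummary": ["Accessibility"],
}


def extractors_for(field: str) -> List[str]:
    """Return the extractors mapped to a field, or an empty list if unmapped."""
    return FIELD_TO_EXTRACTORS.get(field, [])
```

How the individual extractor outputs are combined into a single star rating per field is left open here, since that still has to be decided per field.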
The endpoint will be POST {base-uri}/lrmi-suggestions. It will take a JSON body with the following structure:
{
  "url": "https://some-domain.de/path/to/content.html"
}
For now, the endpoint will only provide results for HTML content. The response for non-HTML content is unspecified for now. The response body will look as follows:
{
  "ccm:oeh_quality_protection_of_minors": {
    "stars": 0-5, # may be missing; in that case there will be an exception message
    "explanation": "some human readable string",
    "error": "", # will only be present if extraction failed, in which case none of stars, explanation, or extra will be available
    "extra": {
      # attribute-specific extra information; the structure depends on the attribute
    }
  },
  "ccm:oeh_quality_login": {
    # same as above
  },
  "ccm:oeh_quality_data_privacy": {
    # same as above
  },
  "ccm:accessibilitySummary": {
    # same as above
  }
}
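For illustration, a client could build the request body and read the response along these lines. This is a minimal sketch based only on the structure described above: the `Suggestion` class and both helper functions are hypothetical names, and the actual HTTP call is left out on purpose.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Suggestion:
    """One attribute entry of the response (hypothetical client-side model)."""
    stars: Optional[int] = None
    explanation: Optional[str] = None
    error: Optional[str] = None
    extra: Dict[str, Any] = field(default_factory=dict)


def build_request_body(url: str) -> str:
    """Serialize the request body expected by POST {base-uri}/lrmi-suggestions."""
    return json.dumps({"url": url})


def parse_suggestions(body: str) -> Dict[str, Suggestion]:
    """Parse a response body into one Suggestion per attribute."""
    result: Dict[str, Suggestion] = {}
    for name, entry in json.loads(body).items():
        if "error" in entry:
            # Extraction failed: stars, explanation and extra are not available.
            result[name] = Suggestion(error=entry["error"])
        else:
            result[name] = Suggestion(
                stars=entry.get("stars"),
                explanation=entry.get("explanation"),
                extra=entry.get("extra", {}),
            )
    return result
```

Keeping `stars` optional on the client side matches the note that the rating may be missing, so callers have to handle the "no assessment" case explicitly.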
Currently, the data model uses its own names for the different extractors. Do we eventually want to align the extractor names with the edusharing naming conventions?