kasei / attean

A Perl Semantic Web Framework

subjects, predicates, objects should return distinct terms #152

Closed VladimirAlexiev closed 3 years ago

VladimirAlexiev commented 4 years ago

https://metacpan.org/pod/release/GWILLIAMS/Attean-0.026/lib/Attean/API/Model.pm describes methods for returning the subjects, predicates, objects that match a certain pattern.

The pattern allows undef (no constraint) and an array of terms (disjunction). In those cases several triples may well match the pattern, and Attean then returns the same term once per matching triple, although it should return it only once.

Test case: I'm converting the SKOS ontology to something else.

for my $prop
  ($model->subjects
   (IRI("rdf:type"),
    [map IRI($_),
     qw(rdf:Property owl:AnnotationProperty owl:DatatypeProperty owl:ObjectProperty)])
   ->elements) {
     ...
}

our $MAP   = URI::NamespaceMap->new # fixed prefixes used in mapping
  ({so     => "http://www.ontotext.com/semantic-object/",
    dc     => "http://purl.org/dc/elements/1.1/",
    dct    => "http://purl.org/dc/terms/",
    owl    => "http://www.w3.org/2002/07/owl#",
    rdf    => "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    rdfs   => "http://www.w3.org/2000/01/rdf-schema#",
    schema => "http://schema.org/",
    skos   => "http://www.w3.org/2004/02/skos/core#",
    vann   => "http://purl.org/vocab/vann/",
    xsd    => "http://www.w3.org/2001/XMLSchema#",
   });

sub iri ($) {
  # convert string or URI (returned by URI::NamespaceMap $MAP) to Attean::IRI
  my $uri = shift;
  Attean::IRI->new (value => ref($uri) ? $uri->as_string : $uri, lazy => 1)
}

sub IRI ($) {
  # Return Attean::IRI from prefixed name resolved through $MAP.
  my $pname = shift;
  my $iri = iri($MAP->uri($pname));
  $iri
}

However, SKOS properties are declared with two types, e.g.

skos:scopeNote
    a rdf:Property, owl:AnnotationProperty ;
skos:semanticRelation
    a rdf:Property, owl:ObjectProperty ;

I get each of them twice, although I expect to get each only once.

Workaround: List::MoreUtils qw(uniq)
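
A minimal sketch of that kind of deduplication, keyed on each term's string form rather than on the object itself. Sketch::Term and uniq_terms are hypothetical stand-ins so the snippet runs without Attean installed; with real Attean terms the assumption is that a method such as as_string yields a comparable string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an RDF term; only as_string matters here.
{
    package Sketch::Term;
    sub new       { my ($class, $value) = @_; return bless { value => $value }, $class }
    sub as_string { return $_[0]{value} }
}

# Keep the first occurrence of each term, comparing by string form.
sub uniq_terms {
    my %seen;
    return grep { !$seen{ $_->as_string }++ } @_;
}

# Each property matches two of the requested types, so a plain
# subject listing reports it twice.
my @subjects = map { Sketch::Term->new($_) } qw(
    skos:scopeNote        skos:scopeNote
    skos:semanticRelation skos:semanticRelation
);

my @unique = uniq_terms(@subjects);
print join(", ", map { $_->as_string } @unique), "\n";
```

The %seen hash preserves the order of first occurrence, which keeps the loop over properties deterministic.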

kasei commented 4 years ago

I disagree that this API should return unique results. I think both approaches have useful applications.

I thought for sure I had made a uniq method public on iterators, but I can't seem to find it. Would having a uniq method available be a reasonable compromise?

$model->subjects($pred, \@objects)->uniq->elements

kjetilk commented 4 years ago

As an aside, @VladimirAlexiev, you shouldn't need to define your own iri and IRI functions. There's now an AtteanIRI type in Types::Attean. Conventionally, such types have a coercion function whose name is the type name prefixed with to_.

So you should be able to do just:

use Types::Attean qw( to_AtteanIRI );

and then you should be able to use the to_AtteanIRI function for both these conversions and many more.

VladimirAlexiev commented 4 years ago

@kasei Agree with adding uniq to iterator.

kasei commented 3 years ago

@VladimirAlexiev sorry for the delay in responding. I'm going to release a new version soon with a uniq method on many iterator types, and include a note in the documentation for subjects, predicates, objects, and graphs indicating that their results are not necessarily unique.

$model->subjects($pred, \@objects)->elements->uniq would not be right, as elements materializes the iterator and returns a list. So uniq would have to come before elements.
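
The ordering can be illustrated with a small lazy iterator. This is only a sketch under stated assumptions: Sketch::Iterator is a hypothetical class written for this example, not Attean's implementation, and it compares items as plain strings.

```perl
use strict;
use warnings;

# Hypothetical minimal iterator; NOT Attean's implementation.
{
    package Sketch::Iterator;

    # Wrap a generator: a code ref that returns the next item,
    # or undef once the stream is exhausted.
    sub new {
        my ($class, $generator) = @_;
        return bless { generator => $generator }, $class;
    }

    sub next { return $_[0]{generator}->() }

    # Return a new lazy iterator that skips items already seen.
    sub uniq {
        my ($self) = @_;
        my %seen;
        return Sketch::Iterator->new(sub {
            while (defined(my $item = $self->next)) {
                return $item unless $seen{$item}++;
            }
            return undef;
        });
    }

    # Drain the iterator into a plain Perl list, which has no methods.
    sub elements {
        my ($self) = @_;
        my @items;
        while (defined(my $item = $self->next)) {
            push @items, $item;
        }
        return @items;
    }
}

my @matches = qw(skos:scopeNote skos:scopeNote
                 skos:semanticRelation skos:semanticRelation);
my $iter = Sketch::Iterator->new(sub { return shift @matches });

# uniq filters the stream first; elements then materializes the result.
my @subjects = $iter->uniq->elements;
print "@subjects\n";
```

Because elements hands back an ordinary list, there is no object left to call uniq on afterwards, so the filter has to be attached while the data is still an iterator.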

kasei commented 3 years ago

Attean 0.028 (just released to CPAN) contains the discussed documentation and code changes.