kasei / attean

A Perl Semantic Web Framework

Attean::IRI not compatible with "URI" #151

Open VladimirAlexiev opened 4 years ago

VladimirAlexiev commented 4 years ago

I guess URI::NamespaceMap is the preferred way to make URIs from pnames of well-known namespaces, or namespaces harvested from an ingested RDF file. At least bin/attean_query uses that class.

However, URI::NamespaceMap returns URI objects, which are not compatible with Attean::IRI. E.g. if you try to use:

$model->subjects ($map->rdf->type, $map->owl->Ontology)

Attean returns an error like this:

Can't locate object method "does" via package "URI::http" at
C:/Strawberry/perl/site/lib/AtteanX/Store/Memory.pm line 112.

In URI::NamespaceMap::_scrub_uri() https://metacpan.org/release/URI-NamespaceMap/source/lib/URI/NamespaceMap.pm#L201 I see some code for compatibility with Trine, but not with Attean.

if ($uri->isa('URI::Namespace')) {
    $uri = $uri->as_string;
}
elsif ($uri->isa('IRI')) {
    $uri = $uri->as_string;
}
elsif ($uri->isa('URI')) {
    # it's probably not necessary to do this, but whatever
    $uri = $uri->as_string;
}
elsif ($uri->isa('RDF::Trine::Node')) {
    # it is, on the other hand, necessary to do this.
    $uri = $uri->uri_value;
}
elsif ($uri->isa('RDF::Trine::Namespace')) {
    # and this
    $uri = $uri->uri->uri_value;

@kasei or @kjetilk, could you take care of this?

kjetilk commented 4 years ago

Yes, that's right!

I have worried a bit about it too, but the way I've approached it is through coercions in the type system, which has recently entered Attean. There's the to_AtteanIRI function in Types::Attean, and I've been using that for this purpose. I do agree, though, that it would be more elegant if it happened automatically, but I haven't had time to explore this further.

I'm not sure the code in URI/NamespaceMap.pm#L201 is relevant. IIRC, that is about turning an argument into a string, and in that case Attean::IRI should be supported, because it isa IRI.

I'd love to see your use case

$model->subjects ($map->rdf->type, $map->owl->Ontology)

supported, because I do that all the time myself. I'm not sure how, but I tend to think it should be supported through the type system and its coercions somehow. I would be happy to hear suggestions. I am certainly open to changing URI::NamespaceMap too, but I feel it shouldn't have a runtime requirement on Attean or Trine, etc.
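The coercion route described above might look like the following minimal sketch. It assumes that to_AtteanIRI accepts whatever object `$map->rdf->type` returns (a URI at the time of this report); the empty in-memory model from `Attean->temporary_model` is only there to make the example self-contained and is not part of the original use case.

```perl
use strict;
use warnings;
use Attean;
use URI::NamespaceMap;
use Types::Attean qw( to_AtteanIRI );

# Namespace map with the two prefixes used in the failing call.
my $map = URI::NamespaceMap->new({
    rdf => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    owl => 'http://www.w3.org/2002/07/owl#',
});

# Coerce each term from the map to an Attean::IRI before passing it to Attean.
# Assumption: the AtteanIRI coercion handles the object type returned by the map.
my $rdf_type     = to_AtteanIRI( $map->rdf->type );
my $owl_ontology = to_AtteanIRI( $map->owl->Ontology );

# Empty in-memory model, used here only as a stand-in for a loaded model.
my $model = Attean->temporary_model;

my $iter = $model->subjects( $rdf_type, $owl_ontology );
while ( my $subject = $iter->next ) {
    print $subject->ntriples_string, "\n";
}
```

The explicit to_AtteanIRI calls are exactly the step that the issue asks to have happen automatically.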

VladimirAlexiev commented 4 years ago
kjetilk commented 4 years ago
* is `IRI` compatible with `URI`?

No, and there's where the actual disconnect is, and where I think we need to see if we can fix things.

* another case is: after getting an iterator of `Attean::IRI` from a model, how to apply e.g. `$map->abbreviate()`

You should be able to use $map->abbreviate(to_Namespace($attean_iri)), I think. I don't have tests for exactly that case, but I think it should work.
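A minimal sketch of that suggestion follows. As noted, this exact case is untested in the thread, so treat it as an assumption that to_Namespace accepts an Attean::IRI; the hand-built `$attean_iri` is a hypothetical stand-in for a term taken from a model iterator.

```perl
use strict;
use warnings;
use Attean;
use URI::NamespaceMap;
use Types::Namespace qw( to_Namespace );

my $map = URI::NamespaceMap->new({
    owl => 'http://www.w3.org/2002/07/owl#',
});

# Stand-in for an IRI obtained from a model iterator.
my $attean_iri = Attean::IRI->new( value => 'http://www.w3.org/2002/07/owl#Ontology' );

# Coerce to a namespace object first, then ask the map for a prefixed name.
my $pname = $map->abbreviate( to_Namespace($attean_iri) );

print defined $pname ? "$pname\n" : "(no matching prefix)\n";
```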

VladimirAlexiev commented 4 years ago

I use conversions like this:

sub iri ($) {
  # convert string or URI (returned by URI::NamespaceMap $MAP) to Attean::IRI
  my $uri = shift;
  Attean::IRI->new (value => ref($uri) ? $uri->as_string : $uri, lazy => 1)
}

sub uri ($) {
  my $iri = shift;
  URI->new (ref($iri) ? $iri->as_string : $iri);
}

@kjetilk said in #152:

You shouldn't need to define your own iri and uri functions. There's now an AtteanIRI type in Types::Attean. Conventionally, these types have a conversion function named by prepending to_ to the type name. So you should be able to do just:

use Types::Attean qw( to_AtteanIRI );

and then you should be able to use the to_AtteanIRI function for both these conversions and many more.

But:

kjetilk commented 4 years ago

Does to_AtteanIRI take either string or can('as_string') argument?

Yes, it can convert from a string too (not sure I understood the last part of the question).

Note that I use lazy => 1 to speed up the conversion, because I don't need the IRI parsed into components. Can to_AtteanIRI do this?

No, but I consider that the natural next step: https://github.com/kasei/perl-iri/issues/14

I think the main tension is between the IRI class and URI. Because Attean::IRI is a subclass of IRI, a solution to this problem probably belongs in IRI. Perhaps @kasei can comment on that.

Note that lazy => 1 was introduced as a response to a performance problem I had with URI parsing. There is a lot of it going on in certain applications, so I think it is worthwhile looking into this problem.

I did some experiments, and @kasei did actually merge them, so there is code in IRI that could be helpful: https://github.com/kasei/perl-iri/pull/16

I ran out of time to explore this in the depth it needs. Not only do we need to pass the components back and forth, we also need to test it properly and benchmark it, but I haven't got the time.

Do I still need to define my function uri() to convert IRI->URI?

No, I don't think so.

Is there something like lazy => 1 in URI->new? Couldn't find any.

No, but I exploited the fact that you can pass the components to URI, so my idea is that we can use the methods to set them at both ends. It isn't a given that this yields a performance boost, as it results in many subroutine calls, but it is interesting to see if it does.

VladimirAlexiev commented 4 years ago

@kjetilk :

Does to_AtteanIRI take can('as_string') argument?

I mean whether it can take a URI argument, and use ->as_string

my function uri() to convert IRI->URI? No, I don't think so.

If I use a stock to_AtteanIRI() but also need to convert IRI->URI (in order to put it in a NamespaceMap), I still need to define my own function for that?

you can pass the components to URI, so my idea is that we can use the methods to set them in both ends

I understand. But in some applications (eg semweb) you don't need to parse IRIs/URIs into components at all, so a lazy option saves that effort.

kasei commented 4 years ago
  • Note that I use lazy => 1 to speed up the conversion, because I don't need the IRI parsed into components. Can to_AtteanIRI do this?

Note that lazy only defers the parsing of components. It doesn't avoid it altogether.
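A small sketch of what deferral means in practice, using the IRI class directly. It assumes that the `lazy` constructor argument shown earlier in this thread is accepted by IRI itself and that component accessors such as `scheme` and `host` are what trigger the deferred parse; the example IRI is made up.

```perl
use strict;
use warnings;
use IRI;

# Construct without parsing up front (lazy => 1 as used earlier in this thread).
my $iri = IRI->new( value => 'http://example.org/path?q=1#frag', lazy => 1 );

# The string form can be read without asking for any component.
my $string = $iri->as_string;

# Asking for a component is the point at which parsing can no longer be put off.
my $scheme = $iri->scheme;
my $host   = $iri->host;

print "$string\n$scheme\n$host\n";
```

So the saving only materializes for IRIs whose components are never requested.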

kjetilk commented 4 years ago

@kjetilk :

Does to_AtteanIRI take can('as_string') argument?

I mean whether it can take a URI argument, and use ->as_string

I'm still not sure I understand, could you provide a complete code example?

my function uri() to convert IRI->URI? No, I don't think so.

If I use a stock to_AtteanIRI() but also need to convert IRI->URI (in order to put it in a NamespaceMap), I still need to define my own function for that?

No, there's a coercion for that in Types::Namespace, so you should be able to use to_Namespace for that.

All these use ->as_string for the conversion, so if that's where the performance problem is, it won't help, but I think it would be where we should be solving the problem.

you can pass the components to URI, so my idea is that we can use the methods to set them in both ends

I understand. But in some applications (eg semweb) you don't need to parse IRIs/URIs into components at all, so a lazy option saves that effort.

It isn't always that unproblematic, as a pure string comparison might not be sufficient. RFC3986 has a section on comparison and often these issues jump out to bite us in semweb applications. Often, you need to parse and normalize the URI at some point in the process. It shouldn't be when you most need performance, but it requires an elaborate design at times.
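To illustrate the comparison problem, here is a minimal sketch using the URI module's `canonical` method. The two example URIs are made up, and `canonical` applies only part of the normalization discussed in RFC 3986, so this shows the kind of mismatch involved rather than a complete equivalence test.

```perl
use strict;
use warnings;
use URI;

# Two spellings that identify the same resource under RFC 3986 normalization:
# scheme and host differ only in case, and :80 is the default port for http.
my $left  = 'HTTP://Example.COM:80/a';
my $right = 'http://example.com/a';

# Plain string comparison treats them as different.
my $equal_as_strings = ( $left eq $right ) ? 1 : 0;

# Comparing canonical forms requires parsing both URIs first.
my $equal_canonical =
    ( URI->new($left)->canonical->as_string eq URI->new($right)->canonical->as_string ) ? 1 : 0;

print "string comparison:    ", ( $equal_as_strings ? "equal" : "different" ), "\n";
print "canonical comparison: ", ( $equal_canonical  ? "equal" : "different" ), "\n";
```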