SemAct registry?

nyarly commented 7 years ago

I notice in SemanticAction::satisfies? "FIXME: should have a registry"

I'm keen to see that, as I'd like to experiment with ShExMap. Did you have ideas about how such a registry should be implemented? For instance, does it make sense to have it be a field on the options hash, and inject it during parse, or better to have it be a class variable on ShEx? The latter as the default value for the former?
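To make the question concrete, here is a minimal sketch of the "class-level default, overridable through the options hash" idea. Everything in it is hypothetical: the `SketchShEx` module, the `:semact_registry` option name, and the `semact_for` method are illustrative assumptions, not part of the shex gem.

```ruby
# Hypothetical names throughout; this is not the shex gem's API.
module SketchShEx
  # Process-wide default registry: extension URI string => handler.
  DEFAULT_REGISTRY = {}

  class Schema
    # options[:semact_registry] (an assumed option name) replaces the
    # global default for this schema only.
    def initialize(**options)
      @registry = options.fetch(:semact_registry, DEFAULT_REGISTRY)
    end

    # Look up the handler for a semantic action URI; nil when none is registered.
    def semact_for(uri)
      @registry[uri.to_s]
    end
  end
end

uri = "http://example.org/ext/A/"
SketchShEx::DEFAULT_REGISTRY[uri] = :global_handler

shared   = SketchShEx::Schema.new
isolated = SketchShEx::Schema.new(semact_registry: { uri => :local_handler })
```

With this shape, `shared` resolves the URI through the global registry, while `isolated` only sees the handlers it was given.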

gkellogg commented 7 years ago

Discovery is done based on the URI, so parsing doesn't come into it. Perhaps something like RDF::Vocabulary, with a sub-class using the URI of the plugin. We can then use an inherited hook to add it to the registry when the class is created. The issue is knowing what environment to provide to such plugins, as the spec is vague about this; probably the instance of the calling Operator.

Haven't looked at ShExMap.

nyarly commented 7 years ago

In terms of the registry object itself, my question comes down to, I think, is there a reasonable argument that a single global registry is sufficient? If different ShEx::Algebra::Schema will need their own registries, how should that be passed to the SemAct that needs it to do lookup?

The subclassed hook approach, which sounds good to me, sounds like there's no driver for anything but a global registry.

I definitely agree that SemActs are vaguely defined, as I review the spec about them. I'd assumed they should be understood by analogy to semantic actions in string parsing algorithms.

I tend to agree that the calling Operator should pass as complete a "satisfaction result" into the SemAct as possible; what of that context the SemAct uses would be up to the implementation. Again, by analogy to string parsing, where there is larger context involved with processing a semantic action, it's the semantic engine's responsibility to maintain that; the parser's responsibility is to provide enough information for the engine to do that processing. I'd despair at prognosticating about exactly what information that would be; a data structure designed to expand nicely as needed seems like it's called for.
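One way such an expandable structure might look, as a rough sketch only: a context object with a few named fields plus an open-ended bag for anything added later. The class name `SketchSemActContext` and its field names are assumptions for illustration; neither the spec nor the gem defines them.

```ruby
# Hypothetical context object handed to a semantic action.
# Field names are illustrative, not taken from the spec or the gem.
class SketchSemActContext
  KNOWN_FIELDS = %i[expression matched unmatched].freeze

  attr_reader :expression, :matched, :unmatched, :extra

  # Any keyword beyond the known fields is collected into `extra`,
  # so new information can be passed without changing the signature.
  def initialize(expression:, matched: [], unmatched: [], **extra)
    @expression = expression
    @matched    = matched
    @unmatched  = unmatched
    @extra      = extra.freeze
  end

  # Uniform lookup for known and later-added fields; nil when absent.
  def [](key)
    return public_send(key) if KNOWN_FIELDS.include?(key)

    @extra[key]
  end
end

ctx = SketchSemActContext.new(expression: :triple_constraint,
                              matched: [:stmt1],
                              depth: 2)
```

An implementation that only cares about `matched` can ignore everything else, while a richer engine can read extra keys such as the made-up `depth` above.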

I'm interested in pulling together a PR for this; I've already got a rights transfer on file for RDF.rb - does that transfer?

gkellogg commented 7 years ago

The only "registry" I believe we need is for Semantic Actions. I think if SemAct serves as a super-class, then different implementations (e.g., SemActTest) could be created using something like the following:

class SemActTest < ShEx::Algebra::SemAct("http://shex.io/extensions/Test/")
  def satisfied?(expression, matched, unmatched)
    # Matches fail(...) or print(...) whose argument is either a quoted
    # string (group 2) or one of s, p, o (group 3).
    str = if md = /^ *(fail|print) *\( *(?:(\"(?:[^\\"]|\\")*\")|([spo])) *\) *$/.match(operands[1].to_s)
      md[2] || case md[3]
      when 's' then matched.first.subject
      when 'p' then matched.first.predicate
      when 'o' then matched.first.object
      else          matched.first.to_sxp
      end.to_s
    else
      matched.empty? ? 'no statement' : matched.first.to_sxp
    end
    $stdout.puts str
    status str
    not_satisfied "fail" if md && md[1] == 'fail'
    true
  end
end

SemAct then implements something like:

  def self.inherited(subclass) # @private
    unless @@uri.nil?
      @@subclasses[@@uri] = subclass
    end
    super
  end

Then in SemAct#satisfies?:

def satisfies?
  subclass = @@subclasses.fetch(operands.first.to_s)
  subclass.new.satisfies?(*operands)
rescue KeyError => e
  status("unknown SemAct name #{operands.first}") {"expression: #{self.to_sxp}"}
  false
end

ShEx has one or two subclasses of SemAct built in (Test at least). Different gems can implement more subclasses, which when required by the application, automatically cause them to be registered and available for dispatch.
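The fragments above can be assembled into a small self-contained sketch of the pattern: a factory method returns an intermediate class carrying the URI, and the `inherited` hook registers whatever subclasses it. This is an illustration under assumptions, not the gem's implementation: `SketchSemAct`, `with_uri`, `dispatch`, and the example.org URI are all hypothetical, and a class-level instance variable stands in for the `@@subclasses` class variable.

```ruby
# Hypothetical stand-in for ShEx::Algebra::SemAct; not the gem's actual class.
class SketchSemAct
  @registry = {} # URI string => implementing subclass

  class << self
    attr_reader :uri

    # Single registry stored on the base class and shared by all subclasses.
    def registry
      SketchSemAct.instance_variable_get(:@registry)
    end

    # Returns an anonymous intermediate class that remembers the URI.
    # Subclassing that intermediate class triggers registration.
    def with_uri(uri)
      Class.new(self) { @uri = uri.to_s }
    end

    # Ruby calls this on the parent whenever a subclass is created.
    # Only parents that carry a URI register their subclasses.
    def inherited(subclass)
      super
      registry[uri] = subclass unless uri.nil?
    end

    # Dispatch by URI; an unknown extension is treated as not satisfied.
    def dispatch(uri, *args)
      impl = registry[uri.to_s]
      return false if impl.nil?

      impl.new.satisfies?(*args)
    end
  end
end

# A made-up extension, as a separate gem might define it.
class EchoAct < SketchSemAct.with_uri("http://example.org/extensions/Echo/")
  def satisfies?(code)
    code != "fail"
  end
end
```

Merely loading the file that defines `EchoAct` is enough to make it available through `SketchSemAct.dispatch`, which is the "require to register" behaviour described above.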

> I think, is there a reasonable argument that a single global registry is sufficient? If different ShEx::Algebra::Schema will need their own registries, how should that be passed to the SemAct that needs it to do lookup?

Each SemAct needs to be uniquely identified by the URI operand. There shouldn't be overlap, but if there is, an application can choose which implementation to require.

> I'm interested in pulling together a PR for this; I've already got a rights transfer on file for RDF.rb - does that transfer?

I'll make sure you have rights on this gem. You should add yourself as an AUTHOR, but recognize that this comes with responsibilities! Thanks for your help; contributions and contributors are more than welcome! After things settle down, we should create an issues list to manage the direction of the gem.

Right now, I'm reworking #execute and #satisfies, with #execute always returning (or raising) the marked-up operands to be able to walk back to determine why a shape was satisfied or not.
I'd like to get ShExJ implemented, both to read and write.
An SXP reader might be useful (should be trivial).
Heavyweight tracking of matched/unmatched and satisfied/unsatisfied might be controlled using a :verbose option, so that in the simple operational case, the matching could be pretty lightweight.
I'm not happy with the duplication with focus and map (in the gem or the spec). I think that focus is used for start, and the body should probably iterate through the map, so that a given shape could match different resources with one or more shapes in one call.

Please work on a GitFlow feature branch, and let's do PRs to get back into the develop branch. Once I've completed the work I'm doing right now, I'll do the same.

ruby-rdf / shex

SemAct registry? #2