Closed phillord closed 7 months ago
which version of sophia are you using?
Assuming you are using the latest release (0.5.3), I just pushed an experimental branch rio_xml. You might want to try it, and replace xml::RdfXmlParser by xml2::RdfXmlParser in your code, see if that solves this issue -- and possibly #76 as well.
If it does, I will probably switch to this implementation as the default RDF/XML parser.
I'm trying to work my way through this. It seems to work and parse much quicker, but it's not a drop in replacement in my code.
Currently my main use for this just dumps graphs out into [Term; 3]. So I do this:
let triple_iter = sophia::parser::xml::parse_bufread(bufread);
let triple_result: Result<Vec<_>, _> = triple_iter.collect();
let triple_v: Vec<[SpTerm; 3]> = triple_result.unwrap();
But I can't drop in replace this with xml2, and I haven't managed to work out how to get triples from the xml2::RdfXmlParser. Apologies, I find the API rather confusing! I'd be grateful for any hints.
It seems to work and parse much quicker,
good
but it's not a drop in replacement in my code.
not quite, you are right...
Apologies, I find the API rather confusing! I'd be grateful for any hints.
I should be the one to apologize... I'm sorry you feel that way about the API, and I am open to any suggestion to make it easier.
Now about your problem:
This should work for you:
let triple_source = sophia::parser::xml2::parse_bufread(bufread);
let triple_result: Result<Vec<[BoxTerm;3]>, _> = triple_source.collect_triples();
let triple_v = triple_result.unwrap();
First, you need to understand that xml::parse_bufread(b) is just a shortcut for xml::RdfXmlParser::default().parse(b), where RdfXmlParser implements the TripleParser trait. So basically, xml::parse_bufread is specified as the trait method TripleParser::parse (and this should be true of the parse_bufread function of any parser module).
The contract of this method is to return a TripleSource, which itself is a trait. This trait is implemented by any iterator of triples, but has other implementations. Each parser provides its own implementation of TripleSource. xml's happens to be an iterator, and your code above was relying on that. xml2, on the other hand, has a different implementation (which contributes to making it faster, by the way :wink:).
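To make the distinction concrete, here is a minimal std-only sketch of the pattern described above. It is not sophia's actual code: SketchTripleSource, PushSource, Triple, count_triples and demo_counts are hypothetical names made up for illustration, and the real TripleSource trait has a richer signature (error handling in particular is left out here).

```rust
// Hypothetical, simplified model of the pattern; NOT sophia's real API.
type Triple = [String; 3];

/// A source of triples that pushes each triple into a callback.
trait SketchTripleSource {
    fn for_each_triple<F: FnMut(Triple)>(self, f: F);
}

/// Any iterator of triples is a source (this is what the old code relied on).
impl<I: Iterator<Item = Triple>> SketchTripleSource for I {
    fn for_each_triple<F: FnMut(Triple)>(self, mut f: F) {
        for t in self {
            f(t);
        }
    }
}

/// A source that is NOT an iterator: it only knows how to push triples.
struct PushSource {
    lines: Vec<String>,
}

impl SketchTripleSource for PushSource {
    fn for_each_triple<F: FnMut(Triple)>(self, mut f: F) {
        for line in self.lines {
            let parts: Vec<&str> = line.split_whitespace().collect();
            // Lines that do not have exactly three fields are skipped.
            if let [s, p, o] = parts.as_slice() {
                f([s.to_string(), p.to_string(), o.to_string()]);
            }
        }
    }
}

/// Generic consumer: works with either kind of source.
fn count_triples<S: SketchTripleSource>(source: S) -> usize {
    let mut n = 0;
    source.for_each_triple(|_| n += 1);
    n
}

/// Counts triples from an iterator-backed source and a push-based source.
fn demo_counts() -> (usize, usize) {
    let t = |s: &str, p: &str, o: &str| [s.to_string(), p.to_string(), o.to_string()];
    let from_iter = vec![t("a", "b", "c"), t("d", "e", "f")].into_iter();
    let from_push = PushSource {
        lines: vec!["a b c".to_string(), "malformed".to_string(), "d e f".to_string()],
    };
    (count_triples(from_iter), count_triples(from_push))
}
```

Code written against the trait (like count_triples) keeps working when the parser changes its implementation, whereas code that calls Iterator methods such as collect directly on the parser's return value only compiles while that value happens to be an iterator.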
Since sophia 0.5.0, TripleSource provides a method similar to Iterator's collect. It is called collect_triples, and can build most implementations of Graph (to be precise: it can build any implementation of CollectibleGraph).
Vec<[BoxTerm;3]> happens to implement Graph and CollectibleGraph.
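As a rough illustration of how a collect_triples-style method can build different containers, here is a std-only sketch. Everything in it is an assumption made for the example: SketchCollectible stands in for the idea behind CollectibleGraph, SketchSource for a parser's triple source, and plain strings for terms. Sophia's real traits and error types differ.

```rust
use std::collections::HashSet;

// Hypothetical, simplified model; NOT sophia's real API.
type Triple = [String; 3];

/// Stand-in for a "collectible graph": anything that can be built from triples.
trait SketchCollectible: Sized {
    fn from_triples<I: Iterator<Item = Triple>>(triples: I) -> Self;
}

impl SketchCollectible for Vec<Triple> {
    fn from_triples<I: Iterator<Item = Triple>>(triples: I) -> Self {
        triples.collect()
    }
}

impl SketchCollectible for HashSet<Triple> {
    fn from_triples<I: Iterator<Item = Triple>>(triples: I) -> Self {
        triples.collect()
    }
}

/// Stand-in for a parser's triple source; each item may be a parse error.
struct SketchSource {
    items: Vec<Result<Triple, String>>,
}

impl SketchSource {
    /// Builds any collectible container, stopping at the first error.
    fn collect_triples<G: SketchCollectible>(self) -> Result<G, String> {
        let mut ok = Vec::new();
        for item in self.items {
            ok.push(item?);
        }
        Ok(G::from_triples(ok.into_iter()))
    }
}

/// Collects the same three triples (one duplicated) into a Vec and a HashSet.
fn demo_collect() -> (usize, usize) {
    let t = |s: &str| [s.to_string(), "p".to_string(), "o".to_string()];
    let make = || SketchSource {
        items: vec![Ok(t("a")), Ok(t("a")), Ok(t("b"))],
    };
    let as_vec: Vec<Triple> = make().collect_triples().unwrap();
    let as_set: HashSet<Triple> = make().collect_triples().unwrap();
    (as_vec.len(), as_set.len())
}

/// A source containing an error yields Err instead of a partial container.
fn demo_error() -> bool {
    let source = SketchSource {
        items: vec![Err("bad input".to_string())],
    };
    source.collect_triples::<Vec<Triple>>().is_err()
}
```

As with Iterator's collect, the target type is chosen by the annotation on the left-hand side, which is why the snippet in the comment above spells out Result<Vec<[BoxTerm;3]>, _>.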
I hope this helps.
FTR, there was an error in my previous comment; Vec<BoxTerm> should have been Vec<[BoxTerm;3]>. I just edited it to fix that.
I have it working now. It's taking me a while to test, because I think my code was dependent on behaviour from the old parser that was actually buggy.
Well, it seems to be working well. The two failures I was getting in my test suite were, I am sure, because of behaviour that was buggy in the old parser. It also fixes #76.
In terms of the API, I think the issue is partly mine. I still do not find Rust entirely natural to use. Especially when things are implemented through traits, the documentation you need in the Rust doc can be several clicks away or deep in the page. The main thing that would help would be a bit more module documentation and especially examples!
I need to think more on sophia, because at the moment my own https://github.com/phillord/horned-owl duplicates some of the functionality. Too many options.
Well, it seems to be working well.
Great. I'll make the Rio parser the default in the next release. I'll close both issues then.
Main thing that would help would be a bit more module documentation and especially examples!
Yep, that's a pertaining item on my TODO list ;)
More documentation is on everyone's TODO list:-)
Do you have an ETA for a new release?
Do you have an ETA for a new release?
I'm hoping to do it by the end of June or beginning of July.
Okay, thanks for letting me know!
I'm hoping to do it by the end of June or beginning of July.
A little later than announced, but v0.6.0 is now out, with parser::xml now based on the Rio parser. Give it a try, and feel free to close this issue (and #76) if your problems are solved.
@phillord are you ok to close this issue? Since the pre-release patch "[seemed] to be working well", I am assuming that your problem is also solved with the current release.
@phillord up?
are you ok to close this issue? Since the pre-release patch "[seemed] to be working well", I am assuming that your problem is also solved with the current release.
closing: this issue is very old, and the current RDF/XML parser processes http://purl.obolibrary.org/obo/go.owl without any problem
I have been trying out the XML parser on a large file. Even after an extended period, it fails to parse, where the turtle parser succeeds.
As my large file I have been using the Gene Ontology available at:
http://purl.obolibrary.org/obo/go.owl
(The ttl version I have had to convert from this using the OWL API; I can put it somewhere if it is helpful).
The ttl version runs in 7 seconds. For the XML version, I do not know whether it is stalling or just slow, because I have not had it complete yet.