pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data

How to parse strings into terms? #123

Closed: KonradHoeffner closed this issue 10 months ago

KonradHoeffner commented 1 year ago

I have a large number of strings describing terms, for example ["<https://example.org/ExampleResource>", "\"this is an example string\""]. Is there a predefined method I can use to convert those strings to Sophia terms of the appropriate kind (IRI and literal in this case)?
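
To make the question concrete, the dispatch being asked for can be sketched with the standard library alone. This is only a minimal illustration: `TermKind` and `classify` are hypothetical names, not part of Sophia, and the sketch assumes N-Triples-style input in which IRIs are wrapped in angle brackets, literals start with a double quote, and blank nodes start with `_:`.

```rust
/// Hypothetical classification of a term string (not a Sophia type).
#[derive(Debug, PartialEq)]
enum TermKind {
    Iri,
    Literal,
    BlankNode,
}

/// Guess the kind of an N-Triples-style term string from its delimiters.
/// Returns `None` when the string matches none of the assumed forms.
fn classify(s: &str) -> Option<TermKind> {
    if s.starts_with('<') && s.ends_with('>') {
        Some(TermKind::Iri)
    } else if s.starts_with('"') {
        Some(TermKind::Literal)
    } else if s.starts_with("_:") {
        Some(TermKind::BlankNode)
    } else {
        None
    }
}
```

A real conversion would then build the matching Sophia term for each kind; the sketch stops at the classification step.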

pchampin commented 1 year ago

No, there is not. Arguably, this way of encoding RDF terms is common enough that it would make sense to provide such an implementation... PR welcome :)

Tpt commented 1 year ago

Quick note: this is already built into oxrdf in this file, if someone wants to adapt it to Sophia.

KonradHoeffner commented 1 year ago

I'm using the following now, but it seems to be a bit slow. It should probably only be used when the term is in object position or in an unknown position, or when performance doesn't matter. For example, properties can't be literals, so checking for a literal in predicate position spends time on a case that cannot happen. Should I still submit it as a pull request? And if so, where should it go?

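// The sophia_api import paths are an assumption (sophia_api 0.8 crate layout).
use sophia_api::term::{BnodeId, FromTerm, IriRef, LanguageTag, SimpleTerm, Term};
use sophia_api::MownStr;
use std::io::{self, Error, ErrorKind};

/// Guess the kind of term (literal, blank node, or IRI) from the first character.
fn auto_term(s: MownStr) -> io::Result<SimpleTerm> {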
    match s.chars().next() {
        None => Err(Error::new(ErrorKind::InvalidData, "empty input")),
        Some('"') => match s.rfind('"') {
            // index 0 is the opening quote itself, so the closing quote is missing
            None | Some(0) => Err(Error::new(
                ErrorKind::InvalidData,
                format!("missing right quotation mark in literal string {s}"),
            )),
            Some(index) => {
                let lex = &s[1..index];
                let rest = &s[index + 1..];
                // literal with no language tag and no datatype
                if rest.is_empty() {
                    return Ok(lex.into_term());
                }
                let lex = MownStr::from_str(lex);
                // either language tag or datatype
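                // a language tag directly follows the closing quote, as in "chat"@fr;
                // matching only a leading '@' avoids misreading an '@' inside a datatype IRI
                if let Some(tag) = rest.strip_prefix('@') {
                    let tag = LanguageTag::new_unchecked(MownStr::from_str(tag));
                    return Ok(SimpleTerm::from_term(SimpleTerm::LiteralLanguage(lex, tag)));
                }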
                // datatype, written as ^^<iri> directly after the closing quote;
                // strip_prefix/strip_suffix cannot panic on short or malformed input
                match rest.strip_prefix("^^<").and_then(|dt| dt.strip_suffix('>')) {
                    Some(unquoted) => {
                        let dt = IriRef::new_unchecked(MownStr::from_str(unquoted));
                        Ok(SimpleTerm::from_term(SimpleTerm::LiteralDatatype(lex, dt)))
                    }
                    None => Err(Error::new(ErrorKind::InvalidData, format!("malformed datatype in {s}"))),
                }
            }
        },
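        // blank nodes are written as _:label; anything else starting with '_' is rejected
        Some('_') => match s.strip_prefix("_:") {
            Some(id) => Ok(BnodeId::new_unchecked(MownStr::from_str(id)).into_term()),
            None => Err(Error::new(ErrorKind::InvalidData, format!("malformed blank node {s}"))),
        },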
        _ => Ok(SimpleTerm::Iri(IriRef::new_unchecked(s))),
    }
}
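
The performance concern mentioned above (testing for literals where none can occur) could be addressed by passing the position of the term along. The following is a rough standard-library-only sketch of that idea, not Sophia code: `Position`, `Kind` and `kind_at` are hypothetical names, and the input is assumed to follow the same convention as `auto_term` (bare IRIs, `_:` for blank nodes, a leading double quote for literals).

```rust
/// Hypothetical position of a term within a triple (not a Sophia type).
#[derive(Clone, Copy)]
enum Position {
    Subject,
    Predicate,
    Object,
}

/// Hypothetical term kinds, assuming bare IRIs without angle brackets.
#[derive(Debug, PartialEq)]
enum Kind {
    Iri,
    BlankNode,
    Literal,
}

/// Decide the term kind while only testing for the kinds that are
/// possible in the given position.
fn kind_at(s: &str, pos: Position) -> Kind {
    match pos {
        // predicates are always IRIs, so the string is not inspected at all
        Position::Predicate => Kind::Iri,
        // subjects are IRIs or blank nodes, never literals
        Position::Subject if s.starts_with("_:") => Kind::BlankNode,
        Position::Subject => Kind::Iri,
        // only objects need the full check
        Position::Object if s.starts_with('"') => Kind::Literal,
        Position::Object if s.starts_with("_:") => Kind::BlankNode,
        Position::Object => Kind::Iri,
    }
}
```

When the position is unknown, treating it as `Position::Object` gives the same full check as `auto_term`.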

P.S.: This uses the HDT convention, so URIs are not enclosed in "<" and ">" characters. I'm not sure whether that is useful for Sophia.
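
To show the difference between the two conventions, here is a small standard-library-only sketch that converts between them. The function names are hypothetical, and the sketch assumes that only IRIs differ (bare in HDT style, bracketed in N-Triples style) while literals and blank nodes are written the same way in both.

```rust
/// Convert an HDT-style term string to N-Triples style by wrapping bare
/// IRIs in angle brackets. Literals and blank nodes are returned unchanged.
fn hdt_to_ntriples(s: &str) -> String {
    if s.starts_with('"') || s.starts_with("_:") {
        s.to_string()
    } else {
        format!("<{s}>")
    }
}

/// Convert an N-Triples-style term string to HDT style by removing the
/// angle brackets around an IRI. Anything else is returned unchanged.
fn ntriples_to_hdt(s: &str) -> &str {
    s.strip_prefix('<')
        .and_then(|rest| rest.strip_suffix('>'))
        .unwrap_or(s)
}
```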

pchampin commented 10 months ago

Sorry for the late response. This seems a little too HDT-specific to include in the general Sophia crates. Are you OK with closing this issue?

KonradHoeffner commented 10 months ago

Sure, no problem!