ropensci / emld

:package: JSON-LD representation of EML
https://docs.ropensci.org/emld

added raw method to as_emld #20

Closed isteves closed 6 years ago

isteves commented 6 years ago

Related to #19

I couldn't get the character format to work, so I omitted it for now (though it's commented out in the test script).

cboettig commented 6 years ago

@isteves this looks nice. My only hesitancy is that we probably shouldn't assume all raw vectors are XML; e.g. one could have JSON serialized as raw bytes as well. Is there a way (extension? mime type? magic number?) to detect that the raw vector is XML first?

This also highlights the fact that we don't have an as_emld.character() method that works with literal input. So far, "character" has to be a filename (or something we can coerce into a list, e.g. a plain element name). If we could take literals, then raw could convert to character first, though that would probably defeat much of whatever efficiency gain you get from the binary format.

isteves commented 6 years ago

Hmm yeah, the only alternative I can think of right now is to add a type argument to specify what's being read in, since the bytes themselves don't carry any type information.

I checked to see if httr::content has a smart way of handling this, but it looks like it uses, in order of preference, (1) an explicitly specified type, (2) the Content-Type header of the response object (which we don't have), and (3) the file extension if given in the url (which we also don't have). ...which means we're left with just specifying the type explicitly or, as you suggested, reading it in as character first.

type <- type %||% x$headers[["Content-Type"]] %||% mime::guess_type(x$url, empty = "application/octet-stream")
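
To make the precedence in that line concrete, here is a minimal base R sketch of the same fallback pattern. `%||%` returns its left-hand side unless that is NULL; `pick_type` and its argument names are hypothetical and only illustrate the chain, they are not httr's API.

```r
# Null-coalescing operator: use `a` unless it is NULL, otherwise fall back to `b`.
`%||%` <- function(a, b) if (is.null(a)) b else a

# Hypothetical helper mirroring the three-step fallback:
# explicit type, then header value, then a guess from the URL.
pick_type <- function(type = NULL, header = NULL,
                      url_guess = "application/octet-stream") {
  type %||% header %||% url_guess
}

pick_type()                                   # nothing known: default guess
pick_type(header = "application/xml")         # header wins over the guess
pick_type(type = "application/json",
          header = "application/xml")         # explicit type wins over both
```

For a bare raw vector, both the header and the URL are missing, so only the explicit type (or the generic default) is left.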

isteves commented 6 years ago

Actually, thinking about it some more... are the headers for xml/json always the same? If yes, then the initial bytes are presumably consistent and we can use that to do some matching.
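
A rough sketch of what that matching could look like in base R, assuming UTF-8 (or ASCII) input: skip an optional byte-order mark and leading whitespace, then look at the first remaining byte. `guess_raw_format` is a hypothetical name, not something in emld, and this is only a heuristic; it would miss UTF-16 documents and any JSON that does not start with an object or array.

```r
# Hypothetical helper (not part of emld): guess whether a raw vector holds
# XML or JSON from its first non-whitespace byte.
guess_raw_format <- function(x) {
  stopifnot(is.raw(x))

  # Drop a UTF-8 byte-order mark if one is present.
  bom <- as.raw(c(0xef, 0xbb, 0xbf))
  if (length(x) >= 3 && identical(x[1:3], bom)) {
    x <- x[-(1:3)]
  }

  # Find the first byte that is not tab, line feed, carriage return, or space.
  ws <- as.raw(c(0x09, 0x0a, 0x0d, 0x20))
  i <- which(!(x %in% ws))[1]
  if (is.na(i)) {
    return("unknown")
  }

  b <- x[i]
  if (b == charToRaw("<")) {
    "xml"        # XML declaration or root element
  } else if (b %in% charToRaw("{[")) {
    "json"       # JSON object or array
  } else {
    "unknown"
  }
}

guess_raw_format(charToRaw("<?xml version=\"1.0\"?><eml/>"))
guess_raw_format(charToRaw("  {\"@type\": \"EML\"}"))
```

Anything that comes back as "unknown" would still need the type to be given explicitly.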

cboettig commented 6 years ago

fwiw I filed a bug in #21 on this issue which I think we should resolve first. Once as_emld() takes an argument for input string (i.e. xml, json, or list), then it should be easier to add the raw vector inputs as options for both xml and json without things getting ambiguous. Thanks for raising this issue!