jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

Objective-C Bindings #444

Open wtlangford opened 10 years ago

wtlangford commented 10 years ago

It was discussed in #441 that some Objective-C bindings for iOS/OS X would be a nice thing to have.

For the bindings, are we looking for something relatively simple that abstracts away the jv types by replacing them with the platform's usual types (NSString, NSDictionary, NSArray, NSNumber, etc)? Or are we looking for something a bit more involved to abstract away the language itself (Along the lines of an ORM that hides)?

I'm more partial to the lighter-weight binding, but if anyone has a strong feeling otherwise (or for a different way to look at this entirely), I'd love to hear it.

@nicowilliams With respect to the Core Data thing we were discussing. As it turns out, yes, you can add additional data store types. Unfortunately, I don't think it will be particularly useful. Core Data is all about managed objects and models and object graphs. The idea is that the data gets serialized out to disk in some form or another (XML, binary, SQLite, some custom thing, etc...) but always gets loaded back into memory in these nice managed objects that handle write-backs, etc. These managed objects have fields/methods for changing the values, but there's no real way to stick a jq program anywhere in there to do anything, as jq's job isn't serialization as much as it is processing, to my understanding.

Thoughts?

nicowilliams commented 10 years ago

@wtlangford Yeah, for something like Core Data the only value would be to be able to extract an underlying jv value to then run a jq program on, or to create a wrapper for a jv resulting from a jq program.

As for other bindings, I tend to think that the lightest-weight possible bindings would be best. A jq machine class could take a run-this-jq-program method, but the jv functions maybe shouldn't be wrapped at all: it wouldn't simplify an Objective-C caller, would it? (After all, the caller could just call the C functions...)

wtlangford commented 10 years ago

I was considering a use case more along the lines of:

// Deserialize a json string into an object (of class NSString, NSNumber, NSDictionary, NSArray)
id json = [NSJSONSerialization JSONObjectWithData:jsonData options:0 error:nil];
JQState *jq; // Set this up however.
[jq start:json flags:0];
id res;
while ((res = [jq next])) {
  // res is an instance of one of those classes above, ready to be used by the rest of the Obj-C ecosystem.
}

I mean, I could also do something like:

id objcObject = [JQ objectForJsonValue:somejvstruct];

but I like this less.

And absolutely we could just call the C functions, and in fact this will likely still be available. But then we've got to unbox it from the jv types and put it into something that Obj-C finds more useful. So, lots of jv_copy() and switching on jv_get_kind(). The point of the bindings, to my understanding, was to avoid that, or at least make it unnecessary.

nicowilliams commented 10 years ago

OK, I'm sold :)

wtlangford commented 10 years ago

Other thoughts I'm toying with while working on this. Was there a reason that all jvs are struct jv and not struct jv*? It led to some very odd behavior when I was using pointers working on match.

ghost commented 10 years ago

Sorry to be that guy, but Objective-C bindings are not the same as Foundation/Cocoa/UIKit/whatever bindings.

Not that there's anything wrong with binding directly using Foundation types, but I would try to be clear on the wording.

wtlangford commented 10 years ago

@slapresta No, you're absolutely right. They are different and this merits discussion. I think I like the concept of binding against Foundation directly more than just making Objective-C bindings, but the more I think on it, the more I want to expose all of the jv functionality, which would be less cleanly done with Foundation types. Perhaps I should do Objective-C bindings (so, a JV class or JVString, JVArray, JVObject, JVNumber, etc.) and add to it a method that returns the Foundation type for that value.

ghost commented 10 years ago

Well, when you think about it, NSNumber, NSString, NSNull, NSBoolean, NSArray and NSDictionary map directly to JSON values, and everyone uses Foundation anyway. My quibble was just with plainly calling it an Objective-C library if its bindings were to be Foundation-dependent. That said, I like the idea of binding against Foundation directly.

wtlangford commented 10 years ago

So, there may be a small issue with binding directly to Foundation types. NSBoolean does not exist. The official thing to do is to put booleans into an NSNumber. Which would be fine, except NSNumber discards the original type information when it stores the value. (Sort of. It is more accurate to say that retrieving the value discards the original type information and that there's no official way to get the original type back out. What actually happens is that NSNumber is a class cluster and the actual class instantiated for a boolean is __NSCFBoolean. I'm querying the runtime about this presently, but this doesn't present a nice way for the user to tell the difference between a boolean and a number, as

#include <objc/runtime.h>
if ([value isKindOfClass:objc_getClass("__NSCFBoolean")])

is just gross.)

NSNumber provides the -boolValue method, but it just does C-style value-to-bool conversion, so it isn't useful for testing whether an NSNumber contains what was originally a boolean.

I'm somewhat of the opinion that since we're dealing with structured data here anyways, this isn't an issue and you'd know which type to expect anyways. Of course, I may just build in functionality to switch between Foundation bindings and Objective-C bindings (JV wrapper classes).

Thoughts?

nicowilliams commented 10 years ago

@wtlangford I owe you an answer to this:

> Other thoughts I'm toying with while working on this. Was there a reason that all jvs are struct jv and not struct jv*? It led to some very odd behavior when I was using pointers working on match.

I asked @stedolan a similar question, though mine was more about NaN coding. (NaN coding is a common technique in ECMAScript engines: every value in the engine is a C double, and non-number values are stored as NaNs whose payload bits carry a pointer...)

Icon too, BTW, did (does?) this, for similar reasons, I think (Icon calls its equivalent a value descriptor). Namely: we want to represent strings and arrays as (pointer, offset, length), or at least (pointer, length), so that sub-string/array operations are cheap: no heap allocation required. (In jq the pointer always points to a structure that starts with a reference count, and for arrays and strings includes the allocated length.) Well, the only way to do that is to use multi-word values, or resort to NaN coding schemes that involve a lot of trickiness (e.g., most NaN coding schemes can only support a 48-bit addressing scheme, and if you want to stuff offset/length information there... well you'll quickly get sad).

Using multi-word value representations allows passing them as function arguments and allocating them as automatics (i.e., on the stack), and it allows struct assignment. It's cheap, or at least cheaper than all the heap allocations and deallocations we'd have to use if jvs were pointers.
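To make the (pointer, offset, length) idea concrete, here is a minimal sketch in plain C. It is not jq's actual jv layout: the names str_buf, str_val, str_new, str_copy, str_free, str_slice and str_equals are all hypothetical, and the "consume the argument" convention is only assumed here for illustration. The point it tries to show is that taking a substring touches no allocator at all; it only narrows a three-word value that is passed and returned by value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not jq's real jv representation. */

/* Heap block shared by every value that views it. */
typedef struct {
    int refcnt;        /* number of str_val values pointing here */
    size_t alloc_len;  /* bytes stored in data[] */
    char data[];       /* flexible array member holding the bytes */
} str_buf;

/* The multi-word value itself: small enough to pass and return by value. */
typedef struct {
    str_buf *buf;
    size_t offset;
    size_t length;
} str_val;

/* Copy s into a fresh buffer; this is the only heap allocation in the sketch. */
str_val str_new(const char *s) {
    size_t n = strlen(s);
    str_buf *b = malloc(sizeof *b + n);
    assert(b != NULL);
    b->refcnt = 1;
    b->alloc_len = n;
    memcpy(b->data, s, n);
    return (str_val){ b, 0, n };
}

/* Take another reference: bump the count and copy three words. */
str_val str_copy(str_val v) {
    v.buf->refcnt++;
    return v;
}

/* Drop a reference, freeing the buffer when the last one goes away. */
void str_free(str_val v) {
    if (--v.buf->refcnt == 0)
        free(v.buf);
}

/* Substring without allocating: consumes v, returns a narrower view of the same buffer. */
str_val str_slice(str_val v, size_t start, size_t len) {
    assert(start <= v.length && len <= v.length - start);
    v.offset += start;
    v.length = len;
    return v;
}

/* Compare the viewed bytes against a C string. */
int str_equals(str_val v, const char *s) {
    return strlen(s) == v.length &&
           memcmp(v.buf->data + v.offset, s, v.length) == 0;
}
```

With this shape, `str_slice(str_copy(a), 6, 5)` yields a second value that shares a's buffer, and the buffer is released only after both values have been passed to str_free.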

Note too that jvs are immutable, and this is likely to be incompatible with many other APIs you might have to interface with, but you could use jv * to get around this: every data modifying operation simply becomes:

*v = jv_...(*v, ...);

and presto, immutability disappears as a source of impedance mismatch. Sure, now you have to allocate jvs on the heap in your bindings, but it's a fair trade considering you get to use jq programs (since that's what we're after with any language bindings). And you still get automatic memory management (for jq's bits anyways) and protection against cycles.
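The handle trick can be sketched in self-contained C as follows. Everything here is a hypothetical stand-in (list_buf, list_val, list_handle and their functions are invented for illustration, and this toy always copies on append, which a real implementation need not do). It only aims to show how a heap-allocated handle lets a host API appear to mutate a value while the underlying values stay immutable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an immutable, reference-counted array value. */
typedef struct {
    int refcnt;
    size_t len;
    int items[];
} list_buf;

typedef struct { list_buf *buf; } list_val;

list_val list_empty(void) {
    list_buf *b = malloc(sizeof *b);
    assert(b != NULL);
    b->refcnt = 1;
    b->len = 0;
    return (list_val){ b };
}

list_val list_copy(list_val v) { v.buf->refcnt++; return v; }

void list_free(list_val v) { if (--v.buf->refcnt == 0) free(v.buf); }

/* Consumes v and returns a new value with x appended.
 * Other holders of the old buffer never observe the change. */
list_val list_append(list_val v, int x) {
    size_t n = v.buf->len;
    list_buf *b = malloc(sizeof *b + (n + 1) * sizeof(int));
    assert(b != NULL);
    b->refcnt = 1;
    b->len = n + 1;
    memcpy(b->items, v.buf->items, n * sizeof(int));
    b->items[n] = x;
    list_free(v);
    return (list_val){ b };
}

/* Mutable-looking wrapper for a host API: the handle lives on the heap
 * and the value inside it is swapped on every update. */
typedef struct { list_val v; } list_handle;

list_handle *handle_new(void) {
    list_handle *h = malloc(sizeof *h);
    assert(h != NULL);
    h->v = list_empty();
    return h;
}

void handle_append(list_handle *h, int x) {
    h->v = list_append(h->v, x);  /* the "*v = jv_...(*v, ...)" step */
}

void handle_free(list_handle *h) {
    list_free(h->v);
    free(h);
}
```

A caller that took a list_copy of the handle's value before calling handle_append still sees the old contents afterwards, which is the property that keeps immutability intact underneath the mutable-looking wrapper.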

nicowilliams commented 10 years ago

@wtlangford As for NS not having booleans... if it supports NaNs (it supports doubles, but it's not clear if it allows signalling NaNs) then maybe you could NaN-code booleans, converting to/from jv number/boolean values in the bindings. This is much simpler than general NaN coding, so it should be feasible.
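For reference, NaN-coding just the two booleans could look roughly like the sketch below. The payload constants and the names box_bool, is_boxed_bool and unbox_bool are invented for this example, and it uses quiet NaNs with a reserved payload rather than signalling NaNs. Whether NSNumber hands back a double with its NaN payload intact is an open assumption that would need checking before relying on this.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding: two quiet-NaN bit patterns reserved for false and true.
 * Exponent bits are all ones and the top mantissa bit is set (quiet NaN);
 * the low payload bits distinguish the two values. */
#define BOXED_FALSE UINT64_C(0x7FF8000000B00100)
#define BOXED_TRUE  UINT64_C(0x7FF8000000B00101)

/* Reinterpret a double as its 64-bit pattern and back, without aliasing issues. */
uint64_t double_bits(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
double bits_double(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* Encode a boolean as a double that ordinary arithmetic would treat as NaN. */
double box_bool(bool b) {
    return bits_double(b ? BOXED_TRUE : BOXED_FALSE);
}

/* True only for the two reserved patterns; ordinary numbers and other NaNs fail. */
bool is_boxed_bool(double d) {
    uint64_t u = double_bits(d);
    return u == BOXED_FALSE || u == BOXED_TRUE;
}

/* Decode a boxed boolean; the caller must check is_boxed_bool first. */
bool unbox_bool(double d) {
    assert(is_boxed_bool(d));
    return double_bits(d) == BOXED_TRUE;
}
```

In a binding, the conversion layer would call box_bool when a boolean goes out to the host side and is_boxed_bool when a double comes back, mapping everything else to an ordinary number.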

wtlangford commented 10 years ago

@nicowilliams While NaN signalling would be nice internally so we don't have to depend on private implementations of NSNumber, it doesn't help with the user-facing API. They'll still have to either check the signalling or query the runtime, unless I stick a helper on NSNumber, which may be the cleanest solution.

nicowilliams commented 10 years ago

@wtlangford Oh, ah, well, yeah. So accept the data loss?

nicowilliams commented 10 years ago

@wtlangford One could always use a jq program to convert numbers in the right places to booleans, and booleans always to integers.

Something like:

def num2bool: if type=="number" and .==0 then false elif type=="number" then true else empty end;
def num2bool(p):
    reduce path(p | select(type=="number")) as $p
        (.; setpath($p; getpath($p) | num2bool));

def bool2num: if . then 1 else 0 end;
def bool2num(p):
    reduce path(p | select(type=="boolean")) as $p
        (.; setpath($p; getpath($p) | bool2num));

I'm thinking it'd be nice to have something like:

def replace(pexp; filt; update):
    reduce path(pexp | filt) as $p (.; setpath($p; getpath($p) | update));

then

def bool2num(p): replace(p; select(type=="boolean"); bool2num);

It's a nice pattern.

EDIT: Actually, it's even simpler (pardon the braino):

def bool2num(p): (p | select(type=="boolean")) |= bool2num;

That is, |= is a generalized replacer.

nicowilliams commented 10 years ago

@wtlangford Of course, this is "schema-based": your path expressions for where you expect booleans are effectively a partial schema representation. But it might do for your case!