Closed wtlangford closed 10 years ago
@wtlangford I agree with all of that.
Note that 8780bc0 fixed #138, so that distinguishing libraries from programs is now (EDIT: now, not not) reliable and easy.
Hopefully that should read "now" and not "not". But yes, that seems good. I'm currently calling through to jq_parse_library()
for each one, as it seems to properly load things into a block and test that the code is, in fact, a library, before binding it.
Also worth considering is how we want -l
and -L
interacting with stackable options. We can't look to gcc for inspiration, as gcc doesn't stack its options. I don't think these should be allowed to stack.
More semantics for -l
and -L
: is the space required or optional? gcc makes it optional, at the cost of not allowing any other flags to start with l
. If the space is optional, special handling will be required for these options. If we make it required, they plug nicely into our current system.
Lastly, for -lmath
, should jq look for libmath.jq
or math.jq
?
@wtlangford wrote:
As for the order of the search paths, I think it should be something like this:
- Paths passed in with -L
- JQ_LIBRARY_PATH
- System locations (we don't currently use this, but might later.)
I would suggest that to avoid complexity (especially from a user's point of view), we avoid JQ_LIBRARY_PATH and system locations. Also, I'd like to continue plugging for a view of libraries and modules as resources that may be available in text files or via URLs, etc.
@pkoppstein The nice thing about JQ_LIBRARY_PATH and system locations is that if the user doesn't know anything about them, not much is likely to happen. JQ_LIBRARY_PATH won't exist and we just won't look there. System locations we don't even use right now, and in fact I'm not actually writing any code for them.
As far as resources via URLs, my only concern would be dependency creep, though URL-loaded resources could be handled by dlopen()
ing cURL if necessary or just autotools configure macros enabling/disabling cURL at configure/build time.
Not sure what you mean by text files. Could you clarify?
Looking at it now, it is actually fairly simple to support both -lmath
and -l math
. I think I'll just do this. It is likely expected anyways, since we're borrowing gcc's syntax for this.
@wtlangford asked:
Not sure what you mean by text files
I just meant ordinary files of the kind expected by the -f option. I was just anticipating that in future, "libraries" might be provided in other formats.
@wtlangford Yeah, "now", not "not".
As for -lfoo
vs -l foo
, it could be either, even both. Whatever is easiest for you, since you're doing this work. However, I do like the familiarity of -lfoo
. Obviously options with an argument can't stack, so that's not a concern.
Let's stay away from URIs for now but leave room for them: if the lib name/path starts with a '/' it's a path, else if there's a ':' in the lib name/path then it's a URI (and not supported for now), in all other cases it's a file searched for in the search path. In the module system each dependent will get to declare relative search paths -- I abhor *_PATH
env vars, so I want a reasonable way to avoid them altogether, but it's OK to have *_PATH
env vars given that. Since we have to build the library system first, then the module system, the env var nastiness gets in the door first; c'est la vie.
@wtlangford Yes, in the future I'd also like to have shared objects / DLLs to support the addition of C-coded builtins. First things first though.
Pure jq libraries should be in files whose names end in .jq
.
@nicowilliams Fair warning- once the *_PATH
variable exists, it is unpleasant trying to remove it. (People begin to depend on it.) If we really don't want it to exist, we should consider never having it.
@pkoppstein My focus on the library system developed from when I was working on adding the ability to accept multiple -f
files. I was expecting that all of them (or all but the last) acted like libraries, which makes sense, since we can only have one main expression anyways. It seemed that allowing -f
multiple times could give it different meanings in different contexts and that seemed...nasty. So, I pushed for the library system to handle multiple -f
instead.
@nicowilliams Is there an alternative to pure jq libraries? Or is that opposed to the future c-coded builtins?
Also, I'm currently using the following logic for identifying paths instead of names:
if libname starts with .
, /
, or ~
, it is a path.
Perhaps I should also check to see if it contains /
, in case someone uses relative paths...
@nicowilliams wrote:
if the lib name/path starts with a '/' it's a path, else if there's a ':' in the lib name/path then it's a URI
Can we refine that a little to conform more closely with the URI specification? A URI starts with "scheme" followed immediately by ":", so I'd suggest:
If the string begins with "scheme" followed by ":" then it's a URI, otherwise it's a path or relative path.
My understanding is that "scheme" is defined as [a-zA-Z][a-zA-Z0-9+.-]*
once the *_PATH variable exists, it is unpleasant trying to remove it.
Yes, in my view that's yet another reason to postpone supporting JQ_LIBRARY_PATH until it is shown (proven?) to be essential.
@nicowilliams @pkoppstein I've pushed a branch with library support to my fork. http://github.com/wtlangford/jq library-system
Take a look and play with it, tell me how you feel about it.
It currently supports JQ_LIBRARY_PATH
, which we may agree to just remove anyways. Of note from my testing is that the order of -l
arguments closely mimics the way it works with gcc. If A depends on B, you must write jq -lA -lB
, as jq -lB -lA
does not work.
@pkoppstein Sure, but in practice what I proposed works: just document the scheme thing and if anyone uses -lfoo/bar:baz
they'll get what they deserve some day.
@wtlangford My objection to *_PATH
is mostly that programs and libraries should declare the locations of their dependencies in a way that is relative to: the path of the program/library, to directories in the built-in search path / system locations. Given that I'm OK with having *_PATH for special cases.
The problem is that without a way to build relocatable, er, objects, people resort to writing wrapper scripts with enormous search paths stored in env vars -- that's not acceptable, and the way to avoid it is to have a way to do what ELF does with @ORIGIN
in rpath
s. But I'm willing to go one step at a time and hold off shipping 1.5 until we have both, the library system and the module system.
EDIT: Close italics.
@wtlangford Keep JQ_LIBRARY_PATH
for now. Let's make sure that we have syntax for declaring rpath
s later, and better: imports, exports, and namespace prefixes.
(Speaking of which, using '::' to separate module name from function name will be tricky, since ':' is parsed as a token by the lexer. Sure, we could make it work, it'll just be annoying :(
@wtlangford wrote:
I've pushed a branch with library support ...
Yay! My first little test worked. On a Mac. This obsoletes the "multiple -f" enhancement request, right?
@pkoppstein Yes, that one goes away.
I'm about to push some edge-case handling. Also, glad to hear it worked on a Mac. Sometimes stat()
isn't as portable as it should be.
Just pushed f57a14087f9adf7279a34fc5cba96a09376062a5, which has some fixes for when $HOME
isn't set, among other things.
The link to your branch (to make it easy to navigate to): https://github.com/wtlangford/jq/commits/library-system
@wtlangford BTW, when this is done I think I'll add support for builtin libraries, so that a raft of contributed utils that maybe shouldn't be in builtin.c today can be added to a -lutil
library. With builtin libraries we still get a standalone jq program.
@wtlangford FYI there's a bug in master regarding ~/.jq
:
$ cat > ~/.jq <<EOF
def f: "foo";
def g: "bar";
def fg: foo+bar;
EOF
$ ./jq -n fg
jq: locfile.c:37: locfile_get_line: Assertion `pos < l->length' failed.
$
This also affects your branch, though it's not your fault. I've a feeling that we need to refactor the locfile bits a bit so that the block
s reference the correct sources and we free the locfiles at the end.
Ah.., we assert while trying to complain that f/0
is unknown. But f is defined in the same file. Right, but that's not how builtin.c
handles each jq-coded builtin -- builtins_bind()
compiles each builtin separately, rather than the whole library of them at once. So there's several problems here...
I think the thing to do is to take the blocks output by jq_parse_library()
and merge them into the blocks output by jq_parse()
, and we need to extend the locfile
infrastructure so that a single object has all the source and location information for all the libraries and the main program. Then we cna stop parsing builtins one-by-one.
Perhaps not unsurprisingly, in my branch:
#cat > ~/.jq <<EOF_
def f: "foo";
def g: "bar";
def fg: f+g;
EOF_
$ ./jq -l~/.jq -n fg
works fine. This makes sense, given that the second time it parses fg/0
it actually finds f/0
and g/0
, since they showed up from the first time ~/.jq
was parsed.
But I agree that this needs to be reworked. If I had a better understanding of the compiler and friends, I might swing at it at bit.
As for the branch itself, do we see any issues? If not, I'll update the docs. Also, a note about naming conventions for jq libs. Should -lutil
look for util.jq
or libutil.jq
?
@wtlangford asked:
Should -lutil look for util.jq or libutil.jq?
These are (at least for now) just ordinary text files, so I think it would be confusing in several respects to adopt the "lib" convention, which really belongs to another universe, and maybe even another century.
In fact, I was thinking that before we go too far down this well-trodden path, it would be good to see how all this fits in with what I believe should be the primary mechanism for using jq libraries -- the "import" or "require" directive (in the manner of ruby or python). It would, at the minimum, allow for a chain of dependencies to be resolved automatically and efficiently. If "require" were properly supported, would there be any compelling need for the "-l" command-line option as currently envisioned?
@wtlangford
That only works by accident because you're loading ~/.jq
twice! Try ./jq -n fg
with the same ~/.jq
and watch it die.
@nicowilliams right. That was actually my point. The second time it parses ~/.jq
, it looks at the previous parsing to find the things it wouldn't find the first time (since it was in the same locfile). I was simply sharing an anecdote, there.
@wtlangford Sorry, I hadn't fully read your comment (shame on me).
Your branch looks fine to me now.
As for the bug here... I think there's two things to do:
Maybe: use block_bind()
instead of block_bind_referenced()
on the outputs of jq_parse_library()
and jq_parse()
, then block_free()
the library later ??
Well, anyways, I've not yet root-caused this problem.
struct locfile *
s: they should only be freed after all parsing is done because the blocks being bound here and there will retain references to the sources' locfiles
until the blocks are freed, which means: not until block_compile() finishes.Ah, yes, definitely use block_bind()
, because block_bind_referenced()
frees the binder as it goes, but we may need the same library for binding multiple libraries.
Aha, later when we add a way to express dependencies we'll need to find the dependency there, parse it if we haven't already, and bind it to the dependent, and we may need to do this many times, so we might as well keep all the loaded libraries' parsed forms around for binding to each other and to the main program.
Of course, that still doesn't explain why the fg
in the library didn't bind to the f
...
I think i just figured it out, but won't get to it until tomorrow. We
should apply block_bind_referenced()
to link libraries to main programs,
but we need to bind libraries to themselves too, which that function can't
quite do. What that function does is throw away the blocks for any
unreferenced defs, see...
@wtlangford FYI, I fixed that problem with 6b6e3f4 (see #479).
@wtlangford FYI, I've got a commit styling yours a bit in https://github.com/nicowilliams/jq/tree/libsys .
@nicowilliams I like it. Good catch on the Windows path separator. I hadn't yet tested it in a windows environment.
Shall I merge your style commit into my branch and do a bit more testing now that #479 is fixed?
Rather, after doing those things, do we feel this is ready for a PR?
Merge my commit, rebase and squash, play with it, and send me a PR.
If you like you could do two more things: a) update the docs, b) add a simple test to tests/run (just like I just did for #479). If you want to leave (b) for me, that's fine.
@wtlangford Question: supposing we could quickly turn around a module system, would we still want the -l
option?
I was actually considering that, myself. The module system -does- remove the need for a -l
option. I'm fine with just quickly turning around the module system.
@wtlangford Yeah, i agree. I can push a half-baked start at modules to a branch.
On Sunday, July 13, 2014, William Langford notifications@github.com wrote:
I was actually considering that, myself. The module system -does- remove the need for a -l option. I'm fine with just quickly turning around the module system.
— Reply to this email directly or view it on GitHub https://github.com/stedolan/jq/issues/472#issuecomment-48858014.
@nicowilliams Up to you. If you just want to push some changes to the parser to allow the import statements, I should be able to handle the rest.
@wtlangford See the libsys
branch of https://github.com/nicowilliams/jq i.e., https://github.com/nicowilliams/jq/tree/libsys .
It's missing: gen_import()
and the code to make it happen in jq_parse()
.
@wtlangford Did a bit more work on it; see my libsys
branch. Nothing yet running, just code. But I think it's getting closer. parser.y
in particular is in shape.
Note: we need to pass argv[0]
's origin to the jq compiler, so we can resolve $ORIGIN
references in the main program. I'm not sure what's the best way. The jq_compile() functions are getting to have a lot of arguments. I think I'd rather have a setter function to set the origin (and JQ_LIBRARY_PATH, and ...) on a `jq_state `. Extensibility this way is nicer IMO.
@nicowilliams I agree about adding that information to the jq_state
. More than 6 arguments is too many, and 6 is already a lot. Interestingly, I've been working on this without really looking at what you have and our branches are developing similarly. I've written a gen_import
and started on processing the dependencies. I'll get something together and push a branch for you to see as well.
On that note, do we want a finite list of valid attributes to set for the jq_state
or do we want to make it generic and back it with a jv_object
or something?
@wtlangford I'll back-off then, let you make progress. I got started on fixing bison warnings and that led me into this.
As to attributes on jq_state
, yeah, a jv_object
for all attributes whose values can be represented as jv
s sounds like an excellent idea.
Note BTW that at the lower level we need to pass explicit origins around: the origin might be a library's rather than the main program's.
I've got some ideas bouncing around for how to handle origins. On the topic of search paths, I'm thinking we leave -L
in place just to avoid something like
import lib1 search "/path/to/jq/libs";
import lib2 search "/path/to/jq/libs";
import lib3 search "/path/to/jq/libs";
...
All that repetition hurts to look at. Also if things get moved around, that's a lot of editing. Search order'd be something like
-L
pathsLD_LIBRARY_PATH
Thoughts?I agree, most import ... search ...
statements should only ever specify $ORIGIN/
-relative paths; all other libraries should be found relative to the jq executable's builtin path or $JQ_LIBRARY_PATH
or -L
. We need at least one of -L
or $JQ_LIBRARY_PATH
(did you really mean LD
, or was that a typo?); both might be OK, but since the env var can always be overridden with each invocation, I think then that -L
should add to $JQ_LIBRARY_PATH
.
Finally, the builtin path in the jq executable should be $ORIGIN/
-relative -- with $ORIGIN
being the jq executable's dirname(argv[0])
.
EDIT: s/type/typo/
Hopefully by having $ORIGIN/
support from the get-go and a built-in path with $ORIGIN/
(not just, or even instead of --libdir
) we can avoid the need for anyone to use $JQ_LIBRARY_PATH
. That's the idea anyways. That which ELF folks arrived at after many years, we can avoid learning the hard way.
One more possibility: remove both, -L
and $JQ_LIBRARY_PATH
and add them only once we know that $ORIGIN/
-relativity is not enough :)
I did mean JQ
, LD
was from muscle-memory (which, I guess, is kind of the point).
I think I'd like to have one of -L
and JQ_LIBRARY_PATH
there, just for simplicity's sake.
Especially if we're going to start bundling any libs with jq to keep them out of the builtins, we'll need a system path and a way to override that for non-standard installs. (./configure --prefix=
I'm looking at you.)
Similar to https://github.com/stedolan/jq/issues/112#issuecomment-31241424, simply looking to hammer out a spec of some sort for documentation's sake.
-l
command-line argument to specify a library-L
command-line argument to specify a library search pathJQ_LIBRARY_PATH
environment variable for library search paths.As for the semantics of
-l
, they are up for discussion. It has been discussed that something like-l/path/to/file.jq
should be treated as explicit paths and-lfile
should be treated as something to be searched for in the search paths. I agree in this respect.As for the order of the search paths, I think it should be something like this:
-L
Library files are expected to not contain a main expression, only function definitions. If one is found, then an error should be thrown and
jq
should exit.Thoughts?