Closed by pkoppstein 10 years ago
The need is much less severe than you think because each def replaces previous ones for the purposes of binding subsequent defs. Therefore the addition of new defs to builtin.c does not cause backwards compatibility issues.
Proof:
$ echo $SHELL
...
$ jq -n -r env.SHELL
<same as above>
$ echo 'def env: {"SHELL":"ha! builtin env overridden as expected"};' >> ~/.jq
$ jq -n -r env.SHELL
ha! builtin env overridden as expected
$
:)
@nicowilliams -- Yes, I was aware of the overwriting feature, and yes, I realize that if a user had a library file that defined env, then that user's old programs need not change, but if the user now wants to use the new builtin env in a jq program that requires the library, he or she will have to change the library. Whether you call that a "backwards compatibility issue" or not, it is an issue -- I would say a major issue with respect to software with a version number greater than or equal to 1.0.
It's a minor issue, really. But we agree that there should be a bit more syntax for including libraries. Right now all we have is ~/.jq, and that's a bit lame -- it's not remotely the right approach for production applications, for example. What the result ends up looking like is still up in the air, and I won't tackle it until the I/O and other higher-priority work is done. But if you send me a PR then I'd have to look at it sooner rather than later :)
See also #112.
@wtlangford I'm thinking we need a pseudo-opcode by which the parser can encode the desire to import some library. Then in jq_compile_libs_args() (and jq_compile_args()) we'd extract these imports, find and parse the given library, rename defs in the parsed result, then block_bind_referenced() the result to the body of whatever wanted the import (which might be a library).
In compile() we'd then ignore this new pseudo-opcode.
There are more details (e.g., private vs. public symbols). But I think that's a decent sketch.
@nicowilliams Sounds fine to me, but what do you mean by rename defs?
@wtlangford I mean adding a prefix, for namespace management purposes, as in import foo as bar.
@wtlangford This change allows for the use of '::' in identifiers:
diff --git a/lexer.l b/lexer.l
index b51ab1f..3a74ef4 100644
--- a/lexer.l
+++ b/lexer.l
@@ -114,6 +114,7 @@ struct lexer_param;
[a-zA-Z_][a-zA-Z_0-9]* { yylval->literal = jv_string(yytext); return IDENT;}
+[a-zA-Z_][a-zA-Z_0-9]*::[a-zA-Z_][a-zA-Z_0-9]* { yylval->literal = jv_string(yytext); return IDENT;}
\.[a-zA-Z_][a-zA-Z_0-9]* { yylval->literal = jv_string(yytext+1); return FIELD;}
[ \n\t]+ {}
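To make the effect of the added lexer rule concrete, here is a rough Python sketch of the two identifier patterns. It only mimics the regular expressions from the diff; the function name classify is hypothetical, and flex's longest-match behavior is approximated by trying the namespaced pattern first.

```python
import re

# Same character classes as the two IDENT rules in the diff.
PLAIN = r"[a-zA-Z_][a-zA-Z_0-9]*"
NAMESPACED = PLAIN + "::" + PLAIN


def classify(token):
    """Return which identifier rule (if any) matches the whole token."""
    if re.fullmatch(NAMESPACED, token):
        return "namespaced IDENT"
    if re.fullmatch(PLAIN, token):
        return "plain IDENT"
    return None


print(classify("env"))       # plain IDENT
print(classify("bar::env"))  # namespaced IDENT
print(classify("a::b::c"))   # None: the rule allows exactly one '::'
```

Note that the rule as written accepts exactly one '::' per identifier, so nested namespaces would need a further change.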
A relatively simple change to parser.y will parse "import" declarations. Then we need to code up a function to generate the block representation of defs, and modify jq_parse*() to check for declared imports, load each library, rename its symbols as appropriate, then block_bind_referenced() each loaded dependency to the result of the parse of the program/library.
Something like that.
I think this syntax will do:
import "foo";
import "foo" as "bar";
import "foo" search "@ORIGIN/../lib";
import "foo" as "bar" search "@ORIGIN/../lib";
The symbols of the library to be imported would be prefixed with "%s::", where the string is either the name of the library or the alias given in the import.
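As a rough illustration of the prefixing rule (not the actual C implementation), the renaming could look like this in Python. The function prefix_symbols and its arguments are hypothetical names chosen for the sketch.

```python
def prefix_symbols(def_names, library, alias=None):
    """Map each def name of an imported library to its namespaced name.

    The prefix is the alias when the import gives one ("import foo as bar"),
    otherwise the library name itself.
    """
    prefix = alias if alias is not None else library
    return {name: "%s::%s" % (prefix, name) for name in def_names}


print(prefix_symbols(["env", "id"], "foo"))
print(prefix_symbols(["env", "id"], "foo", alias="bar"))
```

With import "foo"; a def named env would become foo::env, and with import "foo" as "bar"; it would become bar::env.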
This looks pretty easy to pull off with the library system machinery in place.
This seems fine to me. At first glance, the only issue I see is circular dependencies. The nice thing about the -l library system is that circular dependencies actually work fine, you just need a -llib1 -llib2 -llib1. With this module system, we'll have to keep track of what modules/libraries have already been loaded.
We should hammer out some semantics/rules for the search path portion of this, though.
I want ELF linker-style $ORIGIN semantics for sure. The alternative is to not have relocatable modules and programs.
EDIT: $ORIGIN, not @ORIGIN.
Regarding circularity, there are two problems: infinite recursion, and code duplication. Since jq programs and libraries have no global state, code duplication is merely suboptimal, and I can live with that for now. The minimum we must do is notice circular dependencies, and maybe not even that: we can always document that they are not supported.
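For what it's worth, noticing cycles and avoiding duplicate loads can be done with one depth-first walk. The Python sketch below is only an illustration: the imports dictionary is a hypothetical stand-in for whatever parsing each library would yield, and load_order is not a jq function.

```python
def load_order(root, imports):
    """Return modules in dependency order, raising on a circular import.

    `imports` maps a module name to the list of names it imports.
    """
    order = []        # finished modules, dependencies first
    done = set()      # already loaded: skipped, so no code duplication
    in_progress = []  # current import chain, used to spot cycles

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            chain = in_progress[in_progress.index(name):] + [name]
            raise ValueError("circular import: " + " -> ".join(chain))
        in_progress.append(name)
        for dep in imports.get(name, []):
            visit(dep)
        in_progress.pop()
        done.add(name)
        order.append(name)

    visit(root)
    return order


print(load_order("main", {"main": ["a", "b"], "a": ["b"], "b": []}))
```

The done set addresses duplication and the in_progress chain addresses infinite recursion, so the two problems can be handled separately if only one of them is tackled at first.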
@wtlangford To expand on $ORIGIN, any search path element in an import statement that starts with $ORIGIN/ should have "$ORIGIN/" replaced with the path to the directory containing the library where the import statement was found. So if a library is found in /opt/foo/bar/lib/foo.jq and the search path contains $ORIGIN/ then the path to be searched will be /opt/foo/bar/lib/.
This allows for relocation: if you relocate this package to /opt/foobar/ so that the lib path were now /opt/foobar/lib, the search for foo.jq's dependencies from the same package will still succeed.
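A minimal Python sketch of that substitution, assuming POSIX-style paths; expand_origin is a hypothetical helper name, not part of jq.

```python
import posixpath


def expand_origin(search_path, importing_file):
    """Replace a leading "$ORIGIN/" with the importing file's directory.

    Search paths that do not start with "$ORIGIN/" are returned unchanged.
    """
    marker = "$ORIGIN/"
    if not search_path.startswith(marker):
        return search_path
    origin = posixpath.dirname(importing_file)
    rest = search_path[len(marker):]
    return posixpath.normpath(posixpath.join(origin, rest))


print(expand_origin("$ORIGIN/", "/opt/foo/bar/lib/foo.jq"))
print(expand_origin("$ORIGIN/../lib", "/opt/foobar/bin/prog.jq"))
```

Because the result depends only on where the importing file lives, moving the whole package to another prefix changes the expanded path along with it.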
Declared search paths should be searched before the system or JQ_LIBRARY_PATH directories. If we have this from day one then perhaps JQ_LIBRARY_PATH will not be abused. JQ_LIBRARY_PATH should only be used for running a jq executable outside its originally-intended install location.
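The ordering could be sketched as follows in Python. Two things here are assumptions made for the sketch and not settled in this thread: that JQ_LIBRARY_PATH is colon-separated, and that its entries come before the system directories. The name search_order is hypothetical.

```python
import os


def search_order(declared, system_dirs, environ=None):
    """Return the directories to search, declared paths first."""
    environ = os.environ if environ is None else environ
    # Assumption: colon-separated list, empty entries ignored.
    raw = environ.get("JQ_LIBRARY_PATH", "")
    env_dirs = [d for d in raw.split(":") if d]
    # Assumption: environment entries precede system directories.
    return list(declared) + env_dirs + list(system_dirs)


print(search_order(["/opt/foo/bar/lib"], ["/usr/lib/jq"],
                   environ={"JQ_LIBRARY_PATH": "/tmp/a:/tmp/b"}))
```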
One question is whether the import statement should allow the specification of version constraints, or whether such constraints belong (exclusively) in the module metadata (JSON_OBJECT).
Another question is whether every module must give its version number. If so, then presumably JSON_OBJECT would be required, which is undesirable. Thus, the package management system will have to be able to manage unversioned modules.
ASSUMING that all the metadata about versions is going to be placed in JSON_OBJECT, I would propose the following specification:
JSON_OBJECT is the repository for the module's metadata.
If JSON_OBJECT is given, then the following keys have special significance,
and if given should have values as specified here:
"version": SEMANTIC_VERSION
"requires": ARRAY OF {"module": STRING, "version": SEMANTIC_VERSION_RANGE}
where:
SEMANTIC_VERSION is a string following the semantic versioning scheme;
SEMANTIC_VERSION_RANGE is either a SEMANTIC_VERSION (the minimum acceptable
version) or a string consisting of two tokens that together specify a range of acceptable
versions (see http://julia.readthedocs.org/en/latest/manual/packages/#requirements)
Example:
{"version": "1.2.3", "requires": [ { "module": "Statistics", "version": "0.1 0.2-"} ] }
[The above has been edited to indicate that JSON_OBJECT is optional, and that its special keys are also optional.]
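To illustrate the shape of the proposed metadata, here is a small Python sketch that checks only the types of the special keys. It does not interpret version ranges, and check_metadata is a hypothetical name.

```python
import json


def check_metadata(text):
    """Return a list of problems found in a module's JSON_OBJECT text.

    All special keys are optional; only their types are checked.
    """
    meta = json.loads(text)
    if not isinstance(meta, dict):
        return ["metadata must be a JSON object"]
    problems = []
    if "version" in meta and not isinstance(meta["version"], str):
        problems.append('"version" must be a string')
    requires = meta.get("requires", [])
    if not isinstance(requires, list):
        problems.append('"requires" must be an array')
        requires = []
    for i, req in enumerate(requires):
        if not isinstance(req, dict):
            problems.append("requires[%d] must be an object" % i)
            continue
        for key in ("module", "version"):
            if not isinstance(req.get(key), str):
                problems.append("requires[%d].%s must be a string" % (i, key))
    return problems


example = ('{"version": "1.2.3", "requires": '
           '[ { "module": "Statistics", "version": "0.1 0.2-"} ] }')
print(check_metadata(example))
```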
One step at a time. The only urgent decisions are: a) must modules declare a version, b) must dependents declare a minimum version for each dependency.
I'm inclined to integrate @wtlangford's code as-is and revisit versioning later.
I'm also inclined to make versioning optional: jq is a friendly language with very little ceremony. Versioning is a best practice; making it required is not required.
But we do need versioning. One problem that comes up is: how to represent versions. We only need jq to enforce a minimum, and major version boundaries too.
My preferred version representation would be: as a number, with the integer portion representing a major version number and the fraction representing a minor version number. But the major number could just be made part of the module name, which makes sense if it represents a backwards incompatible change vis-a-vis the previous major version. Micro versions shouldn't be numbers, but numbers or strings (e.g., hash values, git commit hashes, ...).
@nicowilliams wrote:
jq is a friendly language with very little ceremony.
Agreed, so I've revised the description of JSON_OBJECT to make everything optional, but I believe that in the interests of simplicity for the jq user, "registered packages" would be required to provide this kind of information.
The question remains, however, whether we want the "required version" information to be part of the metadata (JSON_OBJECT) or part of the import statement. It seems to me there are pros and cons either way.
Another question which I don't think has been addressed yet is whether (in the interests of minimal ceremony for jq users) the JSON_OBJECT should also be the locus of any information that may be required to ensure dependencies can be located without user intervention. The goal I have in mind is that the jq user should be able to add (using Pkg::add/1) and then import any registered module without having to specify anything about where that module or its dependencies are located.
@pkoppstein For trusted (i.e., locally-found) modules there's nothing wrong with using $ORIGIN-based search paths from the module. For modules downloaded from the 'Net... well, we'll figure that out when we get there (ideally we'd have named repos and modules would be searched for in selected repos; no URIs in sight, but URNs yes, and if you want to use modules not in any repo then you'll have to make a local directory of said modules).
Adding syntax is always possible, and clearly we'll have to when we add versioning. I'll look at that as soon as we're done with the main part of the module system. If the metadata we need for the linker/loader is quite limited then I don't mind, and maybe prefer, having it not be an object (which we can always add later). At the moment I'm thinking that the only thing we really need version-wise is a minor version number (refreshing simplicity!).
BTW, I don't relish the thought of adding the bloat of HTTP and TLS libraries to jq just to have a pkg system builtin. I realize that it would be oh so convenient. I'm tempted instead to rely on spawning a curl(1) process. I want to draw the line at regexp (which we now have, thanks to @wtlangford!) and maybe rudimentary Unicode support (Oniguruma has some, but it's not exported, and it lacks normalization code). After that no new external dependencies; everything else in modules. We'll need a C-coded module system using dlopen(), LoadLibraryEx() (more on that some other day).
@nicowilliams wrote:
I'm tempted instead to rely on spawning a curl(1) process.
Great minds! This is the code from Julia:
function curl(url::String, opts::Cmd=``)
    success(`curl --version`) || error("using the GitHub API requires having `curl` installed")
    out, proc = open(`curl -i -s -S $opts $url`,"r")
    head = readline(out)
    status = int(split(head,r"\s+",3)[2])
    for line in eachline(out)
        ismatch(r"^\s*$",line) || continue
        wait(proc); return status, readall(out)
    end
    error("strangely formatted HTTP response")
end
(Specifically: base/pkg/github.jl)
I mention this for several reasons beyond the obvious. First, I hope you'll take the time to become more familiar with Julia -- it represents the combined effort of some great 21st century minds. Second, much of Julia is written in Julia, and I expect that with a few more primitives (notably system), jq's package manager could also be written primarily in jq. Third, Julia is MIT-licensed, so with the right incantations, we should be able to borrow freely.
We now have import (still gotta document it). I'm thinking of adding syntax allowing modules to start with a module declaration:
module NAME version NUMBER;
Later we would add a metadata object option. "Later" because I have no use for such metadata now, but will eventually. The object would store arbitrary constant metadata. Might as well add a const def sort of thing as well: in the jq language what appear to be JSON object/array value literals are really code for constructing them, since they needn't be constant literals, but a constant literal could be useful. Also potentially useful would be a data load directive (imagine writing Unicode handling code in jq, thus needing to load large-but-constant Unicode tables).
@nicowilliams wrote:
module NAME version NUMBER;
Excellent, but in accordance with your previous observations about friendliness, I assume you mean:
module NAME [version VERSION];
Also, as a supporter of semantic versioning, and in the spirit of "convention over configuration", I'd recommend that VERSION be required to conform to the semver syntax. It will simplify things down the road.
If I understand http://semver.org/ correctly, the syntax of a valid semantic version number can be summarized as follows:
VERSION == NORMAL or VARIANT
NORMAL == NUMBER "." NUMBER "." NUMBER
NUMBER == 0 or [1-9][0-9]*
VARIANT == NORMAL "-" IDS
IDS == ID or ID "." IDS
ID == 0 or [A-Za-z1-9][A-Za-z0-9-]*
Examples: 1.2.3 1.2.3-alpha 1.0.0-0.3.7 1.0.0-x.7.z.92
However, I could see allowing NUMBER and NUMBER "." NUMBER as well.
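The summarized grammar translates fairly directly into a regular expression. The Python sketch below follows that summary, not the full semver.org specification; here ID is taken to also accept a lone 0, which the 1.0.0-0.3.7 example needs. The name is_version is hypothetical.

```python
import re

NUMBER = r"(?:0|[1-9][0-9]*)"
NORMAL = NUMBER + r"\." + NUMBER + r"\." + NUMBER
ID = r"(?:0|[A-Za-z1-9][A-Za-z0-9-]*)"
IDS = ID + r"(?:\." + ID + r")*"
# VERSION == NORMAL or NORMAL "-" IDS
VERSION = re.compile(NORMAL + r"(?:-" + IDS + r")?")


def is_version(s):
    """True if s matches the summarized VERSION grammar in full."""
    return VERSION.fullmatch(s) is not None


for v in ["1.2.3", "1.2.3-alpha", "1.0.0-0.3.7", "1.0.0-x.7.z.92", "1.2", "01.2.3"]:
    print(v, is_version(v))
```

Allowing the shorter NUMBER and NUMBER "." NUMBER forms would only require making the last two components of NORMAL optional.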
@nicowilliams wrote:
in the jq language what appear to be JSON object/array value literals are really code for constructing them, since they needn't be constant literals
This intrigues me, as saying "ABC" in jq actually creates a function (named @lambda) that adds "" and "ABC" and returns it. Was there a reason for this behavior? I imagine it has something to do with backtracking and the creation of closures, but I cannot for the life of me figure out what.
@wtlangford I have noticed this too. This has to do with the way string interpolation/formatting works. It should be possible to optimize this away in gen_binop() though. Might as well add some compiler constant folding functionality while I'm at it.
Since jq's version number is greater than 1, jq urgently needs some kind of module system that will help avoid naming collisions. The recent addition of the env filter highlights the need. Had jq modules been available, such a system-dependent function could well have gone in a jq-provided "System" module, or users could have protected their own function named env by putting it in a module of their own.
PRIMARY GOALS
[1] Provide a mechanism for avoiding namespace collisions.
This includes the ability to avoid namespace collisions not only of jq functions but also of named collections of functions (modules) themselves.
[2] Support the packaging of related functions.
There has been some discussion about "libraries" or "packages" of functions, e.g. for Unicode support. Providing these libraries as modules would facilitate their description, versioning, dependency management, etc.
[3] Support the definition of evaluation contexts.
Proposed additions to jq such as eval would benefit from module support, e.g. jq 'eval(STRING, MODULE)' would very roughly be like:
jq -f <(cat MODULE_CONTENTS ; echo STRING)
That is, jq would compile STRING in the context of MODULE, and then filter the input accordingly.
PROPOSED SYNTAX
Summary
IMPORT::
MODULE::
This proposal introduces two new reserved words:
However, if for some reason "as" cannot be used as a keyword here, then "alias" would be recommended.
Invocation of a function defined in a module
Invoking a function, f, defined in a module, M (or in a module aliased as M):
M::f
Rationale: i) Using . or even ":" as the separator raises too many issues. (*)
ii) Commandeering a special character such as & to serve as a prefix sigil (as in &M.f) would be wasteful and probably more confusing than helpful.
Module Definition
where:
JSON_OBJECT if specified is a JSON object that can be used for giving details about the module (version, author, etc);
IMPORT ::= import RESOURCE [as ALIAS];
DEFINITION is a jq function definition;
RESOURCE is a JSON entity specifying a file or URL; for example, the string "http://modules.jq.org/unicode.jq"; the referenced file or URL should be a valid jq program.
The "import" directive allows one module to "include" another. The function definitions so included become available both within MODULENAME and wherever MODULENAME functions are available. In all cases, however, the MODULE::FUNCTION syntax must be used. There is no nesting of modules.
Rationale: The proposed syntax for module definitions allows existing function definitions to be "copy-and-pasted" into a module, and yet is flexible enough to support other functionality. The 'end' keyword already exists and is appropriate since each IMPORT and each DEFINITION is terminated with a semicolon.
Example:
module MyModule def id(x): x; end
Loading a Module
For the initial implementation, it would be sufficient simply to allow the '-f FILESPEC' option to be specified multiple times. For example:
jq -f MYMODULE.jq -f MYPROGRAM.jq
would be equivalent to jq -f <(cat MYMODULE.jq MYPROGRAM.jq)
The major enhancement would be to support the "import" directive generally, i.e.
This allows a module to be imported as though it were named differently.
Module Description
The JSON object included in the definition of one module can be used to include a description of the module, its version number, etc.
Relative Paths
The I/O enhancements for jq that are underway may obviate the need for additional options, but if not, one possibility would be for jq to support the concept of module paths. These could, for example, be specified using a "--path" option.
Proposed Simplifications
Additional Features
Needless to say, there are many other possible enhancements beyond the above skeletal proposal, but hopefully whichever enhancements are adopted can be built on the foundations of the basic module system described above.
Footnote:
(*) If M is both a user-defined module and a user-defined function, then M.f would at best be ambiguous; at worst one would shadow the other, defeating one of the goals for having a module system in the first place.
As for using ":" as the module/function separator, it has been observed that expressions such as {"a": M:a} may be more difficult to read than {"a": M::a}.