Design Summary
1. Non-JSON streaming output is only supported in batch mode.
2. A "--format" option is added to the Jaql shell. Three output formats (json, csv, and xml) are predefined, but general Jaql IO descriptors are also supported.
Original comment by yaojingguo@gmail.com
on 22 Sep 2009 at 2:43
Another way to get this functionality is to support writing to stdout (e.g., a print function). To me, this approach sounds more attractive because it can reuse everything we did for serialization. Also, when printing the output of a script as CSV, you probably don't want the output of all the statements, only some of them. Using print would help here.
Original comment by Rainer.G...@gmx.de
on 22 Sep 2009 at 8:32
In either case, this feature should be implemented using the serialization framework. It is basically a global switch for the shell's output format. I think the intention here was to make it convenient for small, ad-hoc scripts (probably a single expression) to dump their output as CSV (or some other format).
Implementing this feature so that it is easily reusable (e.g., write(stdout())) is a good idea.
Speaking of the serialization framework, we should unify it with the StreamAdapters (think of these as types of stream factories plus formatting functionality). This is a topic for another issue, however:
http://code.google.com/p/jaql/issues/detail?id=47
Original comment by vuk.erce...@gmail.com
on 23 Sep 2009 at 2:01
Two Approaches
================
I describe the new approach in the following text.
- Add a function instead of providing an output format option when launching the Jaql shell.
- This gives us more flexibility to control the output format. But there is a small problem. In the current implementation, the Jaql shell prints the evaluated value of a JSON expression. If write(stdout("xml")) is called, data in two formats (JSON and XML) will be sent to STDOUT. The same applies when stdout("xml") and stdout("csv") are used in the same script. Do we really want this kind of flexibility?
- And since the write function can take an IO descriptor as a parameter, Jaql users can still pass an IO descriptor for a format other than the predefined ones.
- Add the stdout function with minArgs = 0 and maxArgs = 1. The only parameter specifies the output format (json, csv, or xml). The default is json. write(stdout()) will write the content to STDOUT. We may need to tweak the write function, since it currently prints the IO descriptor to STDOUT.
I prefer the original approach, but we could make the stdout function a global switch for the shell's output format. For example, if stdout("csv") is called, all output will be in CSV format.
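The stdout proposal above can be sketched in Python (chosen only for illustration; Jaql itself is implemented in Java). This is a minimal sketch, assuming a descriptor is a plain record carrying a format name and that each format name maps to one serializer. The names stdout, write, and SERIALIZERS here are hypothetical and do not reflect Jaql's actual classes. Only json is registered; csv and xml serializers would be added to the same table.

```python
import json
import sys

# Hypothetical sketch of the proposed write(stdout(...)) design.
# The names mirror the discussion; this is not Jaql's implementation.


def _write_json(values, stream):
    # Serialize one JSON value per line.
    for value in values:
        stream.write(json.dumps(value) + "\n")


# Format name -> serializer. csv and xml entries would be registered here too.
SERIALIZERS = {"json": _write_json}


def stdout(fmt="json"):
    """Like the proposed stdout(): zero or one argument, defaulting to json."""
    if fmt not in SERIALIZERS:
        raise ValueError(f"unsupported output format: {fmt!r}")
    return {"type": "stdout", "format": fmt}


def write(values, descriptor, stream=None):
    """Serialize values using the descriptor's format (to sys.stdout by default)."""
    target = sys.stdout if stream is None else stream
    SERIALIZERS[descriptor["format"]](values, target)
```

Because write only looks up the serializer named by the descriptor, adding a format or redirecting the stream does not change the calling code, which is the reuse argument made for write(stdout()) in this thread.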
Other
==============
- JsonOutputStream and JsonTextOutputStream should not be used. These two classes also use the serialization framework: JsonTextOutputStream uses JsonUtil, which in turn uses the serialization framework, and JsonOutputStream uses DefaultBinaryFullSerializer directly. Rainer, could you share any documentation you have on the serialization framework? Vuk, could you explain more about how to use the serialization framework directly?
- For CSV support, I want to reuse the mechanism in ToDelConverter.
- For XML output, I want to reuse an existing library. I found the following two libraries that provide JSON-to-XML conversion:
- http://www.json.org/java/index.html
- http://json-lib.sourceforge.net/usage.html
Do you have any suggestions regarding these libraries? Do you have any suggestions for the XML representation of JSON?
Original comment by yaojingguo@gmail.com
on 23 Sep 2009 at 3:56
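To make the question about an XML representation of JSON concrete, here is one possible mapping, sketched in Python with the standard xml.etree.ElementTree module: objects become child elements named after their keys, arrays become repeated item elements, and null is marked with an attribute. This convention is an assumption for illustration only; it is not the mapping used by json.org, json-lib, or com.ibm.jaql.lang.expr.xml.XmlToJsonFn, and it assumes every key is a valid XML element name. The function json_to_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def json_to_xml(value, tag="value"):
    """Map a JSON value to an Element using one illustrative convention."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            # Assumes each key is a valid XML element name.
            elem.append(json_to_xml(child, key))
    elif isinstance(value, list):
        # Array elements become repeated <item> children.
        for child in value:
            elem.append(json_to_xml(child, "item"))
    elif value is None:
        # Distinguish null from an empty string.
        elem.set("null", "true")
    elif isinstance(value, bool):
        # Checked before numbers, since bool is a subclass of int in Python.
        elem.text = "true" if value else "false"
    else:
        elem.text = str(value)
    return elem
```

For example, ET.tostring(json_to_xml({"name": "Ann", "tags": ["a", "b"]}), encoding="unicode") yields a single value element with name and tags children. Whatever convention is chosen, it should round-trip through the XML reader, which is the consistency point raised in the next reply.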
write(stdout(...)) sounds good. The main advantage I see is that you can control the output. This is important for the main use case of the CSV feature, that is, piping Jaql results into other programs. Other opinions?
Original comment by Rainer.G...@gmx.de
on 23 Sep 2009 at 4:19
Yes, the stream converter classes use serializers. The question raised is whether we should consider a different design in which the stream converters are types of serializers instead of implementing some other interface (converter).
Regarding XML, we've taken a stab at a conversion from XML to JSON (see com.ibm.jaql.lang.expr.xml.XmlToJsonFn). It would be useful if the writer were consistent with this reader.
Original comment by vuk.erce...@gmail.com
on 23 Sep 2009 at 5:51
Yes, after investigating Jaql functions further, I agree to go with the function approach. I have begun implementing it this way.
Original comment by yaojingguo@gmail.com
on 24 Sep 2009 at 3:28
The following two functions are added to support JSON streaming output in XML and CSV formats:
- jsonToCsv is for JSON streaming output in CSV format
- jsonToXml is for JSON streaming output in XML format
Jaql users can use these two functions in both interactive mode and batch mode. With them, Jaql users have full control of JSON output.
Original comment by yaojingguo@gmail.com
on 28 Sep 2009 at 1:13
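What a jsonToCsv-style conversion does can be sketched with Python's csv module: each JSON record becomes one delimited line, and an optional field list fixes the column order (similar in spirit to the fields outoption in the del descriptor example in the next comment). The function json_to_del and its parameters are hypothetical; this does not reproduce ToDelConverter's behavior, and nested values are not handled specially.

```python
import csv
import io


def json_to_del(records, fields=None, delimiter=","):
    """Render a list of JSON records as delimited text (illustrative sketch)."""
    if fields is None:
        # Default column order: the keys of the first record.
        fields = list(records[0].keys()) if records else []
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for record in records:
        # Missing fields become empty cells (the csv module writes None as "").
        writer.writerow([record.get(name) for name in fields])
    return out.getvalue()
```

Values containing the delimiter are quoted by the csv module's default minimal quoting, so a name like "Bo, Jr." stays in a single column.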
Fixed in Revision 397.
Summary of changes.
1. Functions jsonToDel and jsonToXml are added.
2. json, del and xml registry entries are added to storage-default.jaql.
3. Add the following option to the Jaql shell:
-o (--outoptions) <outoptions>: output options (json, del, xml, or an output IO descriptor). This option is ignored when not running in batch mode.
3.1 In batch mode, the Jaql shell prompt and the echoing of input Jaql queries are disabled.
If no output options are provided, the Jaql shell prints output in the same format as in interactive mode. If output options are provided, the Jaql shell uses the given file descriptor to print output. Any file descriptor can be specified.
jaqlshell -b data.json: prints the output in the same format as in interactive mode.
jaqlshell -b -o json data.json: uses the json file descriptor to print output in JSON format.
jaqlshell -b -o del data.json: uses the del file descriptor to print output in del format.
jaqlshell -b -o "{type: 'del', outoptions: {fields: ['name', 'age']}}" data.json: uses the del file descriptor with the field names specified.
jaqlshell -b -o xml data.json: uses the xml file descriptor to print output in XML format.
jaqlshell -b -o "{type: 'local', location: 'abc'}" data.json: uses the given file descriptor to print output into a local file.
4. Rename the existing def function to defHdfs.
5. The output printing logic of Jaql has been extracted to JaqPrinter.
6. ToDelConverter has been moved from vendor directory to src/java directory.
Original comment by yaojingguo@gmail.com
on 16 Oct 2009 at 3:19
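The rules for -o described in the summary above can be restated as a small decision function. This is a hypothetical Python sketch, not the shell's code: resolve_outoptions and its return values are invented for illustration, and any string that is not a predefined name is passed on unparsed as a descriptor expression, since Jaql record syntax such as {type: 'del'} is not JSON.

```python
# Hypothetical sketch of how -o/--outoptions is interpreted; not Jaql's code.
PREDEFINED = {"json", "del", "xml"}  # registry entries in storage-default.jaql


def resolve_outoptions(outoptions, batch_mode):
    """Return (kind, value) describing how batch output should be printed."""
    if not batch_mode or outoptions is None:
        # The option is ignored outside batch mode; with no option,
        # output looks the same as in interactive mode.
        return ("interactive", None)
    if outoptions in PREDEFINED:
        # Use the named registry entry as the output descriptor.
        return ("registry", outoptions)
    # Anything else is treated as an IO descriptor expression to be evaluated.
    return ("descriptor", outoptions)
```

Keeping the predefined names as registry entries means the shell needs no format-specific code: each name is just a shorthand for a descriptor that could also be written out in full.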
Original issue reported on code.google.com by
yaojingguo@gmail.com
on 22 Sep 2009 at 2:37