Open jglick opened 11 years ago
I'll have a crack at it! This will also make @ctrueden happy.
JSON possibilities in Central: net.minidev:json-smart; com.eclipsesource.minimal-json:minimal-json; com.googlecode.json-simple:json-simple. https://github.com/mmastrac/nanojson also looks promising, though apparently not in Central.
Careful with primitive types/wrappers, though; JSON has no equivalent to Character, and does not accurately distinguish between numeric types.
On reflection, XML would probably do fine, and avoid the need for any new dependency. Main concern is handling of XML-reserved characters: not just <, & and " but U+0000–U+001F and some others (which are unlikely but legal parts of attribute values). I bet a SAX parser with no validation or namespace support would be sufficiently fast for this purpose (LazyIndexIterator.peek); can check in perftest module.
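To make the reserved-character concern concrete: XML 1.0 cannot represent most C0 control characters at all, not even as character references, so a writer would need to detect them and fall back to some escaping scheme of its own. The following is only a sketch of such a check based on the Char production of XML 1.0; the class and method names are hypothetical and not part of SezPoz.

```java
public class XmlCharCheck {

    /** True if the code point matches the Char production of XML 1.0. */
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if every code point of the value can appear in an XML 1.0 document. */
    public static boolean isXml10Safe(String value) {
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            // An unpaired surrogate is returned as itself and is rejected here.
            if (!isXml10Char(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isXml10Safe("plain text"));   // ordinary characters
        System.out.println(isXml10Safe("bell\u0007"));   // U+0007 is not an XML 1.0 Char
    }
}
```

Tab, line feed and carriage return pass this check, but inside an attribute value they would still have to be written as character references to survive attribute-value normalization.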
Suggested format for META-INF/annotations/$name.xml (with unnecessary whitespace stripped by default, perhaps overridable via processor option):
<?xml version="1.0" encoding="UTF-8"?>
<r> <!-- need some wrapper element -->
<e c="the.Clazz"> <!-- SerAnnotatedElement -->
<v n="attributeName"> <!-- one value of annotation -->
<s>text</s> <!-- String value -->
</v>
<v n="arrayValued"><l><s>first</s><s>second</s></l></v>
<v n="annotationValued">
<a n="the.Annotation">
<v n="itsOwnOptionalValue"><s>…</s></v>
</a>
</v>
<v n="classValued"><c n="the.ClassValue"/></v>
<v n="enumValued"><n e="the.EnumType" c="CONSTANT"/></v>
<v n="byteValued"><B>-13</B></v>
<!-- or could use numeric value, but less readable: -->
<v n="charValued"><C>@</C></v>
<v n="doubleValued"><D>123.45E15</D></v>
<v n="floatValued"><F>-0.01</F></v>
<v n="intValued"><I>17</I></v>
<v n="longValued"><J>1234567890123</J></v>
<v n="shortValued"><S>55</S></v>
<v n="booleanValued"><Z>true</Z></v>
</e>
<!-- empty element if no values: -->
<e c="other.Clazz" m="methodName"/>
<e c="other.Clazz" f="fieldName"/>
</r>
Typical contents for e.g. META-INF/annotations/hudson.Extension.xml would be reasonably compact:
<?xml version="1.0" encoding="UTF-8"?><r><e c="my.plugin.Extension1"/><e c="my.plugin.Extension2"/></r>
Actually could use XMLStreamReader since we depend on Java 6 now anyway.
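A rough sketch of what reading the compact form with StAX might look like. It only collects the c attribute of each e element from the suggested format; the class and method names are made up for illustration, and nested values, fields and methods are ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class IndexReaderSketch {

    /** Collects the c attribute of every e element in the proposed index format. */
    public static List<String> readClassNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The format needs no DTD, so DTD support is switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<String>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "e".equals(reader.getLocalName())) {
                    names.add(reader.getAttributeValue(null, "c"));
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<r><e c=\"my.plugin.Extension1\"/><e c=\"my.plugin.Extension2\"/></r>";
        System.out.println(readClassNames(xml));
    }
}
```

Because the reader is pull-based, a lazy iterator could stop after the first matching element instead of parsing the whole file.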
I did a completely home-grown JSON-like thing ;-)
Is there any status on this issue? This is a blocker for me because the binary serialized output format makes it impossible to build uberjars from multiple jars containing SezPoz annotations. The standard Maven Shade plugin resource transformers (appending, XML-appending; see http://maven.apache.org/plugins-archives/maven-shade-plugin-1.7.1/examples/resource-transformers.html) do not work with the annotations produced by SezPoz.
@kmader well, we switched away from Sezpoz and implemented our own annotation processor. Since we used Sezpoz before, we even have legacy support to use (but not generate) Sezpoz-compatible annotation indexes (even with class path libraries different from Oracle's). It is BSD licensed, so feel free to steal^Wuse it.
@dscho Thanks for the suggestion, it looks like it is very similar in API to SezPoz. Does it handle @Target(ElementType.FIELD)? I have swapped it into my current code and am getting "error: Cannot handle annotated element of kind FIELD" messages at compilation.
Looking back at this, both JSON and XML seem like overkill. And JSON is not really that desirable, since a useful property is appendability with built-in Maven aggregators, for which you only have line-by-line or XML—JSON would still require a custom aggregator.
Whether using JSON within a line or not, parsing can be simplified by writing all values as strings, not trying to use boolean/numeric primitives at all. For example, in a simple non-JSON format (with runtime type checking), my previous XML example might read:
the.Clazz attributeName="text" arrayValued=["first" "second"] annotationValued={itsOwnOptionalValue="…" anotherValue="…"} classValued="the.ClassValue" enumValued="CONSTANT" byteValued="-13" charValued="@" doubleValued="123.45E15" floatValued="-0.01" intValued="17" longValued="1234567890123" shortValued="55" booleanValued="true"
other.Clazz#methodName()
other.Clazz#fieldName
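A minimal sketch of what "all values as strings, with runtime type checking" could mean on the loading side: the raw string is converted only once the expected member type is known. The class and method names are hypothetical, and class-valued, array-valued and nested annotation members are left out.

```java
public class StringValueCoercion {

    /** Converts a raw string from the index to the type declared by the annotation member. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static Object coerce(String raw, Class<?> type) {
        if (type == String.class) {
            return raw;
        }
        if (type == boolean.class) {
            // Boolean.valueOf would silently map anything unexpected to false.
            if (raw.equals("true")) return Boolean.TRUE;
            if (raw.equals("false")) return Boolean.FALSE;
            throw new IllegalArgumentException("not a boolean: " + raw);
        }
        if (type == char.class) {
            if (raw.length() != 1) {
                throw new IllegalArgumentException("not a single char: " + raw);
            }
            return Character.valueOf(raw.charAt(0));
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        if (type == byte.class) return Byte.valueOf(raw);
        if (type == short.class) return Short.valueOf(raw);
        if (type == int.class) return Integer.valueOf(raw);
        if (type == long.class) return Long.valueOf(raw);
        if (type == float.class) return Float.valueOf(raw);
        if (type == double.class) return Double.valueOf(raw);
        if (type.isEnum()) {
            return Enum.valueOf((Class) type, raw);
        }
        throw new IllegalArgumentException("unsupported member type: " + type.getName());
    }

    public static void main(String[] args) {
        System.out.println(coerce("-13", byte.class));
        System.out.println(coerce("123.45E15", double.class));
        System.out.println(coerce("SECONDS", java.util.concurrent.TimeUnit.class));
    }
}
```

The expected type would come from the annotation interface itself (for example via Method.getReturnType() on the member), so the index would not need type tags like the B, C, D elements in the XML proposal.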
Thanks for the suggestion, it looks like it is very similar in API to SezPoz
That is by design: we started out with SezPoz, but at some stage it became clear that we have slightly different requirements than SezPoz is prepared to address (in particular, we wanted to be free to use different class path libraries than Oracle's, i.e. be independent of the specifics of the Java serialization of the map used by SezPoz).
Does it handle @Target(ElementType.FIELD)
No, we only need the annotation processing for classes, therefore we stripped out the support for field or method annotations. It should not be hard at all to get that support back in, though.
we have slightly different requirements than SezPoz is prepared to address
Not really, I think. #7 foundered at the time, but the goal remains the same. Ensuring ongoing compatibility with the Android JVM still seems tricky (I am not sure how to mechanically test it), but if the only relevant difference is that they do not intend to comply with the 1.5+ serialization spec of HashMap, then any textual format would work around this, and be nicer for debugging anyway.
The fork also seems to have removed the instance type parameter to IndexItem, and the whole instance() method, so it is not a drop-in replacement I am afraid.
we have slightly different requirements than SezPoz is prepared to address
Not really, I think.
Well, given that I outlined our requirements (which disagree with using Oracle's class path library's specific serialization), I fail to see how my statement is wrong...
But let's just let this conversation die: we already had it, it was not exactly fruitful, and the outcome is now history and everybody can live with it. Case closed.
Would be better to finally drop use of Java serialization, and switch to some reasonably compact, Unicode-safe format that supports persistence of the things SezPoz needs.
Ideally in a compact text format. JSON would work if it is possible to embed (shade) a very small parser/generator.
Existing serialized indices would of course still need to be loaded for compatibility.
Originally SEZPOZ-2.