ninia / jep

Embed Python in Java
Other
1.33k stars 150 forks source link

Support string type codes for pyjarray creation. #315

Closed bsteffensmeier closed 3 years ago

bsteffensmeier commented 3 years ago

I like the way that the python array module uses simple one character type codes for specifying the array type and I would like to have similar support for jarray construction. Currently primitive jarrays must be created using the J*_ID constants defined in the jep module. It would be convenient to use the same type codes as the python array module but I ran into some problems

1) There is no character for boolean 2) The character that makes sense for char is 'u' but that is deprecated 3) I think we would use 'l' for java int and 'q' for java long and I'm not sure what we could do with 'i'. I think 'l' for int would confuse java developers.

Numpy also has support for similar typestrs but since they encode the length of the type that is harder to parse and I feel like it is less intuitive especially since we only support a very small set of types compared to numpy. I settled on the JNI type characters in lowercase(z, b, c, s, i, j, f or d). This perfectly matches the types we support. There still may be some confusion since 'j' is used for long but I would rather defer to an existing "standard" instead of come up with our own system.

I want to continue to support jep.J*_ID from python for backwards compatibility but I want the public API to reflect that we are supporting type codes instead of ints for these so I switched them to the type codes.

jep.JSTRING_ID does not have a type code, I think we should drop that field at some point and encourage developers to just import and use java.lang.String. For for this change I chose to preserve backwards compatibility so I made JSTRING_ID the pyjclass for java.lang.String

Another problem with this change is that the c code has J*_ID constant that no longer match the python constants. These constants cannot be strings because they are used in switch statements. Eventually I think they should be changed to the char/ascii/unicode values for the type code but for this change I left the existing values. Since they are only used internally there is no user facing impact except if there is any case where developers have hardcoded the numerical constant it will still work now.

While testing I notices that passing a primitive class, such as java.lang.Integer.TYPE, into jarray would crash, so I made code to handle that case and return a primitive array. Perhaps JINT_ID should be set to java.lang.Integer.TYPE instead of a type code? I like the type code more.

Below is a summary of the major design decisions in this change. I'd appreciate input to see if others have different opinions on any of these.

1) Should we support types codes?

ndjensen commented 3 years ago

I think we should remove JSTRING_ID in 4.0.