rebcabin / masr

Meta ASR: replacement for aging ASDL
MIT License

Character kind in `ttype` #4

Open rebcabin opened 1 year ago

rebcabin commented 1 year ago

The character kind is listed as 1, which reads as 1 byte, together with utf8. Isn't UTF-8 a variable-length encoding, with up to 4 bytes per character?

;; kind: The `kind` member selects the kind of a given type. We currently
;; support the following:
;; Integer kinds: 1 (i8), 2 (i16), 4 (i32), 8 (i64)
;; Real kinds: 4 (f32), 8 (f64)
;; Complex kinds: 4 (c32), 8 (c64)
;; Character kinds: 1 (utf8 string)

Also, as an aside, there is an awkwardness in the following comment:

;; Logical kinds: 1, 2, 4: (boolean represented by 1, 2, 4 bytes; the default
;;     kind is 4, just like the default integer kind, consistent with Python
;;     and Fortran: in Python "Booleans in Python are implemented as a subclass
                    _________           _________
;;     of integers", in Fortran the "default logical kind has the same storage
;;     size as the default integer"; we currently use kind=4 as default
;;     integer, so we also use kind=4 for the default logical.)

rebcabin commented 1 year ago

Looks like the meaning of `kind` is not always the number of bytes.
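
To make that concrete, the kinds quoted above can be written down as a plain lookup table in which `kind` is only a selector. This is a minimal Python sketch, not MASR or ASR code: the names `KINDS` and `describe` are hypothetical, and the labels are copied from the quoted comments.

```python
# Kinds as listed in the quoted ASR comments.
# KINDS and describe are illustrative names, not part of MASR or ASR.
KINDS = {
    "Integer": {1: "i8", 2: "i16", 4: "i32", 8: "i64"},
    "Real": {4: "f32", 8: "f64"},
    "Complex": {4: "c32", 8: "c64"},
    "Character": {1: "utf8 string"},
    "Logical": {1: "1 byte", 2: "2 bytes", 4: "4 bytes"},
}


def describe(ttype, kind):
    """Return the label for a (ttype, kind) pair; raise if the kind is not listed."""
    try:
        return KINDS[ttype][kind]
    except KeyError:
        raise ValueError(f"unsupported kind {kind} for {ttype}") from None


print(describe("Integer", 4))
print(describe("Character", 1))
```

Treating `kind` as a key into a table like this keeps the Character entry honest: kind 1 names an encoding, and says nothing about how many bytes a given string occupies.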

certik commented 1 year ago

The meaning of "kind" is not a number of bytes; it just selects the kind of integer. We used 1, 2, 4, 8, 16, which are the integer sizes in bytes, because that is what gfortran uses. But we can change it.

We don't need to name it kind either.

Yes, UTF-8 is mostly 1 byte per character, but a character can take up to 4 bytes; it is a variable-length encoding.
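
A quick way to see the variable width is to encode a few characters and count the bytes. This is a minimal Python sketch; the helper name `utf8_len` and the sample characters are chosen here for illustration only.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the single character `ch` occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# One sample per encoded width: ASCII letter, accented Latin letter,
# euro sign, and an emoji outside the Basic Multilingual Plane.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")
```

The four samples encode to 1, 2, 3, and 4 bytes respectively, so a character kind of 1 cannot be read as a fixed per-character size.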

Let's keep this issue open and document this in our ASR docs. In fact, I want to move all comments from ASR.asdl into our docs.

rebcabin commented 1 year ago

I did a nice job in MASR today to handle this in a flexible way.