CRC-32 checksum is negative integer (why not BINARY! or positive?)

hostilefork commented 5 years ago

The CRC32 exposed in R3-Alpha returned a signed integer.

r3-alpha>> checksum/method #{AE} 'crc32   
== -479436446

Red does the same (though they are limited to 32-bit signed INTEGER!, so they don't have a choice to be unsigned, or they could only represent half the CRC values...)

red>> checksum #{AE} 'crc32  ; Red has no /METHOD refinement; the method argument is required
== -479436446

The more common concept of CRC-32 is unsigned. But since code has worked regardless of the sign, what generally matters is the bytes, because things that check CRC-32 are typically decoding streams and have to be sensitive to big-endian vs. little-endian byte order.
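To make the signed/unsigned point concrete, here is a minimal Python sketch. It assumes Python 3's `zlib.crc32` implements the same CRC-32 as the Rebol and Red examples above (the numbers agree for this input); the variable names are only illustrative. It shows that the negative value and the unsigned value are two readings of the same 32 bits.

```python
import struct
import zlib

data = bytes([0xAE])  # the same single byte as #{AE} in the examples above

# Python 3's zlib.crc32 returns a value in the range 0 .. 2**32 - 1.
unsigned_crc = zlib.crc32(data)

# Reinterpret the same 32 bits as a signed two's-complement integer
# by packing as unsigned and unpacking as signed.
signed_crc = struct.unpack("<i", struct.pack("<I", unsigned_crc))[0]

print(unsigned_crc)            # 3815530850
print(signed_crc)              # -479436446
print(f"{unsigned_crc:08X}")   # E36C6162
```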

Other common checksum types return BINARY!:

>> checksum/method #{ABCD} 'md5
== #{7838496FD0586421BBB500BB6F472F13}

>> checksum/method #{ABCD} 'sha1
== #{32825EB98DE842EE3E4DF005A07B7D65522A46A0}

So it seems that returning CRC32 as a 4-byte binary would dodge concerns about the integer representation. But (unfortunately) no one seems to have standardized the byte order of transmission for CRC-32. Two places we use it are little-endian:

The gzip spec says, "All multi-byte numbers in the format described here are stored with the least-significant byte first (at the lower memory address)." (little-endian => least-significant byte first)
The PKZIP spec says, "All values MUST be stored in little-endian byte order unless otherwise specified in this document for a specific data element."

...but despite little endian winning in many hardware areas, people still seem skeptical of saying any standard has won.
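As a rough illustration of the byte-order question, the Python sketch below serializes one CRC-32 value both ways and compares it against the trailer that Python's own `gzip` module writes. It assumes a single-member gzip stream, whose last 8 bytes are the CRC-32 followed by the uncompressed size; the variable names are hypothetical.

```python
import gzip
import zlib

data = bytes([0xAE])
crc = zlib.crc32(data)

# The same 32-bit value, serialized in each byte order.
little = crc.to_bytes(4, "little")
big = crc.to_bytes(4, "big")
print(little.hex())  # 62616ce3
print(big.hex())     # e36c6162

# A gzip member ends with CRC32 and then ISIZE, 4 bytes each.
trailer = gzip.compress(data)[-8:]
stored_crc, stored_size = trailer[:4], trailer[4:]

print(stored_crc == little)                   # True: gzip stores it little-endian
print(int.from_bytes(stored_size, "little"))  # 1 (one byte of input)
```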

If it has to be an integer, it would seem that it should be unsigned. Note that Python 2's CRC-32 result was a signed 32-bit value (comparable to the range of Red's and Rebol2's integer), but Python 3 has bignums for arbitrary-precision integers. So it switched to unsigned:

https://stackoverflow.com/questions/32940417/unsigned-crc-32-for-python-to-match-javas-crc-32

Oldes commented 5 years ago

I would keep it how it is. What is the problem with a signed integer? Returning a binary may look better in a console session, but it wastes resources. Also, converting to an unsigned integer is an operation that may be unnecessary, and one can easily convert the signed integer to unsigned if it is needed.
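For reference, the signed-to-unsigned conversion mentioned here is typically a single mask. A minimal Python sketch follows; the helper names `to_unsigned32` and `to_signed32` are hypothetical and not part of any Rebol or Red API.

```python
def to_unsigned32(value: int) -> int:
    """Map a signed 32-bit integer onto 0 .. 2**32 - 1."""
    return value & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Map an unsigned 32-bit integer onto -2**31 .. 2**31 - 1."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the two's-complement reading is negative.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


print(to_unsigned32(-479436446))  # 3815530850
print(to_signed32(3815530850))    # -479436446
```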

hostilefork commented 4 years ago

Ren-C has standardized on Little Endian BINARY!.

Conversion of this binary to an integer can be done with DEBIN as either signed or unsigned, etc.
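As a loose analogy only (this is Python, not Ren-C, and makes no claim about DEBIN's actual interface), decoding a 4-byte little-endian binary as either a signed or an unsigned integer looks like the sketch below. The bytes used are the little-endian layout of the CRC-32 value from the earlier #{AE} examples.

```python
# Little-endian layout of 0xE36C6162, the CRC-32 of the single byte 0xAE.
crc_bytes = bytes.fromhex("62616CE3")

# The same four bytes, decoded two ways.
as_unsigned = int.from_bytes(crc_bytes, byteorder="little", signed=False)
as_signed = int.from_bytes(crc_bytes, byteorder="little", signed=True)

print(as_unsigned)  # 3815530850
print(as_signed)    # -479436446
```

The caller picks the interpretation at decode time, so the checksum itself never has to commit to a sign.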

metaeducation / rebol-issues

CRC-32 checksum is negative integer (why not BINARY! or positive?) #2375