root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern
Other
2.72k stars 1.29k forks source link

Deprecating RtypesCore.h #12208

Open klenze opened 1 year ago

klenze commented 1 year ago

Dear ROOT devs, as you may be aware, recent developments in the standardization of C++ brought the C99 stdint to C++11.

When the original Rtypes.h was written in the nineties, there may have been some need for making sure that sizeof(int) was at least 4. Today, some 37 years after the Intel 80386, I suspect that the only systems where sizeof(int)==2 are 16 bit microcontrollers which lack any 32 bit data types and are probably a bit too constrained with regard to addressable memory to run ROOT.

RtypesCore.h seems to be a bit outdated.

From my understanding, Float16_t and Double32_t are meant to warn the user that parts of the mantissa are truncated. Given the prevalence of stdint (where the number denotes the stored size in bits), I am unsure if that meaning is successfully transported. Float7BitMantissa_t might be more verbose.

Regarding the limits, I notice that for most types, the listed values are the minimum/maximum value that type can take, for example: const UInt_t kMaxUInt = UInt_t(~0); (NB: on a system where sizeof(int)==2, the compiler is (afaik) free to interpret 0 as int, whose complement will then be disappointingly small.) Under this interpretation, I want to cast doubt on

const Int_t     kMaxUChar    = 256;
const Int_t     kMaxUShort   = 65534;

While there were some promising publications of computer scientists claiming to have succeeded in storing 256 in an unsigned char, later attempts to reproduce that research found that the bit sequence was indistinguishable from zero. The general consensus among computer scientists today is that the highest value which can be stored in a uint8_t is in fact 255. On the other hand, a few years later initially doubtful claims that 65535 could be stored in an unsigned short were replicated. Today, big tech (e.g. GAFAM) routinely store that value in their 16 bit numbers and even smaller startups are picking up on the trend. </sarcasm>

(Also, while the kMaxUShort error is rendered harmless by the bitshift in kMaxShort, the kMaxUChar error means that kMinChar and kMaxChar are wrong. In fact, Char_t(kMinChar)>Char_t(kMaxChar) is true. However, nine out of ten math professors agree that the maximum of a set of values should be at least as large as the minimum of that set. [[citation needed]])

These issues have been present in the codebase since at least ROOT5, so fixing them would break backwards compatibility. My suggestion is a different one. C++14 introduces an attribute called [[deprecated]]

[[deprecated("kMaxUChar has a misleading value and should not be used."
             " Please use std::numeric_limits<unsigned char>::max()")]] 
  const Int_t     kMaxUChar    = 256;

And while we are at it, let's be honest: Most users are already somewhat aware that there is a C++ ecosystem outside the safe-ish space that is ROOT. The fallout of vendor-specific implementations has long decayed to a safe level, the air is fresh and breathable, standard compliant implementations are blooming. Thus:

[[deprecated("Just call it \"int\".")]]
  typedef int            Int_t;       //Signed integer 4 bytes (int)

Just my two cents, Philipp

Axel-Naumann commented 1 year ago

Thanks for your report! I think much of this would cause churn without much benefit on the users' side: they would simply hate us for littering them with deprecation warnings that they just don't care about.

Instead, we moved away from these types in new code (RDataFrame, RNTuple, etc); new tutorials are also just using standard types.

I don't know whether anything but this will be constructive / productive enough. What are your thoughts?

I'd certainly like to use std features for the underlying types / values for what's in Rtypes as much as possible, instead of rolling out own. But that seems orthogonal.

ferdymercury commented 1 year ago

Instead, we moved away from these types in new code (RDataFrame, RNTuple, etc); new tutorials are also just using standard types.

I don't know whether anything but this will be constructive / productive enough. What are your thoughts?

I agree with this strategy :)

What I maybe think it's worth is to add the "deprecated warnings" suggested by @klenze to the Doxygen documentation. This way, it does not bother anyone compiling, but gives useful instructions to people developing new scripts.

For example:

typedef int Int_t; //Signed integer 4 bytes (int) \deprecated For new designs, use instead std::int32_t

klenze commented 1 year ago

On further reflection, I agree that using [[deprecated]] on Int_t would annoy many users to no practical purpose.

Deprecating it in the comments sounds reasonable enough.

pcanal commented 1 year ago

And as a reminder we are currently not yet setup to properly handle 'nicely' some of those types in the I/O. std::int32_t and std::int64_t are implemented in the C++ library as typedef/using-alias and thus appear to us after parsing as the underlying type. And not all platforms agree on what that type is (We have seen int, long for the 1st one and long and long long for the 2nd one). Because of long standing choices and backward compatibility, those can lead to different on-file layout (which must be platform independent)

This is of course fixable but is not completed yet.