hosseinmoein / DataFrame

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License
2.53k stars 314 forks source link

DateTime internal properties not setting properly on copy / set_time. #313

Closed TheBlackPlague closed 4 months ago

TheBlackPlague commented 4 months ago

The DateTime's internal properties such as "Date" or "Hour" aren't set properly when it gets copied or gets set by set_time().

However, if one does a add_months(0) to it, it sets it properly.

hosseinmoein commented 4 months ago

date/time component setting is a relatively expensive operation. DateTime implements lazy-evaluation to set them, meaning they are only set when they are asked for through the API. Can you provide a code snippet that shows the problem?

TheBlackPlague commented 4 months ago

Lazy evaluation is hard to deal with when debugging one's program. The issue I'm facing has nothing to do with the DateTime class malfunctioning but that it becomes hard to debug these DateTimes (especially when they're in a series).

For example, see the following Time Conversion Code:

//
// Copyright (c) 2024 Venetia. All rights reserved.
//

#ifndef VENETIA_TIMECONVERSION_H
#define VENETIA_TIMECONVERSION_H

#include <DataFrame/DataFrame.h>

hmdf::DateTime AsDateTime(const time_t time)
{
    hmdf::DateTime d;
    d.set_time(time);
    d.add_months(0);
    return d;
}

time_t AsTime(const hmdf::DateTime& time)
{
    // ... Inverse of AsDateTime

    return time.time();
}

hmdf::DateTime AsDateTime(const uint64_t timestamp)
{
    return AsDateTime(static_cast<time_t>(timestamp / 1000));
}

uint64_t AsTimeStamp(const hmdf::DateTime& time)
{
    return static_cast<uint64_t>(AsTime(time)) * 1000;
}

hmdf::DateTime Now()
{
    return AsDateTime(std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()));
}

#endif //VENETIA_TIMECONVERSION_H

Using the above code, I'm creating large vectors of pairs of DateTimes. The issue is when debugging to see if everything is functioning, I realize those DateTimes have their properties improperly set. It'd be nice if Lazy Evaluation for DateTime is disabled when in debug mode.

My necessities for using the DateTime class are:

In my opinion, only a nanosecond timestamp (uint64_t) needs to be stored for the above to be possible, where the relative components can be (lazily) derived from it. In the case of debug mode, they should be derived instantly.

hosseinmoein commented 4 months ago

Making lazy evaluation optional in DateTime is not practical.

For debugging purposes, you can call the add_month() (or you can call const function like date() -- even better. It does the same thing. Printing also does the same thing) for each set_time() to populate all the fields.

TheBlackPlague commented 4 months ago

Making lazy evaluation optional in DateTime is not practical.

For debugging purposes, you can call the add_month() (or you can call const function like date() -- even better. It does the same thing. Printing also does the same thing) for each set_time() to populate all the fields.

I am referring to debugging in the IDE. Why wouldn't it be practical if the code is compiled with no optimization and debug symbols? The whole point of such a mode is to forfeit performance for clarity to ease in the debugging process — isn't the library forfeiting such standards?

hosseinmoein commented 4 months ago

I will not discuss the code practicality any further. What I meant is you can create a compile directive (for example DEBUG_MODE). In your code under that directive call .date() on all DataTimes. This way when that directive is defined all DateTimes are set.

TheBlackPlague commented 4 months ago

I will not discuss the code practicality any further. What I meant is you can create a compile directive (for example DEBUG_MODE). In your code under that directive call .date() on all DataTimes. This way when that directive is defined all DateTimes are set.

I understood what you meant but it causes extra boilerplate added.

Regardless, since you mentioned it's not up for discussion, I'll just close the issue.