worldbank / REaLTabFormer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
https://worldbank.github.io/REaLTabFormer/
MIT License

Bug in process_datetime_data() converting datetime to int #75

Closed: efstathios-chatzikyriakidis closed this issue 4 months ago

efstathios-chatzikyriakidis commented 4 months ago

Hi @avsolatorio,

I hope you are well.

The fix #72 is correct and makes it possible to use the latest pandas release. However, I am still blocked by this line:

https://github.com/worldbank/REaLTabFormer/blob/main/src/realtabformer/data_utils.py#L265

There are cases where that line can fail. For example, I tested it in a Windows conda environment and it failed because the bare int was translated to int32. Don't ask me why! My conclusion so far is that it is related to how things are implemented on Windows, since the same code and data succeeded in Google Colab and in an Ubuntu Linux container running under WSL on the same 64-bit Windows host.

I think we can be more explicit and use int64, since datetimes are stored as 64-bit values. This would also be consistent with the following line:

https://github.com/worldbank/REaLTabFormer/blob/main/src/realtabformer/data_utils.py#L271

So, I suggest changing it from

series = (series.astype(int) / 1e9)

to:

series = (series.astype('int64') / 1e9)
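A likely explanation, offered as an assumption rather than a confirmed diagnosis: in NumPy releases before 2.0, the bare Python int maps to the platform's C long, which is 32 bits on Windows, so astype(int) asks pandas for int32 there while Linux gets int64. Naming int64 explicitly removes that platform dependency. The sketch below is a hypothetical helper (to_unix_seconds is not part of REaLTabFormer) that assumes a timezone-naive datetime series; it normalizes to nanosecond resolution first so that dividing by 1e9 always yields seconds since the Unix epoch.

```python
import pandas as pd


def to_unix_seconds(series: pd.Series) -> pd.Series:
    """Hypothetical helper: naive datetime Series -> float seconds since epoch."""
    # Ensure nanosecond resolution so the 1e9 divisor is correct even if
    # the input was parsed at a coarser unit (assumption: tz-naive input).
    series = series.astype("datetime64[ns]")
    # Request int64 explicitly instead of the platform-dependent builtin int.
    return series.astype("int64") / 1e9


s = pd.Series(pd.to_datetime(["1970-01-01 00:00:00", "2023-01-01 00:00:00"]))
print(to_unix_seconds(s).tolist())  # [0.0, 1672531200.0]
```

Because the target dtype is spelled out, the result is the same on Windows, Linux, and macOS, regardless of the platform's default integer width.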

Can you help me with this? I will also need a new PyPI release (1.0.7).

Thank you!

avsolatorio commented 4 months ago

Hello @efstathios-chatzikyriakidis, thanks for letting me know that the root cause is likely the Windows environment. The patch is already published.

I highly recommend that you create a PR if you find some of these changes in the future! 😀

efstathios-chatzikyriakidis commented 4 months ago

Hi @avsolatorio,

Yes, in the future, if I find a bug and the fix is as easy to suggest as this one, I will open a PR.

Thank you so much!