Open jbencina opened 1 year ago
Thanks @jbencina for the report
Looks like the issue isn't in pandas?
In [7]: tabulate.tabulate(df, floatfmt='.0f')
Out[7]: '- ------------------\n0 503498111827123008\n- ------------------'
Might be something to report to https://github.com/astanin/python-tabulate
@MarcoGorelli Thanks. I opened a ticket with the tabulate team: https://github.com/astanin/python-tabulate/issues/213. The root cause seems to be that tabulate treats the int64 data type as a float when it comes from a DataFrame, so the wrong Python format spec gets applied to it. Passing a long int directly to tabulate doesn't produce this issue:
from tabulate import tabulate
table = [[503498111827123021]]
print(tabulate(table))
------------------
503498111827123021
------------------
print(tabulate(table, floatfmt='.0f'))
------------------
503498111827123021
------------------
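For anyone wondering why the value only changes once it is treated as a float, here is a minimal sketch using plain Python and no tabulate. It assumes the DataFrame held the same integer as the table above; the helper name `survives_float_roundtrip` is made up for illustration.

```python
# Every integer with magnitude up to 2**53 is exactly representable as a
# 64-bit float; above that, only some integers are.
EXACT_LIMIT = 2 ** 53

def survives_float_roundtrip(n: int) -> bool:
    """Return True if converting n to float and back gives n unchanged.

    Hypothetical helper, not part of pandas or tabulate.
    """
    return int(float(n)) == n

value = 503498111827123021  # assumed to be the integer stored in the DataFrame

print(value > EXACT_LIMIT)               # True
print(survives_float_roundtrip(value))   # False
print(int(float(value)))                 # 503498111827123008
```

The last number matches the one in the tabulate output quoted earlier, which is consistent with the int64 column being converted to float before formatting.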
Confirmed this is fixed in the upcoming release of tabulate
cool, thanks!
the minimum version should probably be bumped then - do you want to submit a pull request to do that?
(reopening the issue until the minimum version is bumped)
Good point. I'll find out when the next version is expected and circle back here with a PR once it's available.
Summary
When a Pandas DataFrame contains a 64-bit integer and the .to_markdown() method is called on the DataFrame, the printed integer is incorrect due to overflow. This behavior is being passed along by the tabulate package but is really a fundamental Python issue. I bring this up here because the Pandas .head() method does print the correct number. Should Pandas be handling this case to present a consistent view of DataFrame data to users regardless of method? If this fix is outside the scope of Pandas, perhaps the Pandas documentation should be updated with a warning.
Reproduction
Test 64bit int with Pandas head()
Test 64bit int with Pandas to_markdown()
Test with Python format()
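A minimal sketch of what a format() check along these lines could look like, assuming the integer quoted in the comments on this issue. It compares the integer presentation type with the '.0f' float presentation type applied to the same Python int.

```python
value = 503498111827123021  # assumed test value, taken from the comments

# 'd' formats the int as an integer, digit for digit.
as_integer = format(value, "d")

# '.0f' is a float presentation type: the int is converted to a float first,
# so digits beyond float64 precision are lost.
as_float = format(value, ".0f")

print(as_integer)  # 503498111827123021
print(as_float)    # 503498111827123008
```

The two strings differ only in the trailing digits, which is the same discrepancy seen between head() and to_markdown().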
Pandas Version