Open crew102 opened 5 years ago
Hmm yea that does seem weird. Applying print shows the dtypes as object:
>>> df.apply(print)
0 1
1 2
2 3
Name: col_1, dtype: object
0 hi
1 there
2 friend
Name: col_2, dtype: object
So I think something is awry here with the underlying block management. @jbrockmendel might have some thoughts
Just to confirm, if everything in the frame was a number this would work
>>> df['col_2'] = [4, 5, 6]
>>> df.apply(lambda x: np.issubdtype(x, np.number))
col_1 True
col_2 True
dtype: bool
Investigation and PRs are of course welcome
"Just to confirm, if everything in the frame was a number this would work"
K, well that's good to know/will help with debugging. Can you briefly describe what you mean by "block management?" I'd be happy to investigate this issue, though it'd be great to have a tip on where to look first.
Hmm I think this starts diverging here:
The problem with calling .values on a 2D object is that it (in this case at least) returns a 2D numpy array, which must have a single dtype shared by all of its elements. The only dtype that can hold both, say, 1 and "hello" is object, which is why all of these columns lose their dtype information.
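To make the dtype loss concrete, here is a minimal sketch. The frame is an assumption reconstructed from the printed output earlier in this thread, and the variable names are made up for illustration; this is not the code path pandas uses inside apply, just a demonstration of what .values does to a mixed frame.

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the output shown earlier in the thread (an assumption).
df = pd.DataFrame({"col_1": [1, 2, 3], "col_2": ["hi", "there", "friend"]})

# Converting the whole frame to one 2D array forces a single common dtype.
arr = df.values

# A Series rebuilt from a column of that array inherits the array's dtype.
col_from_array = pd.Series(arr[:, 0], name="col_1")

print(arr.dtype)             # object
print(col_from_array.dtype)  # object

# The original column is still numeric; the copy taken via .values is not.
print(np.issubdtype(df["col_1"].dtype, np.number))       # True
print(np.issubdtype(col_from_array.dtype, np.number))    # False
```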
You might just have to iterate over the axis to maintain that dtype info, maybe building up a dict of results and returning from there at the end
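A rough sketch of that idea, written as user-level code rather than as a patch to pandas internals. The helper name apply_per_column is hypothetical, and pd.api.types.is_numeric_dtype is used here in place of the np.issubdtype check from the original report.

```python
import pandas as pd

# Same assumed frame as in the report: one integer column, one string column.
df = pd.DataFrame({"col_1": [1, 2, 3], "col_2": ["hi", "there", "friend"]})


def apply_per_column(frame, func):
    """Hypothetical helper: call func on each column without going through .values."""
    results = {}
    # DataFrame.items() yields (column name, Series) pairs; each Series
    # keeps the dtype it has in the frame.
    for name, column in frame.items():
        results[name] = func(column)
    # Collect the per-column results into a Series indexed by column name.
    return pd.Series(results)


is_numeric = apply_per_column(df, lambda col: pd.api.types.is_numeric_dtype(col.dtype))
print(is_numeric)
```

Because each column is visited on its own, col_1 is still seen as an integer column and only col_2 is reported as non-numeric.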
In any case certainly would welcome investigation and a PR if you can make it all work
Yeah, that definitely looks like the issue. I'll take a stab at a solution in the next few weeks or so.
I'm out of town until Tuesday, will take a look at this then.
@crew102 Are you still working on this, or could I take over the task?
Sorry, haven't had time to look into this. Yes, please take it over.
Problem description
pandas.DataFrame.apply() seems to be converting series from int64 to object in some circumstances, and I'm not sure why. An example of the strange behavior I'm seeing is shown below, along with comments on what I'm expecting to see versus what I actually see. Note, this issue was originally reported on SO here: https://stackoverflow.com/questions/58222263/unexpected-behavior-when-applying-function-to-all-columns-in-pandas-data-frame
Code Sample, a copy-pastable example if possible
Created on 2019-10-03 by the reprexpy package
Output of pd.show_versions()