Closed: selvam33 closed this issue 6 years ago.
I'm having trouble understanding the issue. What do you think the bug in pandas is?
http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports may be worth reading through.
As mentioned, a minimal example is needed before we can help. Closing for now, but feel free to reopen if you can provide an example like the ones described in the link above.
@WillAyd Can you explain what would cause pandas to throw that error when concatenating on axis=1? I am getting that error too, with large dataframes where I know some rows are missing from each dataframe. If a dataframe that has all the rows comes first in the concatenation, the error usually does not happen. But needing a reference frame that contains every row defeats the point of join='outer'.
I tried to recreate the problem with small dataframes, but there the missing rows get padded as expected. The tables on which the error occurs are too long to inspect manually, especially since I don't know what to look for.
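For reference, the padding behaviour described above can be reproduced with a small sketch like the one below (frames and values are made up for illustration). As long as every index label is unique, join='outer' aligns rows on the union of the indexes and fills the gaps with NaN:

```python
import pandas as pd

# Two small, made-up frames sharing an index name; each lacks a row the other has.
df1 = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["x", "y"], name="key"))
df2 = pd.DataFrame({"b": [3, 4]}, index=pd.Index(["y", "z"], name="key"))

# With unique labels, the outer join aligns on the union of the indexes
# and pads rows missing from a frame with NaN.
combined = pd.concat([df1, df2], axis=1, join="outer", sort=False)
print(combined)
```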
My working assumption is that I've coded something badly, rather than that it's a bug in pandas. I just need more info to figure out what. So if you could please explain what situation can trigger this error, I might get an idea of what I'm doing wrong.
I'm using pandas 0.24.2 in Python 3.7.1, and the problematic command is pd.concat([df1, df2, ...], axis=1, join='outer', ignore_index=False, sort=False). All the tables have an index, and it has the same name in all of them. The dataframe shapes are [(18233, 6), (18082, 6), (18233, 6), (18225, 6), (18233, 6), (18082, 6), (18233, 6), (18225, 6)] and the error is ValueError: Shape of passed values is (19308, 50), indices imply (18241, 50).
I think I have it: some row keys (index labels) are repeated in my tables.
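Repeated keys are a plausible cause: with axis=1, concat has to align every frame on one combined index, and labels that occur more than once in a frame make that alignment ambiguous. In pandas 0.24 this appears to surface as the shape mismatch above (more rows of values than the combined index implies); newer versions may report it with a different error. A minimal sketch, assuming it is acceptable to keep only the first row for each repeated key, that lists the repeated labels per frame and de-duplicates before concatenating (the helpers duplicated_labels and dedupe_index and the data are hypothetical):

```python
import pandas as pd


def duplicated_labels(df):
    """Return the index labels that occur more than once in df (hypothetical helper)."""
    return df.index[df.index.duplicated()].unique()


def dedupe_index(df, keep="first"):
    """Keep one row per index label; keep may be 'first' or 'last' (hypothetical helper)."""
    return df[~df.index.duplicated(keep=keep)]


# Made-up data: "y" appears twice in df1, so its index is not unique.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=pd.Index(["x", "y", "y"], name="key"))
df2 = pd.DataFrame({"b": [4, 5]}, index=pd.Index(["y", "z"], name="key"))

frames = [df1, df2]
for i, frame in enumerate(frames):
    if not frame.index.is_unique:
        print(f"frame {i} has repeated keys: {list(duplicated_labels(frame))}")

# After de-duplication every index is unique, so the outer join can align rows.
combined = pd.concat([dedupe_index(f) for f in frames], axis=1, join="outer", sort=False)
print(combined)
```

If the repeated rows carry different data, dropping them loses information; aggregating them first (for example with groupby(level=0)) or fixing whatever produces the repeats upstream may be the better option.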
Code Sample, a copy-pastable example if possible
Problem description
I am logging into various network devices and trying to concatenate the output. The script above works fine. The problem appears when I try to concatenate the outputs po1, allin1, sta1, saa1 and stn1 with axis=1. I have given the output of all the variables below. po1 should be the source here: if an index in po1 is not present in the other variables, that cell should show NaN. If anything in the question is unclear, please let me know.
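One way to treat po1 as the source is to outer-join everything on the index and then reindex to po1's rows, so interfaces missing from the other outputs show NaN and rows that exist only elsewhere are dropped. This is a sketch under assumptions: the contents of po1, allin1, saa1, sta1 and stn1 below are hypothetical, each frame is assumed to be indexed by a unique interface key, and the frame helper is made up for the example:

```python
import pandas as pd


def frame(column, rows):
    """Build a one-column DataFrame indexed by a (hypothetical) port key."""
    return pd.DataFrame(
        {column: list(rows.values())},
        index=pd.Index(list(rows.keys()), name="port"),
    )


# Hypothetical stand-ins for the issue's outputs; real contents are not shown.
po1 = frame("Interface", {"Gi1/1/2": "interface GigabitEthernet1/1/2",
                          "Gi3/2": "interface GigabitEthernet3/2",
                          "Gi5/1": "interface GigabitEthernet5/1"})
allin1 = frame("ALL1", {"Gi1/1/2": "description uplink to aust41m2",
                        "Gi3/2": "description AUST41LABFW1-Eth1",
                        "Gi5/1": "description trunk to aust41m2"})
saa1 = frame("SAA1", {"Gi9/9": "hypothetical row not present in po1"})
sta1 = frame("STA1", {"Gi1/1/2": "switchport trunk allowed vlan 20,41"})
stn1 = frame("STN1", {"Gi1/1/2": "switchport trunk native vlan 41"})

frames = [po1, allin1, saa1, sta1, stn1]
# Alignment on the index only works if no frame repeats a key.
if not all(f.index.is_unique for f in frames):
    raise ValueError("resolve repeated index keys before concatenating")

# Outer-join on the index, then keep exactly po1's rows, in po1's order.
result = pd.concat(frames, axis=1, join="outer", sort=False).reindex(po1.index)
print(result)
```

po1.join([allin1, saa1, sta1, stn1], how='left') should give an equivalent result when all indexes are unique; either way, repeated keys need to be resolved first.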
Expected Output
|   | Interface | ALL1 | SAA1 | STA1 | STN1 |
|---|---|---|---|---|---|
| 0 | interface GigabitEthernet1/1/2 | description uplink to aust41m2 | NaN | switchport trunk allowed vlan 20,41,99-101,10... | switchport trunk native vlan 41 |
| 1 | interface TenGigabitEthernet6/1 | description Uplink | NaN | switchport trunk allowed vlan 20,41,45,102,10... | switchport trunk native vlan 41 |
| 2 | interface TenGigabitEthernet6/1 | description Uplink | NaN | switchport trunk allowed vlan 20,41,45,102,10... | switchport trunk native vlan 41 |
| 3 | interface TenGigabitEthernet6/1 | description Uplink | NaN | switchport trunk allowed vlan 20,41,106,107,1... | switchport trunk native vlan 41 |
| 4 | interface TenGigabitEthernet6/1 | description Uplink | NaN | switchport trunk allowed vlan 20,41,99-101,10... | switchport trunk native vlan 41 |
| 5 | interface TenGigabitEthernet6/1 | description Uplink | NaN | switchport trunk allowed vlan 20,41,100,101,1... | switchport trunk native vlan 41 |
| 6 | interface GigabitEthernet3/2 | description AUST41LABFW1-Eth1 | NaN | NaN | NaN |
| 7 | interface GigabitEthernet5/2 | description trunk to aust41m2 | NaN | NaN | NaN |
| 8 | interface GigabitEthernet5/1 | description trunk to aust41m2 | NaN | NaN | NaN |
Output of pd.show_versions()