pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

concat : ValueError: Shape of passed values is (4, 9), indices imply (4, 5) #22521

Closed selvam33 closed 6 years ago

selvam33 commented 6 years ago

Code Sample, a copy-pastable example if possible

import pandas as pd
from netmiko import ConnectHandler
from ciscoconfparse import CiscoConfParse

switch = ['chi41vg224r3', 'chi41vg224r2', 'chi41s12', 'chi41s5', 'chi41s4', 'chi41s7', 'chi41s6', 'chi41s3', 'chi41m1', 'chi41m1', 'chi41m1', 'chidc41m4', 'chidc41m4', 'chidc41m4', 'chidc41m4']
port = ['Fas 0/0', 'Fas 0/0', 'Gig 1/1/2', 'Ten 6/1', 'Ten 6/1', 'Ten 6/1', 'Ten 6/1', 'Ten 6/1', 'Gig 3/2', 'Gig 5/2', 'Gig 5/1', 'Eth 1/34', 'Eth 1/33', 'Eth 1/32', 'Eth 1/31']
po1 = pd.DataFrame([])
stn1 = pd.DataFrame([])
sta1 = pd.DataFrame([])
saa1 = pd.DataFrame([])
allin1 = pd.DataFrame([])

for (sw, op) in zip(switch, port):
    switch1 = {'device_type': 'cisco_ios', 'ip': sw, 'username': user, 'password': password, 'port':'22'}
    connect1= ConnectHandler(**switch1)
    opout =connect1.send_command_expect('show run interface '+op)
    connect1.disconnect()
    cfg1 = opout.split('\n')
    parse1 = CiscoConfParse(cfg1)
    intera = parse1.find_objects(r"^interface")
    interf = []
    for interfa in intera:
        tx = interfa.text
        interf.append(tx)
    df2 = pd.DataFrame(dict(Interface=interf))
    po1 = po1.append(df2)

    ALL1 = parse1.find_children_w_parents(r"^interface", r"^")
    STA1 = parse1.find_children_w_parents(r"^interface", r"switchport trunk allowed ")
    SAA1 = parse1.find_children_w_parents(r"^interface", r"switchport trunk allowed add ")
    STN1 = parse1.find_children_w_parents(r"^interface", r"switchport trunk native ")

    ALL_T1 = pd.DataFrame(list(zip(*(interf, ALL1))), columns=('Interface', 'ALL1'))
    allin1 = allin1.append(ALL_T1)
    STA_T1 = pd.DataFrame(list(zip(*(interf, STA1))), columns=('Interface', 'STA1'))
    sta1 = sta1.append(STA_T1)
    SAA_T1 = pd.DataFrame(list(zip(*(interf, SAA1))), columns=('Interface', 'SAA1'))
    saa1 = saa1.append(SAA_T1)
    STN_T1 = pd.DataFrame(list(zip(*(interf, STN1))), columns=('Interface', 'STN1'))
    stn1 = stn1.append(STN_T1)

csv_data = pd.concat([po1.set_index('Interface'), allin1.set_index('Interface'), sta1.set_index('Interface'), saa1.set_index('Interface'), stn1.set_index('Interface')],1, sort=True).reset_index()
csv_data.rename(columns={'index': 'NeighborINT'}, inplace=True)

Output of po1 :

    Interface
0   interface GigabitEthernet1/1/2
0   interface TenGigabitEthernet6/1
0   interface TenGigabitEthernet6/1
0   interface TenGigabitEthernet6/1
0   interface TenGigabitEthernet6/1
0   interface TenGigabitEthernet6/1
0   interface GigabitEthernet3/2
0   interface GigabitEthernet5/2
0   interface GigabitEthernet5/1

Output of allin1 :

    Interface                       ALL1
0   interface GigabitEthernet1/1/2  description uplink to chi41m2
0   interface TenGigabitEthernet6/1 description Uplink
0   interface TenGigabitEthernet6/1 description Uplink
0   interface TenGigabitEthernet6/1 description Uplink
0   interface TenGigabitEthernet6/1 description Uplink
0   interface TenGigabitEthernet6/1 description Uplink
0   interface GigabitEthernet3/2    description CHI41LABFW1-Eth1
0   interface GigabitEthernet5/2    description trunk to chi41m2
0   interface GigabitEthernet5/1    description trunk to chi41m2

output of sta1 :

    Interface                       STA1
0   interface GigabitEthernet1/1/2  switchport trunk allowed vlan 20,41,99-101,10...
0   interface TenGigabitEthernet6/1 switchport trunk allowed vlan 20,41,45,102,10...
0   interface TenGigabitEthernet6/1 switchport trunk allowed vlan 20,41,45,102,10...
0   interface TenGigabitEthernet6/1 switchport trunk allowed vlan 20,41,106,107,1...
0   interface TenGigabitEthernet6/1 switchport trunk allowed vlan 20,41,99-101,10...
0   interface TenGigabitEthernet6/1 switchport trunk allowed vlan 20,41,100,101,1...

Output of saa1 (empty here, since nothing matched in the script above):

Interface   SAA1

output of stn1 :

    Interface                       STN1
0   interface GigabitEthernet1/1/2  switchport trunk native vlan 41
0   interface TenGigabitEthernet6/1 switchport trunk native vlan 41
0   interface TenGigabitEthernet6/1 switchport trunk native vlan 41
0   interface TenGigabitEthernet6/1 switchport trunk native vlan 41
0   interface TenGigabitEthernet6/1 switchport trunk native vlan 41
0   interface TenGigabitEthernet6/1 switchport trunk native vlan 41

Error:

ValueError                                Traceback (most recent call last)
<ipython-input-131-dce15d2880ec> in <module>()
----> 1 csv_data1 = pd.concat([po1.set_index('Interface'), allin1.set_index('Interface'), sta1.set_index('Interface'), saa1.set_index('Interface'), stn1.set_index('Interface')],1, sort=True).reset_index()

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    224                        verify_integrity=verify_integrity,
    225                        copy=copy, sort=sort)
--> 226     return op.get_result()
    227 
    228 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in get_result(self)
    421             new_data = concatenate_block_managers(
    422                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 423                 copy=self.copy)
    424             if not self.copy:
    425                 new_data._consolidate_inplace()

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   5423         blocks.append(b)
   5424 
-> 5425     return BlockManager(blocks, axes)
   5426 
   5427 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in __init__(self, blocks, axes, do_integrity_check)
   3280 
   3281         if do_integrity_check:
-> 3282             self._verify_integrity()
   3283 
   3284         self._consolidate_check()

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in _verify_integrity(self)
   3491         for block in self.blocks:
   3492             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 3493                 construction_error(tot_items, block.shape[1:], self.axes)
   3494         if len(self.items) != tot_items:
   3495             raise AssertionError('Number of manager items must equal union of '

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in construction_error(tot_items, block_shape, axes, e)
   4841         raise ValueError("Empty data passed with indices specified.")
   4842     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4843         passed, implied))
   4844 
   4845 

ValueError: Shape of passed values is (4, 9), indices imply (4, 5)

Problem description

I am logging into various network devices and trying to concatenate the output. The script above works fine up to the final step. The issue appears when I try to concat the outputs of po1, allin1, sta1, saa1 and stn1 using axis=1. I have given the output of each variable above. po1 is the source here: if any of its index values is not available in the other variables, the result needs to show NaN for it. If anything in the question is unclear, please let me know.

Expected Output

    Interface                        ALL1                            SAA1  STA1                                              STN1
0   interface GigabitEthernet1/1/2   description uplink to aust41m2  NaN   switchport trunk allowed vlan 20,41,99-101,10...  switchport trunk native vlan 41
1   interface TenGigabitEthernet6/1  description Uplink              NaN   switchport trunk allowed vlan 20,41,45,102,10...  switchport trunk native vlan 41
2   interface TenGigabitEthernet6/1  description Uplink              NaN   switchport trunk allowed vlan 20,41,45,102,10...  switchport trunk native vlan 41
3   interface TenGigabitEthernet6/1  description Uplink              NaN   switchport trunk allowed vlan 20,41,106,107,1...  switchport trunk native vlan 41
4   interface TenGigabitEthernet6/1  description Uplink              NaN   switchport trunk allowed vlan 20,41,99-101,10...  switchport trunk native vlan 41
5   interface TenGigabitEthernet6/1  description Uplink              NaN   switchport trunk allowed vlan 20,41,100,101,1...  switchport trunk native vlan 41
6   interface GigabitEthernet3/2     description AUST41LABFW1-Eth1   NaN   NaN                                               NaN
7   interface GigabitEthernet5/2     description trunk to aust41m2   NaN   NaN                                               NaN
8   interface GigabitEthernet5/1     description trunk to aust41m2   NaN   NaN                                               NaN
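
One way to get a row-per-port result like the one above is to make the row key unique before combining. The sketch below is only an illustration under stated assumptions: the toy frames `po`, `stn` and `saa` and the `Switch` column are hypothetical and not part of the original script, and it swaps `pd.concat(..., axis=1)` for a chain of left merges on the `(Switch, Interface)` pair, so that the same interface name on two different switches is no longer a repeated key.

```python
import pandas as pd

# Hypothetical stand-ins for the per-switch results. The "Switch" column is an
# assumption: the original script does not record which switch a row came from.
po = pd.DataFrame({
    "Switch": ["sw1", "sw2", "sw3"],
    "Interface": [
        "interface TenGigabitEthernet6/1",
        "interface TenGigabitEthernet6/1",
        "interface GigabitEthernet3/2",
    ],
})
stn = pd.DataFrame({
    "Switch": ["sw1", "sw2"],
    "Interface": ["interface TenGigabitEthernet6/1"] * 2,
    "STN1": ["switchport trunk native vlan 41"] * 2,
})
# An empty result (nothing matched), with the same key columns and dtypes as stn.
saa = stn.iloc[:0].rename(columns={"STN1": "SAA1"})

# (Switch, Interface) is unique per row here, unlike Interface alone.
keys = ["Switch", "Interface"]

# Left merges keep every row of po and fill missing matches with NaN.
csv_data = po.merge(stn, on=keys, how="left").merge(saa, on=keys, how="left")
print(csv_data)
```

Rows of `po` without a match in another frame come out as NaN in that frame's column, which is the behaviour the expected output asks for.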

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.23.1
pytest: 3.2.1
pip: 18.0
setuptools: 39.2.0
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

TomAugspurger commented 6 years ago

I'm having trouble understanding the issue. What do you think the bug in pandas is?

TomAugspurger commented 6 years ago

http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports may be worth reading through.

WillAyd commented 6 years ago

As mentioned, a minimal example is needed to help. Closing for now, but feel free to reopen if you can provide an example similar to what's called out in the link above.

fruce-ki commented 5 years ago

@WillAyd Can you explain what would cause pandas to throw that error when concatenating on axis=1? I am getting that error too, in a situation with large dataframes where I know some rows are missing from each dataframe. If a dataframe that has all the rows is the first one in concatenation, the error usually does not happen. But having a reference full set of rows defeats the point of join='outer'.

I tried to recreate the problem with small dataframes, but the missing rows get padded as expected. The tables on which the error occurs are too long to sensibly inspect manually, all the more so because I don't know what to look for.

My working assumption is that I've coded something badly, rather than that it's a bug in pandas. I just need more info to figure out what. So if you could please explain what situation can trigger this error, I might get an idea of what I'm doing wrong.

I'm using pandas 0.24.2 in Python 3.7.1 and the problematic command is pd.concat([df1, df2, ...], axis=1, join='outer', ignore_index=False, sort=False). All the tables have an index and it's named the same in all of them. The dataframe shapes are [(18233, 6), (18082, 6), (18233, 6), (18225, 6), (18233, 6), (18082, 6), (18233, 6), (18225, 6)] and the error is ValueError: Shape of passed values is (19308, 50), indices imply (18241, 50).

fruce-ki commented 5 years ago

I think I have it. In my tables there are row keys that are repeated.
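
Repeated row keys like these can be checked for before concatenating. The following is a minimal sketch, assuming two small toy frames (`left`, `right`) and a hypothetical helper `report_duplicates`, none of which come from this thread. Keeping the first row per label is just one possible remedy and may not suit every dataset.

```python
import pandas as pd

# Hypothetical toy frames: the label "a" appears twice in the index of `left`.
left = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "a", "b"])
right = pd.DataFrame({"y": [10, 20]}, index=["a", "c"])


def report_duplicates(frames):
    """Map frame position -> sorted index labels that occur more than once."""
    return {
        i: sorted(df.index[df.index.duplicated(keep=False)].unique())
        for i, df in enumerate(frames)
        if not df.index.is_unique
    }


dupes = report_duplicates([left, right])
print(dupes)

# One possible remedy (an assumption about what is wanted):
# keep only the first row for each repeated label, then concatenate.
deduped = [df[~df.index.duplicated(keep="first")] for df in (left, right)]
combined = pd.concat(deduped, axis=1, join="outer", sort=True)
print(combined)
```

Once every index is unique, the outer join pads the rows missing from each frame with NaN, as it did in the small test cases described above.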