pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.82k stars 17.99k forks source link

BUG: Concat returns inconsistent MultiIndex #44786

Closed pkohlmann closed 2 years ago

pkohlmann commented 2 years ago

Reproducible Example

import pandas as pd

df1 = pd.DataFrame({'col': ['a','b','c']}, index=['1','2','2'])
df2 = pd.concat([df1], keys=['X'])
print(df2.index)
print(f"Index of df2 has duplicates: {df2.index.has_duplicates}")
print(f"Index of df2 is unique: {df2.index.is_unique}")

Issue Description

DataFrame df1 has an index with duplicates, i.e. the index of df1 is not unique. When df1 is used as the only DataFrame in pandas's conat(), i.e. the concat() is applied to a single DataFrame, the resulting DataFrame has a MultiiIndex which has duplicates but the metadata of this MultiIndex say it is unique and does not have duplicates. When concat() is applied to a list of two or more DataFrames the resulting MultiIndex is correct. The scenario with only one DataFrame takes a short-cut for creating the MultiIndex which apparently has a flaw. Deploying concat() on a single dataframe is a valid scenario which ,for example, can occur in a GroupBy.apply(). The inconsistency in the MultiIndex makes other functionality like Index.drop_duplicates() fail because they rely on the metadata of MultiIndex.

Expected Behavior

Concat() returns a DataFrame with a consistent MultiIndex when deployed to a single DataFrame with duplicates.

Installed Versions

2021-12-06T12:40:32.184 DEBUG matplotlib.wrapper (private) matplotlib data path: /Users/kohlmann/PycharmProjects/env3.7/lib/python3.7/site-packages/matplotlib/mpl-data 2021-12-06T12:40:32.185 DEBUG matplotlib.wrapper matplotlib data path: /Users/kohlmann/PycharmProjects/env3.7/lib/python3.7/site-packages/matplotlib/mpl-data 2021-12-06T12:40:32.197 DEBUG matplotlib.wrapper CONFIGDIR=/Users/kohlmann/.matplotlib 2021-12-06T12:40:32.203 DEBUG matplotlib. matplotlib version 3.3.2 2021-12-06T12:40:32.203 DEBUG matplotlib. interactive is False 2021-12-06T12:40:32.204 DEBUG matplotlib. platform is darwin 2021-12-06T12:40:32.204 DEBUG matplotlib. loaded modules: ['sys', 'builtins', '_frozen_importlib', '_imp', '_thread', '_warnings', '_weakref', 'zipimport', '_frozen_importlib_external', '_io', 'marshal', 'posix', 'encodings', 'codecs', '_codecs', 'encodings.aliases', 'encodings.utf_8', '_signal', '__main__', 'encodings.latin_1', 'io', 'abc', '_abc', 'site', 'os', 'stat', '_stat', '_collections_abc', 'posixpath', 'genericpath', 'os.path', '_sitebuiltins', '_bootlocale', '_locale', '_virtualenv', 'functools', '_functools', 'collections', 'operator', '_operator', 'keyword', 'heapq', '_heapq', 'itertools', 'reprlib', '_collections', 'importlib', 'importlib._bootstrap', 'importlib._bootstrap_external', 'types', 'warnings', 'importlib.abc', 'importlib.machinery', 'importlib.util', 'contextlib', 'google', 'mpl_toolkits', 'wiotp', 'weakref', '_weakrefset', '_pydevd_bundle', '_pydevd_bundle.pydevd_collect_try_except_info', 'opcode', '_opcode', 'dis', 'atexit', 'traceback', 'linecache', 'tokenize', 're', 'enum', 'sre_compile', '_sre', 'sre_parse', 'sre_constants', 'copyreg', 'token', '_pydevd_bundle.pydevd_constants', '__future__', 'platform', 'subprocess', 'time', 'signal', 'errno', '_posixsubprocess', 'select', 'selectors', 'collections.abc', 'math', 'threading', '_pydevd_bundle.pydevd_vm_type', '_pydev_imps', '_pydev_imps._pydev_saved_modules', 'socket', '_socket', 'queue', '_queue', 'xmlrpc', 'xmlrpc.client', 'base64', 'struct', '_struct', 'binascii', 'datetime', '_datetime', 'decimal', 'numbers', '_decimal', 'http', 'http.client', 'email', 'email.parser', 'email.feedparser', 'email.errors', 'email._policybase', 'email.header', 'email.quoprimime', 'string', '_string', 'email.base64mime', 'email.charset', 'email.encoders', 'quopri', 'email.utils', 'random', 'hashlib', '_hashlib', '_blake2', '_sha3', 'bisect', '_bisect', '_random', 'urllib', 'urllib.parse', 'email._parseaddr', 'calendar', 'locale', 'email.message', 'uu', 'email._encoded_words', 'email.iterators', 'ssl', '_ssl', 'xml', 'xml.parsers', 'xml.parsers.expat', 'pyexpat.errors', 'pyexpat.model', 'pyexpat', 'xml.parsers.expat.model', 'xml.parsers.expat.errors', 'gzip', 'zlib', '_compression', 'xmlrpc.server', 'http.server', 'copy', 'html', 'html.entities', 'mimetypes', 'shutil', 'fnmatch', 'bz2', '_bz2', 'lzma', '_lzma', 'pwd', 'grp', 'socketserver', 'inspect', 'pydoc', 'pkgutil', 'sysconfig', '_sysconfigdata_m_darwin_darwin', '_osx_support', 'fcntl', '_pydev_bundle', '_pydev_bundle.fix_getpass', '_pydev_bundle.pydev_imports', '_pydev_imps._pydev_execfile', '_pydevd_bundle.pydevd_exec2', '_pydev_bundle.pydev_log', '_pydev_bundle._pydev_filesystem_encoding', '_pydev_bundle.pydev_is_thread_alive', '_pydevd_bundle.pydevd_io', 'pydevd_tracing', 'ctypes', '_ctypes', 'ctypes._endian', '_pydev_bundle.pydev_monkey', '_pydevd_bundle.pydevd_utils', 'pydevd_file_utils', '_pydevd_bundle.pydevd_comm_constants', 'json', 'json.decoder', 'json.scanner', '_json', 'json.encoder', '_pydevd_bundle.pydevd_vars', 'pickle', '_compat_pickle', '_pickle', '_pydevd_bundle.pydevd_custom_frames', '_pydevd_bundle.pydevd_xml', '_pydevd_bundle.pydevd_extension_utils', 'pydevd_plugins', 'pkg_resources', 'zipfile', 'plistlib', 'tempfile', 'textwrap', 'ntpath', 'pkg_resources.extern', 'pkg_resources._vendor', 'pkg_resources._vendor.six', 'pkg_resources.extern.six', 'pkg_resources._vendor.six.moves', 'pkg_resources.extern.six.moves', 'pkg_resources._vendor.appdirs', 'pkg_resources.extern.appdirs', 'pkg_resources._vendor.packaging', 'pkg_resources._vendor.packaging.__about__', 'pkg_resources.extern.packaging', 'pkg_resources.extern.packaging.version', 'pkg_resources.extern.packaging._structures', 'pkg_resources.extern.packaging.specifiers', 'pkg_resources.extern.packaging._compat', 'pkg_resources.extern.packaging.requirements', 'pkg_resources._vendor.pyparsing', 'pprint', 'pkg_resources.extern.pyparsing', 'pkg_resources._vendor.six.moves.urllib', 'pkg_resources.extern.six.moves.urllib', 'pkg_resources.extern.packaging.markers', 'pkg_resources.py2_warn', 'pydevd_plugins.extensions', '_pydevd_bundle.pydevd_resolver', '_pydevd_bundle.pydevd_extension_api', '_pydevd_bundle.pydevd_save_locals', '_pydev_bundle.pydev_override', '_pydevd_bundle.pydevd_breakpoints', '_pydevd_bundle.pydevd_import_class', '_pydevd_bundle.pydevd_frame_utils', '_pydevd_bundle.pydevd_comm', '_pydevd_bundle.pydevd_console_integration', 'code', 'codeop', '_pydev_bundle.pydev_code_executor', '_pydev_bundle._pydev_calltip_util', '_pydev_bundle._pydev_imports_tipper', '_pydev_bundle._pydev_tipper_common', '_pydev_bundle.pydev_stdin', '_pydev_bundle.pydev_console_types', '_pydevd_bundle.pydevd_console_pytest', '_pydevd_bundle.pydevd_bytecode_utils', '_pydev_bundle._pydev_completer', '_pydevd_bundle.pydevd_tables', '_pydevd_bundle.pydevd_console', '_pydevd_bundle.pydevd_dont_trace_files', '_pydevd_bundle.pydevd_kill_all_pydevd_threads', '_pydevd_bundle.pydevd_trace_dispatch', '_pydevd_bundle.pydevd_cython_wrapper', '_pydevd_bundle.pydevd_additional_thread_info_regular', '_pydevd_bundle.pydevd_frame', '_pydevd_bundle.pydevd_dont_trace', '_pydevd_bundle.pydevd_signature', 'trace', 'gc', '_pydevd_bundle.pydevd_cython_darwin_37_64', '_cython_0_29_21', 'cython_runtime', '_pydevd_bundle.pydevd_cython', '_pydevd_frame_eval', '_pydevd_frame_eval.pydevd_frame_eval_main', '_pydevd_frame_eval.pydevd_frame_eval_cython_wrapper', '_pydevd_frame_eval.pydevd_frame_evaluator_common', '_pydevd_frame_eval.pydevd_frame_tracing', '_pydevd_frame_eval.pydevd_modify_bytecode', '_pydevd_bundle.pydevd_additional_thread_info', '_pydevd_frame_eval.pydevd_frame_evaluator_darwin_37_64', 'pydevd_concurrency_analyser', 'pydevd_concurrency_analyser.pydevd_concurrency_logger', 'pydevd_concurrency_analyser.pydevd_thread_wrappers', 'asyncio', 'asyncio.base_events', 'concurrent', 'concurrent.futures', 'concurrent.futures._base', 'logging', 'asyncio.constants', 'asyncio.coroutines', 'asyncio.base_futures', 'asyncio.format_helpers', 'asyncio.log', 'asyncio.events', 'contextvars', '_contextvars', 'asyncio.base_tasks', '_asyncio', 'asyncio.futures', 'asyncio.protocols', 'asyncio.sslproto', 'asyncio.transports', 'asyncio.tasks', 'asyncio.locks', 'asyncio.runners', 'asyncio.queues', 'asyncio.streams', 'asyncio.subprocess', 'asyncio.unix_events', 'asyncio.base_subprocess', 'asyncio.selector_events', 'pydev_ipython', '_pydevd_bundle.pydevd_breakpointhook', '_pydevd_bundle.pydevd_plugin_utils', '_pydevd_bundle.pydevd_trace_api', 'pydevd_plugins.django_debug', 'pydevd_plugins.jinja2_debug', 'pydevd_plugins.extensions.types', 'pydevd_plugins.extensions.types.pydevd_plugin_numpy_types', 'pydevd_plugins.extensions.types.pydevd_helpers', 'pydevd_plugins.extensions.types.pydevd_plugins_django_form_str', '_pydevd_bundle.pydevd_command_line_handling', 'getpass', 'termios', '_pydevd_bundle.pydevd_process_net_command', '_pydevd_bundle.pydevd_traceproperty', '_pydev_bundle.pydev_monkey_qt', 'analytics_service_cloud_functions', 'pydevd', 'pydev_ipython.matplotlibtools', 'pydev_ipython.inputhook', 'runpy', 'numpy', 'numpy._globals', 'numpy.__config__', 'numpy.version', 'numpy._distributor_init', 'numpy.core', 'numpy.core.multiarray', 'numpy.core.overrides', 'numpy.core._multiarray_umath', 'numpy.compat', 'numpy.compat._inspect', 'numpy.compat.py3k', 'pathlib', 'numpy.core.umath', 'numpy.core.numerictypes', 'numpy.core._string_helpers', 'numpy.core._type_aliases', 'numpy.core._dtype', 'numpy.core.numeric', 'numpy.core.shape_base', 'numpy.core._asarray', 'numpy.core.fromnumeric', 'numpy.core._methods', 'numpy.core._exceptions', 'numpy.core._ufunc_config', 'numpy.core.arrayprint', 'numpy.core.defchararray', 'numpy.core.records', 'numpy.core.memmap', 'numpy.core.function_base', 'numpy.core.machar', 'numpy.core.getlimits', 'numpy.core.einsumfunc', 'numpy.core._add_newdocs', 'numpy.core._multiarray_tests', 'numpy.core._dtype_ctypes', 'numpy.core._internal', 'numpy._pytesttester', 'numpy.lib', 'numpy.lib.mixins', 'numpy.lib.scimath', 'numpy.lib.type_check', 'numpy.lib.ufunclike', 'numpy.lib.index_tricks', 'numpy.matrixlib', 'numpy.matrixlib.defmatrix', 'ast', '_ast', 'numpy.linalg', 'numpy.linalg.linalg', 'numpy.lib.twodim_base', 'numpy.linalg.lapack_lite', 'numpy.linalg._umath_linalg', 'numpy.lib.function_base', 'numpy.lib.histograms', 'numpy.lib.stride_tricks', 'numpy.lib.nanfunctions', 'numpy.lib.shape_base', 'numpy.lib.polynomial', 'numpy.lib.utils', 'numpy.lib.arraysetops', 'numpy.lib.npyio', 'numpy.lib.format', 'numpy.lib._datasource', 'numpy.lib._iotools', 'numpy.lib.financial', 'numpy.lib.arrayterator', 'numpy.lib.arraypad', 'numpy.lib._version', 'numpy.fft', 'numpy.fft._pocketfft', 'numpy.fft._pocketfft_internal', 'numpy.fft.helper', 'numpy.polynomial', 'numpy.polynomial.polynomial', 'numpy.polynomial.polyutils', 'numpy.polynomial._polybase', 'numpy.polynomial.chebyshev', 'numpy.polynomial.legendre', 'numpy.polynomial.hermite', 'numpy.polynomial.hermite_e', 'numpy.polynomial.laguerre', 'numpy.random', 'numpy.random._pickle', 'numpy.random.mtrand', 'numpy.random._bit_generator', '_cython_0_29_19', 'numpy.random._common', 'secrets', 'hmac', 'numpy.random._bounded_integers', 'numpy.random._mt19937', 'numpy.random._philox', 'numpy.random._pcg64', 'numpy.random._sfc64', 'numpy.random._generator', 'numpy.ctypeslib', 'numpy.ma', 'numpy.ma.core', 'numpy.ma.extras', 'pandas', 'pytz', 'pytz.exceptions', 'pytz.lazy', 'pytz.tzinfo', 'pytz.tzfile', 'dateutil', 'dateutil._version', 'pandas.compat', 'pandas._typing', 'mmap', 'typing', 'typing.io', 'typing.re', 'pandas.compat.numpy', 'pandas.util', 'pandas.util._decorators', 'pandas._libs', 'pandas._libs.interval', '_cython_0_29_24', 'pandas._libs.hashtable', 'pandas._libs.missing', 'pandas._libs.tslibs', 'pandas._libs.tslibs.dtypes', 'pandas._libs.tslibs.conversion', 'pandas._libs.tslibs.base', 'pandas._libs.tslibs.nattype', 'pandas._libs.tslibs.np_datetime', 'pandas._libs.tslibs.timezones', 'dateutil.tz', 'dateutil.tz.tz', 'six', 'six.moves', 'dateutil.tz._common', 'dateutil.tz._factories', 'pandas._libs.tslibs.tzconversion', 'pandas._libs.tslibs.ccalendar', 'pandas._libs.tslibs.parsing', 'pandas._libs.tslibs.offsets', 'pandas._libs.tslibs.timedeltas', 'pandas._libs.tslibs.fields', 'pandas._config', 'pandas._config.config', 'pandas._config.dates', 'pandas._config.display', 'pandas._config.localization', 'pandas._libs.tslibs.strptime', 'pandas._libs.tslibs.timestamps', 'dateutil.easter', 'dateutil.relativedelta', 'dateutil._common', 'pandas._libs.properties', 'dateutil.parser', 'dateutil.parser._parser', 'dateutil.parser.isoparser', 'pandas._libs.tslibs.period', 'pandas._libs.tslibs.vectorized', 'pandas._libs.ops_dispatch', 'pandas._libs.algos', 'pandas.core', 'pandas.core.util', 'pandas.core.util.hashing', 'pandas._libs.lib', 'pandas._libs.tslib', 'pandas._libs.hashing', 'pandas.core.dtypes', 'pandas.core.dtypes.common', 'pandas.core.dtypes.base', 'pandas.errors', 'pandas.core.dtypes.generic', 'pandas.core.dtypes.dtypes', 'pandas.core.dtypes.inference', 'pandas.util.version', 'pandas.compat.pyarrow', 'pyarrow', 'pyarrow._generated_version', 'pyarrow.lib', 'pyarrow.util', 'pyarrow.hdfs', 'pyarrow.filesystem', 'pyarrow.ipc', 'pyarrow.serialization', 'pyarrow.types', 'pandas.core.config_init', 'pandas.core.api', 'pandas.core.dtypes.missing', 'pandas.core.algorithms', 'pandas.core.dtypes.cast', 'pandas.util._exceptions', 'pandas.util._validators', 'pandas.core.array_algos', 'pandas.core.array_algos.take', 'pandas.core.construction', 'pandas.core.common', 'pandas.core.indexers', 'pandas.core.arrays', 'pandas.core.arrays.base', 'pandas.compat.numpy.function', 'pandas.core.missing', 'pandas.compat._optional', 'pandas.core.ops', 'pandas.core.roperator', 'pandas.core.ops.array_ops', 'pandas._libs.ops', 'pandas.core.computation', 'pandas.core.computation.expressions', 'pandas.core.computation.check', 'pandas.core.ops.missing', 'pandas.core.ops.dispatch', 'pandas.core.ops.invalid', 'pandas.core.ops.common', 'pandas.core.ops.docstrings', 'pandas.core.ops.mask_ops', 'pandas.core.ops.methods', 'pandas.core.sorting', 'pandas.core.arrays.boolean', 'pandas.core.arrays.masked', 'pandas.core.nanops', 'pandas.core.array_algos.masked_reductions', 'pandas.core.arraylike', 'pandas.core.arrays.categorical', 'csv', '_csv', 'pandas._libs.arrays', 'pandas.core.accessor', 'pandas.core.arrays._mixins', 'pandas.core.array_algos.transforms', 'pandas.core.base', 'pandas.core.strings', 'pandas.core.strings.accessor', 'pandas.core.strings.base', 'pandas.core.strings.object_array', 'unicodedata', 'pandas.io', 'pandas.io.formats', 'pandas.io.formats.console', 'pandas.core.arrays.datetimes', 'pandas.core.arrays.datetimelike', 'pandas.tseries', 'pandas.tseries.frequencies', 'pandas.core.arrays._ranges', 'pandas.core.arrays.integer', 'pandas.core.arrays.numeric', 'pandas.core.tools', 'pandas.core.tools.numeric', 'pandas.tseries.offsets', 'pandas.core.arrays.floating', 'pandas.core.arrays.interval', 'pandas.core.indexes', 'pandas.core.indexes.base', 'pandas._libs.index', 'pandas._libs.join', 'pandas.core.dtypes.concat', 'pandas.core.arrays.sparse', 'pandas.core.arrays.sparse.accessor', 'pandas.core.arrays.sparse.array', 'pandas._libs.sparse', 'pandas.core.arrays.sparse.dtype', 'pandas.io.formats.printing', 'pandas.core.array_algos.putmask', 'pandas.core.indexes.frozen', 'pandas.core.arrays.numpy_', 'pandas.core.arrays.period', 'pandas.core.arrays.string_', 'pandas.core.arrays.string_arrow', 'pyarrow.compute', 'pyarrow._compute', 'pandas.core.arrays.timedeltas', 'pandas.core.flags', 'pandas.core.groupby', 'pandas.core.groupby.generic', 'pandas._libs.reduction', 'pandas.core.aggregation', 'pandas.core.indexes.api', 'pandas.core.indexes.category', 'pandas.core.indexes.extension', 'pandas.core.indexes.datetimes', 'pandas.core.indexes.datetimelike', 'pandas.core.indexes.numeric', 'pandas.core.tools.timedeltas', 'pandas.core.tools.times', 'pandas.core.indexes.interval', 'pandas.core.indexes.multi', 'pandas.core.indexes.timedeltas', 'pandas.core.indexes.period', 'pandas.core.indexes.range', 'pandas.core.apply', 'pandas.core.frame', 'pandas.core.generic', 'pandas.core.indexing', 'pandas._libs.indexing', 'pandas.core.describe', 'pandas.core.reshape', 'pandas.core.reshape.concat', 'pandas.core.internals', 'pandas.core.internals.api', 'pandas._libs.internals', 'pandas.core.internals.blocks', 'pandas._libs.writers', 'pandas.core.array_algos.quantile', 'pandas.core.array_algos.replace', 'pandas.core.internals.array_manager', 'pandas.core.internals.base', 'pandas.core.internals.concat', 'pandas.core.internals.managers', 'pandas.core.internals.ops', 'pandas.io.formats.format', 'pandas.io.common', 'dataclasses', 'pandas.core.internals.construction', 'pandas.core.shared_docs', 'pandas.core.window', 'pandas.core.window.ewm', 'pandas._libs.window', 'pandas._libs.window.aggregations', 'pandas.core.util.numba_', 'pandas.core.window.common', 'pandas.core.window.doc', 'pandas.core.window.indexers', 'pandas._libs.window.indexers', 'pandas.core.window.numba_', 'pandas.core.window.online', 'pandas.core.window.rolling', 'pandas.core.window.expanding', 'pandas.core.reshape.melt', 'pandas.core.reshape.util', 'pandas.core.series', 'pandas._libs.reshape', 'pandas.core.indexes.accessors', 'pandas.core.tools.datetimes', 'pandas.arrays', 'pandas.plotting', 'pandas.plotting._core', 'pandas.plotting._misc', 'pandas.io.formats.info', 'pandas.core.groupby.base', 'pandas.core.groupby.groupby', 'pandas._libs.groupby', 'pandas.core.groupby.numba_', 'pandas.core.groupby.ops', 'pandas.core.groupby.grouper', 'pandas.core.groupby.categorical', 'pandas.tseries.api', 'pandas.core.computation.api', 'pandas.core.computation.eval', 'pandas.core.computation.engines', 'pandas.core.computation.align', 'pandas.core.computation.common', 'pandas.core.computation.expr', 'pandas.core.computation.ops', 'pandas.core.computation.scope', 'pandas.compat.chainmap', 'pandas.core.computation.parsing', 'pandas.core.reshape.api', 'pandas.core.reshape.merge', 'pandas.core.reshape.pivot', 'pandas.core.reshape.reshape', 'pandas.core.reshape.tile', 'pandas.api', 'pandas.api.extensions', 'pandas.api.indexers', 'pandas.api.types', 'pandas.core.dtypes.api', 'pandas.util._print_versions', 'pandas.io.api', 'pandas.io.clipboards', 'pandas.io.excel', 'pandas.io.excel._base', 'pandas._libs.parsers', 'pandas.io.excel._util', 'pandas.io.parsers', 'pandas.io.parsers.readers', 'pandas.io.parsers.base_parser', 'pandas.io.date_converters', 'pandas.io.parsers.c_parser_wrapper', 'pandas.io.parsers.python_parser', 'pandas.io.excel._odfreader', 'pandas.io.excel._openpyxl', 'pandas.io.excel._pyxlsb', 'pandas.io.excel._xlrd', 'pandas.io.excel._odswriter', 'pandas._libs.json', 'pandas.io.formats.excel', 'pandas.io.formats._color_data', 'pandas.io.formats.css', 'pandas.io.excel._xlsxwriter', 'pandas.io.excel._xlwt', 'pandas.io.feather_format', 'pandas.io.gbq', 'pandas.io.html', 'pandas.io.json', 'pandas.io.json._json', 'pandas.io.json._normalize', 'pandas.io.json._table_schema', 'pandas.io.orc', 'pandas.io.parquet', 'pandas.io.pickle', 'pandas.compat.pickle_compat', 'pandas.io.pytables', 'pandas.core.computation.pytables', 'pandas.io.sas', 'pandas.io.sas.sasreader', 'pandas.io.spss', 'pandas.io.sql', 'pandas.io.stata', 'pandas.io.xml', 'pandas.util._tester', 'pandas.testing', 'pandas._testing', 'pandas._testing._io', 'pandas._testing._random', 'pandas._testing.contexts', 'pandas._testing._warnings', 'pandas._testing.asserters', 'pandas._libs.testing', 'cmath', 'pandas._testing.compat', 'pandas._version', 'pyarrow.parquet', 'pyarrow._parquet', 'pyarrow.fs', 'pyarrow._fs', 'pyarrow._hdfs', 'pyarrow._s3fs', 'logging.config', 'logging.handlers', 'dill', 'dill.info', 'dill._dill', '_pyio', 'dill.settings', 'dill.source', 'dill.temp', 'dill.detect', 'dill.pointers', 'dill.objtypes', 'urllib3', 'urllib3.exceptions', 'urllib3.packages', 'urllib3.packages.ssl_match_hostname', 'urllib3.packages.six', 'urllib3.packages.six.moves', 'urllib3.packages.six.moves.http_client', 'urllib3._version', 'urllib3.connectionpool', 'urllib3.connection', 'urllib3.util', 'urllib3.util.connection', 'urllib3.contrib', 'urllib3.contrib._appengine_environ', 'urllib3.util.wait', 'urllib3.util.request', 'urllib3.util.response', 'urllib3.util.retry', 'urllib3.util.ssl_', 'urllib3.util.url', 'urllib3.util.ssltransport', 'urllib3.util.timeout', 'urllib3.util.proxy', 'urllib3._collections', 'urllib3.request', 'urllib3.filepost', 'urllib3.fields', 'urllib3.packages.six.moves.urllib', 'urllib3.packages.six.moves.urllib.parse', 'urllib3.response', 'urllib3.util.queue', 'urllib3.poolmanager', 'certifi', 'certifi.core', 'importlib.resources', 'tabulate', 'pip', 'setuptools', 'distutils', 'distutils.core', 'distutils.debug', 'distutils.errors', 'distutils.dist', 'distutils.fancy_getopt', 'getopt', 'gettext', 'distutils.util', 'distutils.dep_util', 'distutils.spawn', 'distutils.log', 'distutils.sysconfig', 'distutils.cmd', 'distutils.dir_util', 'distutils.file_util', 'distutils.archive_util', 'distutils.config', 'configparser', 'distutils.extension', 'distutils.filelist', 'setuptools._deprecation_warning', 'setuptools.extern', 'setuptools._vendor', 'setuptools._vendor.six', 'setuptools.extern.six', 'setuptools._vendor.six.moves', 'setuptools.extern.six.moves', 'setuptools.version', 'setuptools.extension', 'setuptools.monkey', 'setuptools.dist', 'distutils.version', 'setuptools._vendor.packaging', 'setuptools._vendor.packaging.__about__', 'setuptools.extern.packaging', 'setuptools._vendor.ordered_set', 'setuptools.extern.ordered_set', 'setuptools.windows_support', 'setuptools.config', 'setuptools.extern.packaging.version', 'setuptools.extern.packaging._structures', 'setuptools.extern.packaging.specifiers', 'setuptools.extern.packaging._compat', 'setuptools.depends', 'setuptools.py33compat', 'array', 'html.parser', '_markupbase', 'setuptools.py27compat', 'setuptools._imp', 'setuptools.py34compat', 'setuptools.msvc', 'distutils.ccompiler', 'lxml', 'lxml.etree', '_cython_0_29_22', 'lxml._elementpath', 'psycopg2', 'psycopg2.errors', 'psycopg2._psycopg', 'psycopg2.tz', 'psycopg2.extensions', 'psycopg2._json', 'psycopg2.compat', 'psycopg2._range', 'matplotlib', 'matplotlib.cbook', 'shlex', 'matplotlib.cbook.deprecation', 'matplotlib.rcsetup', 'matplotlib.animation', 'uuid', '_uuid', 'matplotlib._animation_data', 'matplotlib.fontconfig_pattern', 'pyparsing', 'pyparsing.util', 'pyparsing.exceptions', 'pyparsing.unicode', 'pyparsing.actions', 'pyparsing.core', 'pyparsing.results', 'pyparsing.helpers', 'pyparsing.testing', 'pyparsing.common', 'matplotlib.colors', 'matplotlib.docstring', 'matplotlib._color_data', 'cycler', 'matplotlib._version', 'matplotlib.ft2font', 'kiwisolver'] INSTALLED VERSIONS ------------------ commit : 945c9ed766a61c7d2c0a7cbb251b6edebf9cb7d5 python : 3.7.12.final.0 python-bits : 64 OS : Darwin OS-release : 21.1.0 Version : Darwin Kernel Version 21.1.0: Wed Oct 13 17:33:23 PDT 2021; root:xnu-8019.41.5~1/RELEASE_X86_64 machine : x86_64 processor : i386 byteorder : little LC_ALL : None LANG : None LOCALE : None.UTF-8 pandas : 1.3.4 numpy : 1.18.5 pytz : 2021.3 dateutil : 2.8.2 pip : 21.1.2 setuptools : 47.3.1 Cython : None pytest : None hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : 4.6.3 html5lib : None pymysql : None psycopg2 : 2.8.6 (dt dec pq3 ext lo64) jinja2 : None IPython : None pandas_datareader: None bs4 : None bottleneck : None fsspec : None fastparquet : None gcsfs : None matplotlib : 3.3.2 numexpr : None odfpy : None openpyxl : None pandas_gbq : None pyarrow : 3.0.0 pyxlsb : None s3fs : None scipy : 1.5.0 sqlalchemy : 1.3.17 tables : None tabulate : 0.8.5 xarray : None xlrd : None xlwt : None numba : 0.54.1
phofl commented 2 years ago

This is fixed on master. May need tests

MultiIndex([('X', '1'),
            ('X', '2'),
            ('X', '2')],
           )
Index of df2 has duplicates: True
Index of df2 is unique: False
Ganeshpadmanaban commented 2 years ago

Hi, @phofl can you please check if the fix works in versions 1.3.4 & 1.3.5? I get the same results as you in 1.3.3, but the bug persists in latter versions.

Also, I would like to pick it up as my first issue if it's okay

sukriti1 commented 2 years ago

take

simonjayhawkins commented 2 years ago

removing milestone