pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: KeyError: 'Step Nr.s' #58502

Closed paivadanieljl closed 6 months ago

paivadanieljl commented 6 months ago

Pandas version checks

Reproducible Example

# Defining a function that creates the scatter plot
def Isabela(x, y, titulo):
  plt.figure(figsize = (10, 6))
  plt.scatter(x, y, color = 'blue', label = 'Dados') # Creating the scatter plot
  media_y = y.mean() # Computing the mean of y
  plt.axhline(media_y, color = 'red', linestyle = '--', label = 'Média') # Plotting the mean line

  # Adding the title, axis labels, and legend
  plt.title(f'Gráfico de {titulo} vs Tempo (fs)')
  plt.xlabel('Tempo (fs)')
  plt.ylabel(f'{titulo}')
  plt.grid(True)
  plt.legend(title = 'Legenda', fontsize = 'small', shadow = True)
  plt.show() # Showing the plot

# Importing the required libraries
import pandas as pd
import matplotlib.pyplot as plt

# Loading the data file
from google.colab import files
print('Selecione o arquivo de dados')
uploaded = files.upload()
dados = pd.read_csv('dados.dat') # Loading the data from the file

# Extracting the data from the file
n = dados['Step Nr.']
t = dados['Time[fs]']
K = dados['Kin.[a.u.]']
Temp = dados['Temp[K]']
Pot = dados['Pot.[a.u.]']

# Calling the scatter plot function for each quantity
Isabela(t, Pot, 'Energia Potencial (u.a.)')
Isabela(t, Temp, 'Temperatura (K)')
Isabela(t, K, 'Energia Cinética (u.a.)')

Issue Description

The code gives me an error that I can't fix. Every time I try to read my data from dados.dat, I get the following error:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key)
   3652         try:
-> 3653             return self._engine.get_loc(casted_key)
   3654         except KeyError as err:

4 frames
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Step Nr.'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key)
   3653             return self._engine.get_loc(casted_key)
   3654         except KeyError as err:
-> 3655             raise KeyError(key) from err
   3656         except TypeError:
   3657             # If we have a listlike key, _check_indexing_error will raise

KeyError: 'Step Nr.'

This is very strange because I can't find anything wrong with the code. The only thing that worked was changing "dados" in n = dados['Step Nr.'] to "data", and even that works only with the file on my girlfriend's laptop (which is exactly the same file our teacher gave us). Can anyone help me with this strange behavior?

Expected Behavior

The code should read the data without errors and plot the graphs. As I mentioned above, it did work, but only on her laptop.

Edit: We are using Google Colab with Python 3, as our teacher asked.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 0f437949513225922d851e9581723d82120684a6
python : 3.10.12.final.0
python-bits : 64
OS : Linux
OS-release : 6.1.58+
Version : #1 SMP PREEMPT_DYNAMIC Sat Nov 18 15:31:17 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.0.3
numpy : 1.25.2
pytz : 2023.4
dateutil : 2.8.2
setuptools : 67.7.2
pip : 23.1.2
Cython : 3.0.10
pytest : 7.4.4
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.4
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.9
jinja2 : 3.1.3
IPython : 7.34.0
pandas_datareader: 0.10.0
bs4 : 4.12.3
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2023.6.0
gcsfs : 2023.6.0
matplotlib : 3.7.1
numba : 0.58.1
numexpr : 2.10.0
odfpy : None
openpyxl : 3.1.2
pandas_gbq : 0.19.2
pyarrow : 14.0.2
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
snappy : None
sqlalchemy : 2.0.29
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.7.0
xlrd : 2.0.1
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Aloqeely commented 6 months ago

It's hard to figure out what the problem is since we don't have access to dados.dat. Could you provide a minimal bug report? (see https://matthewrocklin.com/minimal-bug-reports)

paivadanieljl commented 6 months ago

> It's hard to figure out what the problem is since we don't have access to dados.dat. Could you provide a minimal bug report? (see https://matthewrocklin.com/minimal-bug-reports)

Here is a Google Drive link to download the file. I'm not sure the problem is with the file, though.

https://drive.google.com/file/d/1IujtTu9eUEOPpId9bfgMzGfVhRz0ZrUr/view?usp=sharing

Aloqeely commented 6 months ago

I am sorry, maybe you misunderstood me, but the file and code are too large. Can you try to make a new file that still produces the same error but with fewer lines of code?
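As a rough illustration of what such a reduced report could look like, the sketch below replaces the uploaded file with a few inline lines and checks which of the requested column names pandas actually parsed. The sample content, the comma delimiter, and the `wanted` list are assumptions for illustration only; the real layout of dados.dat is not shown in this thread.

```python
import io

import pandas as pd

# Hypothetical stand-in for the first lines of the real file; the actual
# contents and delimiter of dados.dat are assumptions here.
sample = io.StringIO(
    "# hypothetical comment line\n"
    "Step Nr.,Time[fs],Temp[K]\n"
    "1,0.5,300.0\n"
    "2,1.0,301.5\n"
)

df = pd.read_csv(sample)

# repr() makes stray spaces or unexpected header text visible.
print([repr(col) for col in df.columns])

# Report which of the expected labels are absent from the parsed header.
wanted = ["Step Nr.", "Time[fs]", "Temp[K]"]
missing = [name for name in wanted if name not in df.columns]
print("missing:", missing)
```

A KeyError from `dados['Step Nr.']` means that exact label is not in `dados.columns`, so printing the parsed column names is usually the quickest first check.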

rhshadrach commented 6 months ago

> anyone can help me with this strange behavior?

This is the issue tracker for pandas. You should only be posting reports here if you believe there is an issue with pandas. If we were to allow anyone to ask for help with using pandas here, we would be overwhelmed with "issues", and that would interfere with the fixes, enhancements, and maintenance the contributors are trying to provide.

I would recommend posting your question to StackOverflow. Closing for now.

paivadanieljl commented 6 months ago

Sorry for the mistake, I didn't know. Anyway, I fixed it: I hadn't noticed that the data file had one comment line, so that was the cause.
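For readers who hit the same KeyError, here is a minimal sketch of two ways `pd.read_csv` can be told to ignore a leading comment line. The inline sample, the `#` comment character, and the comma delimiter are assumptions; the thread does not show what the comment line in dados.dat looks like, so the arguments may need adjusting for the real file.

```python
import io

import pandas as pd

# Hypothetical file contents: one comment line above the real header.
raw = (
    "# hypothetical comment line\n"
    "Step Nr.,Time[fs],Temp[K]\n"
    "1,0.5,300.0\n"
    "2,1.0,301.5\n"
)

# Option 1: ignore every line that starts with the comment character.
by_comment = pd.read_csv(io.StringIO(raw), comment="#")

# Option 2: skip the first physical line, whatever it contains.
by_skiprows = pd.read_csv(io.StringIO(raw), skiprows=1)

# Both variants now expose the expected column labels.
print(by_comment.columns.tolist())
print(by_comment["Step Nr."].tolist())
print(by_comment.equals(by_skiprows))
```

`comment=` is the more robust choice when comment lines may appear anywhere in the file, while `skiprows=` fits a file with a fixed number of preamble lines.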