Closed Garet-Jax closed 5 years ago
@Garet-Jax : Thanks for reporting this! A little unusual to see pytest
go down in 🔥 like this, but before we go into that, let's start with this:
Can you execute import pandas
in a Python console on your instance?
Yes. Inside of the virtual environment, I start python using one of:
python
python3
python36
From there I can run both without issue:
import pandas
import pandas as pd
Thanks.
Can you run the following? That seems to be the file pytest is choking on.
from pandas.tests.scalar.timestamp.test_timestamp import *
And can you also try
import pandas.util.testing as tm
tm.get_locales()
(env) [ec2-user@ip-[IP_ADDRESS]~]$ python
Python 3.6.7 (default, Dec 21 2018, 20:31:01)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pandas.tests.scalar.timestamp.test_timestamp import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/env/local/lib64/python3.6/site-packages/pandas/tests/scalar/timestamp/test_timestamp.py", line 28, in <module>
    class TestTimestampProperties(object):
  File "/home/ec2-user/env/local/lib64/python3.6/site-packages/pandas/tests/scalar/timestamp/test_timestamp.py", line 104, in TestTimestampProperties
    None] if tm.get_locales() is None else [None] + tm.get_locales())
  File "/home/ec2-user/env/local/lib64/python3.6/site-packages/pandas/util/testing.py", line 516, in get_locales
    x, encoding=pd.options.display.encoding))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 4: invalid continuation byte
(env) [ec2-user@[IP_ADDRESS]~]$ python
Python 3.6.7 (default, Dec 21 2018, 20:31:01)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas.util.testing as tm
>>> tm.get_locales()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/env/local/lib64/python3.6/site-packages/pandas/util/testing.py", line 516, in get_locales
    x, encoding=pd.options.display.encoding))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 4: invalid continuation byte
Thanks a bunch for your help by the way.
I have been trying many different combinations and have found one that works.
I have tried a few different AWS AMIs:
Amazon Linux 2 AMI (HVM), SSD Volume Type - ami-01e3b8c3a51e88954 (64-bit x86)
Amazon Linux AMI 2018.03.0 (HVM), SSD Volume Type - ami-0080e4c5bc078760e
amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2 - ami-0eee720fcca8e4499
Ubuntu Server 18.04 LTS (HVM), SSD Volume Type - ami-0ac019f4fcb7cb7e6 (64-bit x86)
The first 2 are the standard quick start AMI for launching an EC2 instance. One is Linux 2 and the other is Linux 1.
The 3rd one is really the one I want to use because it is the closest to the Linux used by the AWS Lambda infrastructure. I am trying to parlay this work into a Lambda function and want as much synergy as possible so there are no errors on the back end.
The 4th one is the standard quick start for Ubuntu version 18.04
I have also tried a few different AWS instance sizings:
t2.micro
t2.small
t2.medium
I used amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2 - ami-0eee720fcca8e4499 with a t2.micro instance and there were no errors on the test. However the test hung the instance and so never completed. I tried the same AMI with larger instances, but they got the errors.
I tried an Ubuntu server this morning with a t2.medium instance. There were a couple of minor differences, but the pandas.test() worked. This seems to be a systemic problem brought on by Linux version and hardware differences.
I am really confused.
I'm not sure either. IIRC we had another user reporting issues with tm.get_locales(), but I don't recall the specifics.
Interestingly enough, there is a comment above the offending line about some locales not being decodable, though there isn't a catch for those either.
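To illustrate what a "catch" for undecodable locale names could look like, here is a minimal sketch. It is not the pandas implementation: decode_locale_names is a hypothetical helper, and the sample bytes are made up to mimic locale -a output containing one name that is not valid UTF-8.

```python
from typing import List


def decode_locale_names(raw: bytes, encoding: str = "utf-8") -> List[str]:
    """Decode locale names line by line, skipping any that fail to decode.

    Hypothetical helper for illustration; not part of pandas.
    """
    names = []
    for line in raw.splitlines():
        try:
            names.append(line.decode(encoding))
        except UnicodeDecodeError:
            # Skip names stored in another encoding (e.g. cp1252 / latin-1).
            continue
    return names


# Made-up sample standing in for the raw output of `locale -a`.
sample = b"C\nen_US.utf8\nbokm\xe5l\nPOSIX\n"
print(decode_locale_names(sample))  # ['C', 'en_US.utf8', 'POSIX']
```

Decoding each line separately means one bad name does not abort the whole listing, which is the failure seen in the traceback.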
What is the output of locale -a from the shell on the AMI? From what I can see documented in the FAQ, the default encoding is UTF-8, so I'm very surprised that is the codec that is failing here.
We may have found the issue. On the Ubuntu machine (this one works):
(env) ubuntu@[IP_ADDRESS]:~$ locale -a
C
C.UTF-8
POSIX
en_US.utf8
On the AMI that doesn't work (basically every locale that exists):
From what you've pasted I get the impression that the bokmål locale is the one to blame here. Interesting, as I doubt that even renders correctly in the AMI shell, given that the byte shown in the traceback (0xe5) maps to that accented character in cp1252 (and maybe some other encodings) but is not valid UTF-8 in that position:
>>> b'bokm\xe5l'.decode('utf8') # This simulates the error you get
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 4: invalid continuation byte
>>> b'bokm\xe5l'.decode('cp1252')
'bokmål'
OK - here is what I did. I followed this post to effectively remove all locales except for en_US.utf8.
https://talk.maemo.org/showthread.php?t=94625
My exact commands were:
sudo localedef --list-archive | grep -E -v "en_US.utf8" | xargs sudo localedef --delete-from-archive
sudo mv /usr/lib/locale/locale-archive /usr/lib/locale/locale-archive.tmpl
sudo build-locale-archive
I then exited my SSH session and started a new one. I re-ran locale -a and got:
(env) [ec2-user@[IP-ADDRESS]~]$ locale -a
C
en_US.utf8
POSIX
I then started my virtual environment, ran a python session, imported pandas and ran pandas.test().
My test ran.
Yay!
Thanks a ton to everyone who took the time to provide a second set of eyes and suggestions. Couldn't have done this without you.
@Garet-Jax glad it worked. Something seems strange with the locales there but most likely an issue outside of pandas. Closing as I don't think there's anything to do here on our end
Thank you! I had the same problem with CentOS7.6.1810 + anaconda3-2018.12. I tried
sudo localedef --list-archive | grep -E -v "en_US.utf8" | xargs sudo localedef --delete-from-archive
sudo mv /usr/lib/locale/locale-archive /usr/lib/locale/locale-archive.tmpl
sudo build-locale-archive
and after that it was resolved. Thanks a lot!
Very glad to hear it helped. When I researched further, there were a couple of locales that had very funky character sets (which is what caused the test scripts to fail). With that said, I took a bull in a china shop approach and deleted all but en_US. This worked for me. A better solution would have been to remove just the offending locales. Just be aware that my solution removed every locale.
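For anyone who wants to find just the offending locales rather than delete everything, the following is a rough sketch of one way to list them. The helper names (read_locale_output, undecodable_locales) are hypothetical, the sample bytes are made up, and it assumes "offending" means "not decodable as UTF-8", which matches the error in this thread but may not cover every case.

```python
import subprocess
from typing import List


def read_locale_output() -> bytes:
    """Return the raw bytes printed by `locale -a`.

    Defined for real use on the affected machine; not called in this sketch.
    """
    return subprocess.run(
        ["locale", "-a"], stdout=subprocess.PIPE, check=True
    ).stdout


def undecodable_locales(raw: bytes, encoding: str = "utf-8") -> List[bytes]:
    """Return the locale names (as raw bytes) that fail to decode."""
    bad = []
    for line in raw.splitlines():
        try:
            line.decode(encoding)
        except UnicodeDecodeError:
            bad.append(line)
    return bad


# Made-up sample standing in for the raw output of `locale -a`.
sample = b"C\nen_US.utf8\nbokm\xe5l\nfran\xe7ais\nPOSIX\n"
print(undecodable_locales(sample))  # [b'bokm\xe5l', b'fran\xe7ais']
```

On the affected instance, undecodable_locales(read_locale_output()) would give the names to review before deciding which ones to remove from the archive.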
I am trying to install python and pandas on an AWS Linux EC2 instance. I have tried many versions of pandas and many ways to install it, but they all seem to have errors when I try to run the test() function.
I create a brand new EC2 instance and run:
1) sudo yum update
2) sudo yum install python36-devel python36-pip gcc gcc-c++
3) virtualenv ~/env -p python36 && source ~/env/bin/activate
From there, I have tried to install pandas using:
a) python3 -m pip install pandas==0.24.0rc1
b) python3 -m pip install --upgrade pandas==0.24.0rc1
c) pip install pandas==0.24.0rc1
I also install pytest. If I use 0.24.0rc1, then I also install hypothesis.
The errors I get are:
The output of show_versions() is: