Open nirbarazida opened 4 years ago
The solution is actually to encode the characters at the point of creating the engine as shown in the picture below
code that was added : '?charset=utf8mb4'
final code:
engine = create_engine(f"{config.SQL_EXTENSION}+{config.PYTHON_DBAPI}://{config.USER_NAME}:"
f"{config.PASSWORD}@localhost/{config.DB_NAME}?charset=utf8mb4")
Although I succeeded one time to insert a user with a special string that requires UTS-8mb4 to the database - it didn't occur again.
good: bad:
I have tried multi disciplines to solve the problem, however none have worked. From what I've searched, this is a known problem. I've decided to solve the problem by encoding to UTF-8 all string that Stack Exchange users can freely write.
all changes have been made in user class. all new variables have been created in the json file and config class.
for some users, all characters require encoding - Thus getting AttributeError. to solve this problem used Try and except. because the user name isn't necessary for the data Analysis we will change it to the website name + user rank
After conducting an experiment on scraping 500 users the code work fine.
location created an issue when queering the database to check if location string excesses in Stack_Exchange_Location Table. Thus, needed to encode the location string right after scraping it.
This time the program crashed because of using different types of letters. Added another 'Bandage' just to keep the server running by checking if the string is printable:
we must figure how to encode utf-8mb4 on the AWS server. I'm am sure that the solution below will work:
green - the solution - encoding the database when creating it. yellow - insurance - encoding specific tables and columns.
I think that you've changed the MySQL config to latin1 thus it doesn't work.
when running the code on a local server our Char encoding default is UTF-8mb4 (that's 4 bytes) The AWS Char encoding default is UTF-8 (that's 3 bytes) as shown in the picture. In this case, special characters (Chinese, Russian, figures, etc.) could not be encoded Thus the program crushes. To solve the issue we can choose 2 options: