Closed DerBoesePazifist closed 6 years ago
@DerBoesePazifist ,
I was messing around with the code and saw this as well. So far I've modified the code to just generate a CSV (so you don't need MySQL), and I addressed the issue by simply concatenating the two lists:
player + attribute
So, I think you could do something like:
'''.format(value), (player + attribute))
While I'm not using MySQL, that was the only issue I ran into. For Fifa18 only, it scraped about 9790 rows/players in ~300 sec.
Feel free to let me know if you want to see any of the code I wrote
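To make the list concatenation concrete, here is a minimal sketch that uses the standard library's sqlite3 in place of MySQL. The table name, column names, and sample values are made up for illustration, and the script's real INSERT statement may look different.

```python
import sqlite3

# Hypothetical sample data: one scraped player and that player's six stats.
player = ['Some Player', 'Some Club', 'Some League', 'ST', '90']
attribute = ['89', '91', '80', '88', '35', '78']

# Adding two lists concatenates them, giving one flat parameter sequence.
params = player + attribute

connection = sqlite3.connect(':memory:')
connection.execute(
    'CREATE TABLE demo (name, club, league, position, rating, '
    'pace, shooting, passing, dribbling, defending, physical)'
)

# One placeholder per value; sqlite3 uses "?" where pymysql uses "%s".
placeholders = ', '.join('?' for _ in params)
connection.execute('INSERT INTO demo VALUES ({})'.format(placeholders), params)

row = connection.execute('SELECT * FROM demo').fetchone()
print(row)
connection.close()
```

The point is only that the driver receives a single sequence with one entry per placeholder, so the two lists have to be joined before the call.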
@DerBoesePazifist,
I believe you're using Python 2 instead of Python 3.
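One way to confirm which interpreter is in use is a small guard at the top of the script. This is only a sketch; the helper name running_python3 is made up here and is not part of fifa.py.

```python
import sys


def running_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3 or newer."""
    return version_info[0] >= 3


if not running_python3():
    # On many systems plain "python" may still point at Python 2.
    sys.exit('This script needs Python 3; try running it with python3.')

print('Running on Python {}.{}'.format(*sys.version_info[:2]))
```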
@kafagy You were right: sudo python ./fifa.py uses Python 2.
I have now run it with sudo python3 ./fifa.py and I get a new error:
derboesepazifist@Laptop:~/Dokumente/Downloads/fifa-FUT-Data-master$ sudo python3 ./fifa.py
[sudo] Passwort für derboesepazifist:
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
1.334735
Traceback (most recent call last):
  File "./fifa.py", line 13, in <module>
    connection = pymysql.connect(user='root', password='abc123', host='127.0.0.1', db='FUTHEAD', cursorclass=pymysql.cursors.DictCursor, charset='UTF8')
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/__init__.py", line 94, in Connect
    return Connection(*args, **kwargs)
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 327, in __init__
    self.connect()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 598, in connect
    self._request_authentication()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 852, in _request_authentication
    auth_packet = self._read_packet()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 683, in _read_packet
    packet.check_error()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.InternalError: (1698, "Access denied for user 'root'@'localhost'")
I don't need the SQL output, but as far as I understand it, the code generates the CSV from a database. Do I have to create the SQL table manually? I thought the code did this for me.
I then tried to run the script in sudo su mode. It told me that I don't have pandas installed, so I installed it with pip3 install pandas, and did the same for the other libraries. But I still get the same error message.
I tried this to fix it (in MySQL, I changed the plugin of root from auth_socket to mysql_native_password), but the message just changed:
derboesepazifist@Laptop:~/Dokumente/Downloads/fifa-FUT-Data-master$ sudo python3 ./fifa.py
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
0.797207
Traceback (most recent call last):
  File "./fifa.py", line 13, in <module>
    connection = pymysql.connect(user='root', password='abc123', host='localhost', db='FUTHEAD', cursorclass=pymysql.cursors.DictCursor, charset='UTF8')
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/__init__.py", line 94, in Connect
    return Connection(*args, **kwargs)
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 327, in __init__
    self.connect()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 598, in connect
    self._request_authentication()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 862, in _request_authentication
    auth_packet = self._process_auth(plugin_name, auth_packet)
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 933, in _process_auth
    pkt = self._read_packet()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 683, in _read_packet
    packet.check_error()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.OperationalError: (1045, "Access denied for user 'root'@'localhost' (using password: YES)")
What am I doing wrong? Thank you for your help!
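The two tracebacks end in different MySQL server error codes, 1698 and 1045, which may help narrow things down. The lookup below is only a sketch: the dictionary and the function name explain_access_denied are made up, and the hint texts are a paraphrase rather than MySQL's official wording.

```python
# Hints for the two MySQL server error codes seen in the tracebacks.
# The hint texts are a paraphrase, not MySQL's official messages.
ACCESS_DENIED_HINTS = {
    1698: ('Access denied without a password check; commonly seen when the '
           'account uses the auth_socket plugin, which ignores the password '
           'sent by the client.'),
    1045: ('Access denied after a password check; the user, host, or '
           'password does not match what the server has stored.'),
}


def explain_access_denied(error_code):
    """Return a short hint for a MySQL access-denied error code."""
    return ACCESS_DENIED_HINTS.get(
        error_code, 'No hint recorded for this error code.')


for code in (1698, 1045):
    print(code, '->', explain_access_denied(code))
```

If that reading is right, the change from 1698 to 1045 suggests the plugin switch took effect and the remaining mismatch is the password itself, so making the account's password match the one passed to pymysql.connect (or creating a separate MySQL user for the script) would be the next thing to try.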
@DerBoesePazifist
DROP DATABASE IF EXISTS DBName
(where DBName is the name of the database)
If you don't need the data in a database, you can remove the connection to MySQL and then just write the rows to a file. I didn't want to install MySQL, so that's what I did. I was messing with it for a little bit, and I think this should give you a CSV file if that's all you're after:
import csv
import re
import time
import urllib.parse
import requests
from bs4 import BeautifulSoup
from datetime import datetime

# Runtime start (time.clock() was removed in Python 3.8)
start = time.perf_counter()
print(start)

# Sending request to futhead.com
FutHead = requests.get('http://www.futhead.com/18/players')

# Parsing the number of pages for FIFA 18 players
bs = BeautifulSoup(FutHead.text, 'html.parser')
TotalPages = int(re.sub(r'\s +', '', str(bs.find('span', {'class': 'font-12 font-bold margin-l-r-10'}).get_text())).split(' ')[1])
print('Number of pages to be parsed: ' + str(TotalPages))

fifa = {
    # '10': 'FIFA10',
    # '11': 'FIFA11',
    # '12': 'FIFA12',
    # '13': 'FIFA13',
    # '14': 'FIFA14',
    # '15': 'FIFA15',
    # '16': 'FIFA16',
    # '17': 'FIFA17',
    '18': 'FIFA18'
}

# with connection.cursor() as cursor:
for key, value in fifa.items():
    print('Doing Fifa ' + key)

    # Truncating table before inserting data into the table
    # cursor.execute('TRUNCATE TABLE FUTHEAD.{};'.format(value))

    # List initializations
    players = []
    attributes = []

    # Looping through all pages to retrieve players stats and information
    for page in range(1, TotalPages + 1):
        base_url = 'http://www.futhead.com/' + key + '/players'
        FutHead = requests.get(base_url + '/?page=' + str(page) + '&bin_platform=ps')
        bs = BeautifulSoup(FutHead.text, 'html.parser')
        Stats = bs.findAll('span', {'class': 'player-stat stream-col-60 hidden-md hidden-sm'})
        Names = bs.findAll('span', {'class': 'player-name'})
        Information = bs.findAll('span', {'class': 'player-club-league-name'})
        Ratings = bs.findAll('span', {'class': re.compile('revision-gradient shadowed font-12')})
        MainBlock = bs.findAll('a', {'class': 'display-block padding-0'})

        # Calculating the number of players per page
        num = len(bs.findAll('li', {'class': 'list-group-item list-group-table-row player-group-item dark-hover'}))

        # Parsing all players information
        for i in range(0, num):
            p = []
            relative_url = MainBlock[i].attrs['href']
            player_url = urllib.parse.urljoin(base_url, relative_url)
            p.append(player_url)
            p.append(Names[i].get_text())
            # Pull the position out of the club/league text before splitting it
            strong = Information[i].strong.extract()
            try:
                p.append(re.sub(r'\s +', '', str(Information[i].get_text())).split('| ')[1])
            except IndexError:
                p.append('')
            try:
                p.append(re.sub(r'\s +', '', str(Information[i].get_text())).split('| ')[2])
            except IndexError:
                p.append('')
            p.append(strong.get_text())
            p.append(Ratings[i].get_text())
            players.append(p)

        # Parsing all players stats: every six consecutive entries belong to one player
        a = []
        for index, stat in enumerate(Stats):
            if index % 6 == 0:
                if len(a) > 0:
                    attributes.append(a)
                a = []
            value_span = stat.find('span', {'class': 'value'})
            if value_span is not None:
                a.append(value_span.get_text())
        # Keep the last group on the page, which the loop above never flushes
        if a:
            attributes.append(a)
        print('page ' + str(page) + ' is done!')

    column_headings = (
        'URL',
        'NAME',
        'CLUB',
        'LEAGUE',
        'POSITION',
        'RATING',
        'PACE',
        'SHOOTING',
        'PASSING',
        'DRIBBLING',
        'DEFENDING',
        'PHYSICAL',
        'LOADDATE',
    )
    rows = []
    rows.append(column_headings)

    # Building one row per player
    for player, attribute in zip(players, attributes):
        row = player + attribute + [str(datetime.now())]
        rows.append(row)

    # CSV (csv.writer quotes any field that contains a comma)
    filename = '{0}.csv'.format(value)
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        csv.writer(f).writerows(rows)

# Runtime end
print(time.perf_counter() - start)
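Two steps of the script can be tried in isolation, without touching the network: splitting the flat list of stat values into groups of six, and writing rows out as CSV. The sketch below uses made-up sample values, and the helper names group_stats and rows_to_csv are hypothetical rather than taken from the script.

```python
import csv
import io


def group_stats(values, size=6):
    """Split a flat list of stat values into consecutive groups of `size`."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def rows_to_csv(rows):
    """Render rows as CSV text, quoting any field that contains a comma."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerows(rows)
    return buffer.getvalue()


# Made-up values standing in for two players' scraped stats.
flat_stats = ['90', '85', '80', '88', '40', '75',
              '70', '65', '72', '74', '60', '68']
attributes = group_stats(flat_stats)

# Made-up player fields; note the comma inside the first club name.
players = [['Player A', 'Club, The'], ['Player B', 'Other Club']]

rows = [player + attribute for player, attribute in zip(players, attributes)]
print(rows_to_csv(rows))
```

Going through the csv module matters when a name or club contains a comma: a plain ','.join(row) would split that field into two columns, while csv.writer wraps it in quotes.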
Hi,
I just found your amazing script and wanted to try it. But when I execute "python fifa.py" on my Ubuntu 18.04.1, I get this error message:
I updated Python and the needed libraries.
Many greetings