kafagy / fifa-FUT-Data

Web-scraping script that writes the data of all players from FutHead and FutBin to a CSV file or a DB
MIT License
76 stars 17 forks

SyntaxError when running #4

Closed DerBoesePazifist closed 6 years ago

DerBoesePazifist commented 6 years ago

Hi,

I just found your amazing script and wanted to try it. But when I run "python fifa.py" on Ubuntu 18.04.1, I get this error message:

File "fifa.py", line 104
    '''.format(value), (*player, *attribute))
                        ^
SyntaxError: invalid syntax

I have updated Python and the required libraries.

Best regards

tkue commented 6 years ago

@DerBoesePazifist ,

I was messing around with the code some and saw this as well. So far I've modified the code to just generate a CSV (so you don't need MySQL) and addressed the issue by simply concatenating the two lists:

player + attribute

So, I think you could do something like:

'''.format(value), (player + attribute))
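To illustrate why the concatenation is a drop-in replacement: unpacking with a star inside a tuple, as in (*player, *attribute), is only valid syntax from Python 3.5 on (PEP 448), while player + attribute works on both Python 2 and 3. Here is a minimal sketch; the sample values and the helper name merge_params are made up for illustration and are not part of fifa.py.

```python
def merge_params(player, attribute):
    """Join the two lists into one tuple of query parameters.

    Equivalent to (*player, *attribute), but also valid on Python 2.
    """
    return tuple(player + attribute)


# Hypothetical sample data standing in for one scraped player and its six stats
player = ['http://example.com/p/1', 'Example Player', 'Example FC',
          'Example League', 'ST', '90']
attribute = ['90', '85', '80', '88', '40', '75']

params = merge_params(player, attribute)
print(len(params))
```

Wrapping the result in tuple() is optional here and only keeps the value identical to what the starred form produces.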

While I'm not using MySQL, that was the only issue I ran into. For Fifa18 only, it scraped about 9790 rows/players in ~300 sec.

Feel free to let me know if you want to see any of the code I wrote

kafagy commented 6 years ago

@DerBoesePazifist,

I believe you're using Python 2 instead of Python 3.
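A quick way to confirm this is to ask the interpreter itself. The sketch below is meant to be saved as a separate file (say, a hypothetical check_python.py) and run with the same command used for the script, for example python check_python.py. A check placed inside fifa.py would not help, because the SyntaxError is raised while the file is being compiled, before any of its code runs. The function name and the message wording are assumptions for illustration.

```python
import sys

# (*a, *b) unpacking inside a tuple needs Python 3.5 or newer
MIN_VERSION = (3, 5)


def version_problem(version_info=sys.version_info):
    """Return a message if the interpreter is too old, otherwise None."""
    major, minor = version_info[0], version_info[1]
    if (major, minor) < MIN_VERSION:
        return 'Python {}.{} found, but fifa.py needs {}.{} or newer'.format(
            major, minor, MIN_VERSION[0], MIN_VERSION[1])
    return None


print(version_problem() or 'Interpreter is new enough')
```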


DerBoesePazifist commented 6 years ago

@kafagy You were right: sudo python ./fifa.py uses Python 2. I have now run it with sudo python3 ./fifa.py and get a new error:

derboesepazifist@Laptop:~/Dokumente/Downloads/fifa-FUT-Data-master$ sudo python3 ./fifa.py 
[sudo] password for derboesepazifist: 
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
1.334735
Traceback (most recent call last):
  File "./fifa.py", line 13, in <module>
    connection = pymysql.connect(user='root', password='abc123', host='127.0.0.1', db='FUTHEAD', cursorclass=pymysql.cursors.DictCursor, charset='UTF8')
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/__init__.py", line 94, in Connect
    return Connection(*args, **kwargs)
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 327, in __init__
    self.connect()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 598, in connect
    self._request_authentication()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 852, in _request_authentication
    auth_packet = self._read_packet()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 683, in _read_packet
    packet.check_error()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.InternalError: (1698, "Access denied for user 'root'@'localhost'")

I don't need the SQL output, but as far as I understand it, the code generates the CSV from a database. Do I have to create the SQL table manually? I thought the code did this for me.

I then tried running the script in sudo su mode. It told me that I didn't have "pandas" installed, so I installed it with pip3 install pandas, and did the same for the other libraries, but I got the same error message. I then tried this fix (changing the plugin of the MySQL root user from auth_socket to mysql_native_password), but the message just changed:

derboesepazifist@Laptop:~/Dokumente/Downloads/fifa-FUT-Data-master$ sudo python3 ./fifa.py 
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
0.797207
Traceback (most recent call last):
  File "./fifa.py", line 13, in <module>
    connection = pymysql.connect(user='root', password='abc123', host='localhost', db='FUTHEAD', cursorclass=pymysql.cursors.DictCursor, charset='UTF8')
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/__init__.py", line 94, in Connect
    return Connection(*args, **kwargs)
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 327, in __init__
    self.connect()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 598, in connect
    self._request_authentication()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 862, in _request_authentication
    auth_packet = self._process_auth(plugin_name, auth_packet)
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 933, in _process_auth
    pkt = self._read_packet()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/connections.py", line 683, in _read_packet
    packet.check_error()
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/home/derboesepazifist/.local/lib/python3.6/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.OperationalError: (1045, "Access denied for user 'root'@'localhost' (using password: YES)")

What am I doing wrong? Thank you for your help!
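For reference, the two tracebacks above carry different MySQL error numbers, and they usually point at different causes. The sketch below maps each number to a hint. The hint texts are my reading of the usual causes on Ubuntu, not something confirmed in this thread, and explain_mysql_error is a hypothetical helper name.

```python
# Hints for the two MySQL error numbers seen in the tracebacks above.
# The explanations are assumptions about the typical cause, not a diagnosis.
ACCESS_DENIED_HINTS = {
    1698: ("root is probably set up with the auth_socket plugin, which "
           "identifies the caller by system user over the Unix socket "
           "instead of by password; consider a dedicated MySQL user "
           "with a password"),
    1045: ("a password was sent but rejected; check that the password in "
           "fifa.py matches the one set for that MySQL user"),
}


def explain_mysql_error(errno):
    """Return a hint for a MySQL error number, or a generic fallback."""
    return ACCESS_DENIED_HINTS.get(
        errno, 'No hint for MySQL error {}'.format(errno))


print(explain_mysql_error(1698))
print(explain_mysql_error(1045))

# Possible use around the connection in fifa.py (exc.args[0] holds the number):
# try:
#     connection = pymysql.connect(...)
# except pymysql.MySQLError as exc:
#     print(explain_mysql_error(exc.args[0]))
```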

tkue commented 6 years ago

@DerBoesePazifist

If you don't need it in a database, you can remove the connection to MySQL and just write the rows to a file. I didn't want to install MySQL, so that's what I did. I was messing with it for a little bit, and I think this should give you a CSV file if that's all you're after:

import re
import time
import urllib.parse  # urljoin lives in the urllib.parse submodule
from datetime import datetime

import requests
from bs4 import BeautifulSoup

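# time.clock() was removed in Python 3.8; alias it to time.perf_counter() there
# so the timing calls at the start and end of the script keep working
if not hasattr(time, 'clock'):
    time.clock = time.perf_counter

# Runtime start
start = time.clock()
print(start)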

# Sending request to futhead.com
FutHead = requests.get('http://www.futhead.com/18/players')

# Parsing the number of pages for fifa 18 players
bs = BeautifulSoup(FutHead.text, 'html.parser')
TotalPages = int(re.sub(r'\s +', '', str(bs.find('span', {'class': 'font-12 font-bold margin-l-r-10'}).get_text())).split(' ')[1])
print('Number of pages to be parsed: ' + str(TotalPages))

fifa = {
    # '10': 'FIFA10',
    # '11': 'FIFA11',
    # '12': 'FIFA12',
    # '13': 'FIFA13',
    # '14': 'FIFA14',
    # '15': 'FIFA15',
    # '16': 'FIFA16',
    # '17': 'FIFA17',
    '18': 'FIFA18'
}

# with connection.cursor() as cursor:
for key, value in fifa.items():
    print('Doing Fifa ' + key)

    # Truncating table before inserting data into the table
    # cursor.execute('TRUNCATE TABLE FUTHEAD.{};'.format(value))

    # List initializations
    players = []
    attributes = []

    # Looping through all pages to retrieve players stats and information
    for page in range(1, TotalPages + 1):
        base_url = 'http://www.futhead.com/' + key + '/players'
        FutHead = requests.get(base_url + '/?page=' + str(page) + '&bin_platform=ps')
        bs = BeautifulSoup(FutHead.text, 'html.parser')
        Stats = bs.findAll('span', {'class': 'player-stat stream-col-60 hidden-md hidden-sm'})
        Names = bs.findAll('span', {'class': 'player-name'})
        Information = bs.findAll('span', {'class': 'player-club-league-name'})
        Ratings = bs.findAll('span', {'class': re.compile('revision-gradient shadowed font-12')})
        MainBlock = bs.findAll('a', {'class': 'display-block padding-0'})

        # Calculating the number of players per page
        num = len(bs.findAll('li', {'class': 'list-group-item list-group-table-row player-group-item dark-hover'}))

        # Parsing all players information
        for i in range(0, num):
            p = []

            relative_url = MainBlock[i].attrs['href']
            player_url = urllib.parse.urljoin(base_url, relative_url)
            p.append(player_url)

            p.append(Names[i].get_text())
            strong = Information[i].strong.extract()
            try:
                p.append(re.sub(r'\s +', '', str(Information[i].get_text())).split('| ')[1])
            except IndexError:
                p.append('')
            try:
                p.append(re.sub(r'\s +', '', str(Information[i].get_text())).split('| ')[2])
            except IndexError:
                p.append('')
            p.append(strong.get_text())
            p.append(Ratings[i].get_text())
            players.append(p)

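        # Parsing all players stats (six stats per player)
        a = []
        for index, stat in enumerate(Stats):
            # Start a new group at every sixth stat; enumerate() gives the true
            # position, whereas Stats.index(stat) returns the first equal tag
            if index % 6 == 0 and a:
                attributes.append(a)
                a = []
            value_tag = stat.find('span', {'class': 'value'})
            if value_tag is not None:
                a.append(value_tag.get_text())
        # Keep the last player's stats on this page as well
        if a:
            attributes.append(a)
        print('page ' + str(page) + ' is done!')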

    column_headings = (
        'URL',
        'NAME',
        'CLUB',
        'LEAGUE',
        'POSITION',
        'RATING',
        'PACE',
        'SHOOTING',
        'PASSING',
        'DRIBBLING',
        'DEFENDING',
        'PHYSICAL',
        'LOADDATE',
    )

    rows = []
    rows.append(column_headings)

    # Inserting data into its specific table
    for player, attribute in zip(players, attributes):
        row = player + attribute + [str(datetime.now())]
        rows.append(row)

    # CSV

    filename = '{0}.csv'.format(value)

    with open(filename, 'w', encoding='utf-8') as f:
        for row in rows:
            f.write(','.join(row))
            f.write('\n')

# Runtime end
print(time.clock() - start)
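One caveat with the script above: joining the fields with ','.join(row) does not quote anything, so a name or club that contains a comma would shift the columns in the output file. A possible alternative is Python's standard csv module, which quotes such fields automatically. The sketch below assumes rows shaped like the ones built above; the function name rows_to_csv and the sample values are hypothetical.

```python
import csv
import io


def rows_to_csv(rows, stream):
    """Write an iterable of rows to a text stream as CSV, quoting as needed."""
    writer = csv.writer(stream, lineterminator='\n')
    writer.writerows(rows)


# Hypothetical rows: a header tuple and one player whose name contains a comma
rows = [
    ('NAME', 'CLUB'),
    ['Hypothetical, Player', 'Example FC'],
]

buffer = io.StringIO()
rows_to_csv(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, the stream can be a file opened with open(filename, 'w', newline='', encoding='utf-8'), which is the way the csv documentation recommends opening files for the writer.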