Info Request - Githubissues

Johnny-Courage020 commented 7 years ago

Mr Hunter,

I'm in need of a way to extract daily FUT player prices (and perhaps stats too) in order to build a financial time-series for my masters thesis on behavioural finance. Is there code in the futgui file that would enable me to run a daily query for these prices?

Any feedback would be very much appreciated.

Kind regards, and happy new year from Amsterdam,

Jon

hunterjm commented 7 years ago

Sorry for the delayed response. I would use a site like futbin to get that information. You would need a lot of accounts to pull daily prices on multiple players due to EA's rate limiting. Sites like futbin take all the pain out of it.

It tracks Buy It Now prices for almost every player on the market. You can see a daily graph for a player if you search for them, and they even have multiple market indexes so you can see overall market trends.

Johnny-Courage020 commented 7 years ago

Thanks a lot for your reply.

I'm indeed aware of futbin. The problem I ran into there is that crawlers are unable to consistently pull all the data (15,540 players) from the website. Now I wouldn't really need all 15,000, since 4,000 in the cross-section would do fine. However, crawlers are untargeted and will return a slightly different set of 4,000 players each time, which might lead to a problem in peer review and replication (also a lot of duplicates for some reason). All this might have something to do with DDoS prevention on the futbin website, but I'm not sure.

An extractor would be much more targeted since I could just generate a list of player URLs in Excel and let the extractor browse them all. However (sigh), running that many queries a day through an extractor costs a shitload of money when using services like import.io.

I've tried Google Sheets as well with code such as this: =if(istext(B4)=TRUE;index(importhtml(B4;"table";3);1;2);"")

but Google Sheets can't handle pulling more than about 80 cells at a time from futbin, and 80 already stretches its resources thin.

Never thought it would be this freakin' hard getting data to analyse the FUT market. But then again, maybe that's why it hasn't been done before.

Anyway, thanks for letting me know the limitations of trying this with Python. Saves me a lot of time programming.

Warm regards,

Jon

hunterjm commented 7 years ago

Keep the developer tools network tab open next time you load a player profile on futbin. You may just find that getting the data you need is much easier than using a web crawler.

Johnny-Courage020 commented 7 years ago

My man, I just wanted to let you know that I managed to build something in Python that works, and to thank you again for your time. In case you're interested, the script is below (it's my first ever Python programme, so be gentle). Do you happen to know how many seconds I should set sleep() to, to make sure I don't bomb futbin's servers?

Cheers!

import csv
import time

import requests
from lxml import html

# Build one profile URL per player ID (1 to 16000).
list_of_urls = []
for player_id in range(1, 16001):
    url = "https://www.futbin.com/17/player/{0}".format(player_id)
    list_of_urls.append(url)

# Column headers: five BIN prices per console, then the info-table fields.
identifiers = ['bin_1_ps4', 'bin_2_ps4', 'bin_3_ps4', 'bin_4_ps4', 'bin_5_ps4', 'bin_1_xbox', 'bin_2_xbox', 'bin_3_xbox', 'bin_4_xbox', 'bin_5_xbox', 'name', 'club', 'nation', 'league', 'skills', 'weak foot', 'foot', 'height', 'weight', 'revision', 'Def. Workrate', 'Att. Workrate', 'Added On']

list_of_prices = []
for url in list_of_urls:
    # Fetch and parse each page once; both XPath queries reuse the same tree.
    webpage = requests.get(url)
    html_tree = html.fromstring(webpage.content)
    prices = html_tree.xpath('//span[@class ="bin_text"]/text()')
    attributes = html_tree.xpath('//td[@class ="table-row-text"]/text()')

    prices_attributes = prices + attributes
    list_of_prices.append(prices_attributes)
    time.sleep(0.5)  # pause between requests

list_of_prices.insert(0, identifiers)

# The with-block closes the file once all rows are written.
with open("./player_6.csv", "w", newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(list_of_prices)

Johnny-Courage020 commented 7 years ago

Also, this is going to take approx. 4.5 hours of runtime every day if I extract the data serially. I've tried finding ways to run multiple requests in parallel. All I can find is the Scrapy tool, which just seems to be a Python-built web crawler, which is not really what I'm looking for. Do you have any ideas which library or function would help me speed up the process of getting all 16,000 players each day?

Thanks again!

hunterjm commented 7 years ago

@Johnny-Courage020 you can use an async requests counterpart to speed up getting the data. Look at grequests or requests-futures.
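A minimal sketch of the concurrent approach, using the standard library's `concurrent.futures` thread pool as a stand-in rather than grequests or requests-futures themselves. The names `fetch_all` and `fake_fetch` and the `max_workers` values are assumptions for illustration; in a real run, `fetch` would be something like `lambda url: requests.get(url).content`.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL on a small thread pool.

    Results come back in the same order as `urls`, which keeps each
    result aligned with its player ID.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP call, so the sketch runs offline.
    return "page for " + url.rsplit("/", 1)[-1]


urls = ["https://www.futbin.com/17/player/{0}".format(i) for i in range(1, 6)]
pages = fetch_all(urls, fake_fetch, max_workers=3)
```

A small pool is probably the safer choice here: every extra worker is another simultaneous request against futbin, so the rate-limiting concern raised earlier in the thread still applies.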

Also, instead of scraping the HTML, you may find that the data used to render the graphs has most of the pricing information you need. Example here. In each data point, the first value is the Unix timestamp in ms and the second is the price. The player information is static, so you do not need to build that into your scraper.
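As a rough illustration of working with that graph data, the sketch below turns timestamp/price pairs into dated rows. It assumes the data arrives as a list of `[timestamp_ms, price]` pairs, which is inferred from the description above rather than from futbin's actual response; `to_daily_prices` and the sample values are made up.

```python
from datetime import datetime, timezone


def to_daily_prices(points):
    """Convert [timestamp_ms, price] pairs into (ISO date, price) tuples."""
    rows = []
    for timestamp_ms, price in points:
        # Unix timestamps in milliseconds must be divided by 1000 first.
        day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).date()
        rows.append((day.isoformat(), price))
    return rows


# Made-up sample in the assumed shape; not real futbin data.
sample = [[1483228800000, 250000], [1483315200000, 245000]]
daily = to_daily_prices(sample)
```

Converting in UTC keeps the dates independent of the time zone of whichever machine runs the script.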

Johnny-Courage020 commented 7 years ago

Awesome, grequests looks pretty straightforward so I'll look into that. I tried the asyncio library earlier, but I got kinda stuck, and since I'm already familiar with the requests library I'm quite sure using grequests would be better.

And holy shit I didn't know you could get the daily fluctuation from the price graph. That opens so many new possibilities for analysis. Too many for my MSc thesis, so I guess I now know what my dissertation is gonna be about haha.

I have one more thing, and I'll leave you be. You undoubtedly have more important things to do than instructing people in the basics of Python.

The code below pulls out some of the player info that I'm using in addition to prices from the information table on the player page.

Cheers!

attributes = html_tree.xpath('//td[@class="table-row-text"]/text()')

Problem is, it fails to grab the information that is contained in a hyperlink. So 'Club', 'Nation' and 'League' show up as empty strings. I've tried editing the xpath to be more specific, and tried to figure out if changing the /text() works, but no luck.
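A likely cause is that `/text()` selects only text nodes that are direct children of the `td`, so text wrapped in an `<a>` element is skipped. The sketch below shows that difference with the standard library's `ElementTree` on a made-up, well-formed fragment (not real futbin markup). With lxml, the analogous options would be `td.text_content()` or `//text()` in the XPath.

```python
import xml.etree.ElementTree as ET

# Made-up fragment imitating an info table; not real futbin markup.
snippet = """
<table>
  <tr><td class="table-row-text">Left</td></tr>
  <tr><td class="table-row-text"><a href="/club">Example FC</a></td></tr>
  <tr><td class="table-row-text"> <a href="/nation">Exampleland</a> </td></tr>
</table>
"""

root = ET.fromstring(snippet)
cells = root.findall(".//td[@class='table-row-text']")

# Direct text only: cells whose content sits inside a link come back empty.
direct_text = [(td.text or "").strip() for td in cells]

# All descendant text: link text is included.
full_text = ["".join(td.itertext()).strip() for td in cells]
```

Here `direct_text` reproduces the empty-string symptom, while `full_text` recovers the linked values.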

Johnny-Courage020 commented 7 years ago

Uhm 'Cheers!' should have been at the end of the message.

Not the first buggy script I've written though

Johnny-Courage020 commented 7 years ago

I've figured it out. Thanks!

hunterjm commented 7 years ago

Awesome. Sorry, this got lost somewhere in my email!

Johnny-Courage020 commented 7 years ago

No problem. You've helped me a ton, so thanks again

djcake commented 7 years ago

@Johnny-Courage020 I'm interested in getting more info about the thesis you are working on. Any chance you could point me to a link with more info?

Johnny-Courage020 commented 7 years ago

Hi djcake,

I'm sorry for the belated response. Right now I'm still in the process of gathering data. I've set up a cloud server which runs queries every night at 3:30 am. That way I'm building a panel data set on which I'll perform my analyses.
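One way such a nightly run could accumulate into a panel is to append each night's rows, stamped with the date, to a single long-format CSV. This is only a sketch under assumptions: `append_snapshot`, the three column names, and the sample values are hypothetical and not taken from the actual setup.

```python
import csv
import os
import tempfile


def append_snapshot(path, snapshot_date, rows):
    """Append one night's (player_id, price) rows to a long-format panel CSV."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as outfile:
        writer = csv.writer(outfile)
        if is_new:
            # Write the header only once, when the file is first created.
            writer.writerow(["date", "player_id", "price"])
        for player_id, price in rows:
            writer.writerow([snapshot_date, player_id, price])


# Two made-up nightly snapshots written to a temporary file.
panel_path = os.path.join(tempfile.mkdtemp(), "panel.csv")
append_snapshot(panel_path, "2017-02-01", [(1, 250000), (2, 1200)])
append_snapshot(panel_path, "2017-02-02", [(1, 245000), (2, 1300)])

with open(panel_path, newline="") as infile:
    panel = list(csv.reader(infile))
```

One row per player per date keeps the file easy to load into most statistics packages for panel regressions.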

The main idea is to check whether goals scored in real-life competitions affect prices in the FUT17 market. There is a fair amount of literature within Behavioural Finance which discusses 'Attention'. In other words, prices in asset markets are driven in part by investor attention, which exists separately from any sort of financial fundamentals of the value of a firm's assets.

The FUT17 market could be a nice natural experiment because players have no fundamentals, so these fundamentals do not change as a function of goals that are scored in the real life competitions.

If you have any questions don't hesitate to ask. Also, I'll try and remind myself to link you my thesis once it's done and written.

Kind regards,

Jon

Johnny-Courage020 commented 7 years ago

@djcake forgot the tag

hunterjm / futgui

Info Request #173