steven-king / mj583

J583 Advanced Interactive Media

Class Plan for Wednesday, Jan 24 #5

steven-king opened this issue 6 years ago

steven-king commented 6 years ago

To start class: Launch and log in to the Docker app.

Pull the image with:

docker pull uncdataviz/class-03:start

Conceptual lecture: Understanding Scrapers; Ethics of Scraping

Technical lecture: Writing a Scraper Using Scrapy
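As a small illustration of the ethics topic: one widely used convention is robots.txt, which Python's standard library can evaluate with urllib.robotparser (Scrapy can also honor it through its ROBOTSTXT_OBEY setting). The sketch below parses a made-up robots.txt, not goheels.com's actual file, and asks whether a hypothetical agent named mj583-scraper may fetch two example.com URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only;
# this is NOT goheels.com's real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 5
"""

robots = RobotFileParser()
# parse() accepts an iterable of lines, so no network request is made.
robots.parse(SAMPLE_ROBOTS_TXT.splitlines())

AGENT = "mj583-scraper"  # hypothetical user-agent name
for url in ("https://example.com/roster", "https://example.com/admin/login"):
    # The "*" group applies to any agent without its own group.
    print(url, "allowed" if robots.can_fetch(AGENT, url) else "disallowed")

# Seconds to wait between requests, if the file asks for a delay.
print("crawl delay:", robots.crawl_delay(AGENT))
```

The /roster URL is allowed and the /admin/ URL is disallowed because of the Disallow rule; the crawl delay reported is 5.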

Start the container with:

docker run -p 9000:9000 -d -t uncdataviz/class-03:start
steven-king commented 6 years ago

I have added my IPython notebook, which has lots of comments. Note that I use multiple cells to show how the code builds and changes, but you don't have to duplicate all that work. This also covers what we are going to do on Monday.

steven-king commented 6 years ago

Assignment: Scrape a roster from goheels.com and store each player's name and href as a list of key-value pairs.

Do this in an IPython notebook. Remember to use docker commit to save the container's state, and export the notebook before killing the process in the terminal.

Just bring to class. You do not have to turn it in.
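A minimal sketch of the name-to-href mapping, assuming a hypothetical roster markup (the real goheels.com HTML is not shown in this thread and may differ). It swaps in the standard library's html.parser instead of Scrapy so it runs offline; in Scrapy the same idea is usually expressed with CSS or XPath selectors on the response. A Python dict stands in for the "key-value pair list".

```python
from html.parser import HTMLParser

# Hypothetical roster markup, for illustration only.
SAMPLE_HTML = """
<table class="roster">
  <tr><td class="name"><a href="/roster/jane-doe/1">Jane Doe</a></td></tr>
  <tr><td class="name"><a href="/roster/john-smith/2">John Smith</a></td></tr>
</table>
"""


class RosterParser(HTMLParser):
    """Collect {link text: href} for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.players = {}  # player name -> href
        self._href = None  # href of the <a> we are inside, if any
        self._text = []    # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            if name:
                self.players[name] = self._href
            self._href = None


roster_parser = RosterParser()
roster_parser.feed(SAMPLE_HTML)
print(roster_parser.players)
```

This collects every link on the sample page; a real roster page also contains navigation links, so an actual scraper would first narrow the search to the roster's container element.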

jtizon001 commented 6 years ago (Jan 27, 2018, 16:29)

I am having trouble connecting to goheels.com. I keep getting 404 error responses, yet if I curl it from the terminal, it works fine. I wonder if there is an issue with how the server is handling requests. Is anyone else having this problem?

aryaswanie commented 6 years ago

Yes, I've been having the same problem! All other URLs work fine, but goheels.com never works.

Arya
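A 404 in the notebook alongside a successful curl suggests the server is treating the two clients differently. One common difference is the User-Agent header: curl identifies itself as curl/<version>, Scrapy by default sends a Scrapy/<version> string (configurable through its USER_AGENT setting), and Python's urllib sends Python-urllib/<version>. The sketch below only inspects headers and sends nothing; the URL and the agent string are hypothetical. Whether goheels.com filters on this header is an assumption, and a site that blocks scrapers may simply not want to be scraped.

```python
import urllib.request

# Hypothetical target URL; the exact roster page is not given in the thread.
URL = "https://goheels.com/"

# 1. The default header urllib's opener adds to each request it sends:
#    [('User-agent', 'Python-urllib/<major>.<minor>')]
opener = urllib.request.build_opener()
print(opener.addheaders)

# 2. A request carrying an explicit, made-up User-Agent. Request normalizes
#    header names with str.capitalize(), so it is stored as "User-agent".
#    When sent, an explicit header takes precedence over the opener default.
req = urllib.request.Request(
    URL,
    headers={"User-Agent": "mj583-class-scraper (student project)"},
)
print(req.get_header("User-agent"))

# Nothing is sent here; opener.open(req) would actually perform the request.
```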

steven-king commented 6 years ago

I will look into this very late tonight and get back to the class.

Has anyone been successful?

elisabeth-parker commented 6 years ago (Jan 28, 2018, 10:09 AM)

I'm having the same problem: a 404 error from goheels.com, but success with other sites.

steven-king commented 6 years ago

CHANGE OF PLAN: It looks like our scraper's user agent is being blocked. You do not need to do the assignment. Just study the code from Wednesday and make sure you understand what is going on in each cell.

Sorry, Steven