hannesdatta opened this issue 1 year ago
Hi Hannes,
This is what I have included so far. I cannot add it here directly since it is a Jupyter notebook. I will send the correct version via email (images do not seem to render well in Colab), but I will also add a Google Colab link here: https://colab.research.google.com/drive/1F64Po-c3weJAm_ZrABAQRBJzXS-Qj5y4?usp=sharing
I did not include the recently played or top 10 songs for users and artists, since I can only scrape each table as a whole, but I have code for that if we want to include it later. I also did not include song information yet, since that page caused an error. That one is probably a little more difficult, but I can try a few things and add it if we want. Finally, I did not add code to save the data as a pandas DataFrame yet; I can include that if we want.
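For reference, scraping a table as a whole and turning it into a pandas DataFrame could look roughly like the sketch below. It assumes `beautifulsoup4` and `pandas` are installed; the `top-songs` class, the column names and the helper `table_to_dataframe` are hypothetical placeholders, not the site's actual markup.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup standing in for a "top 10 songs" table; the real
# site's tag structure and class names may differ.
html = """
<table class="top-songs">
  <tr><th>Rank</th><th>Song</th><th>Artist</th></tr>
  <tr><td>1</td><td>Song A</td><td>Artist X</td></tr>
  <tr><td>2</td><td>Song B</td><td>Artist Y</td></tr>
</table>
"""

def table_to_dataframe(html, table_class):
    """Parse the first <table> with the given class into a DataFrame (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    rows = table.find_all("tr")
    # The first row holds the column headers (<th> cells)
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    # Every following row holds one record (<td> cells), kept as strings
    records = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows[1:]
    ]
    return pd.DataFrame(records, columns=headers)

df = table_to_dataframe(html, "top-songs")
print(df)
```

All cell values stay strings here; converting columns such as `Rank` to numbers is left as a separate step. pandas also offers `pd.read_html`, but it depends on an extra parser backend (such as lxml), whereas the loop above makes the row/cell logic explicit for students.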
I can also remove parts if some of it is already too extensive.
Hi @fleurlemire, please commit your work directly to our GitHub repository for this project. You can create a new folder (say: tutorials) in the root directory. Please let me know.
Hi @hannesdatta, when I try to commit, I get a "permission denied" error message.
You should now have push access. Can you try again?
It has been added! @hannesdatta
Hi Hannes, I added some extra information, including how to save the scraped information, and uploaded it.
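As a rough illustration of the saving step, the sketch below writes the same scraped records to both CSV and JSON using only the standard library. The record fields (`username`, `country`, `followers`), the file names and the helper `save_records` are hypothetical; the notebook may well use pandas (`DataFrame.to_csv`) instead.

```python
import csv
import json
import os
import tempfile

# Hypothetical records, shaped like what a scraper might collect per user.
records = [
    {"username": "user_a", "country": "NL", "followers": "12"},
    {"username": "user_b", "country": "DE", "followers": "7"},
]

def save_records(records, folder):
    """Write the records to users.csv and users.json in `folder`; return both paths."""
    csv_path = os.path.join(folder, "users.csv")
    json_path = os.path.join(folder, "users.json")
    # CSV: one header row taken from the dict keys, then one row per record
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
    # JSON: the full list of dicts, pretty-printed
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    return csv_path, json_path

# Write to a temporary folder so the example does not clutter the project
out_dir = tempfile.mkdtemp()
csv_path, json_path = save_records(records, out_dir)
print(csv_path, json_path)
```

CSV is convenient for opening the data in Excel or R, while JSON keeps nested structures intact if the scraper later collects lists (for example, a user's top songs).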
Background:
We've built our site so others can learn how to scrape. But, we've never actually tried scraping it ourselves!
The purpose of this task is to build a "scraping tutorial" for the site, BUT ALSO revise our HTML templates to make the site "scraping-friendly".
We need to ensure that we cover a range of "identifiers" to get data from the site. This should be
Further, we need to ensure students can extract information (1) from the TEXT attributes of HTML, (2) as well as from attribute-values.
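To make the two requirements concrete, here is a minimal sketch showing several identifiers (id, tag plus class, CSS selector, attribute value) and the difference between extracting an element's text and an attribute's value. It assumes `beautifulsoup4` is installed; the HTML snippet, including the `profile` id, the class names and the `data-user-id` attribute, is hypothetical and not taken from the site's actual templates.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet of a user profile page; id, class and data-* names
# are assumptions for illustration only.
html = """
<div id="profile" class="user" data-user-id="42">
  <h2 class="username">user_a</h2>
  <span class="country">NL</span>
  <a class="artist-link" href="/artists/7">Artist X</a>
  <a class="artist-link" href="/artists/9">Artist Y</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Identifier 1: by id
profile = soup.find(id="profile")
# Identifier 2: by tag name plus class
username = profile.find("h2", class_="username").get_text(strip=True)
# Identifier 3: by CSS selector
country = soup.select_one("span.country").get_text(strip=True)
# Identifier 4: by an attribute and its value
by_attr = soup.find(attrs={"data-user-id": "42"})

# (2) Value of an attribute
user_id = profile["data-user-id"]
artist_links = soup.find_all("a", class_="artist-link")
artist_urls = [a.get("href") for a in artist_links]
# (1) Text of an element
artist_names = [a.get_text(strip=True) for a in artist_links]

print(username, country, user_id)
print(list(zip(artist_names, artist_urls)))
```

Note that `tag.get("href")` returns `None` when the attribute is missing, whereas `tag["href"]` raises a `KeyError`, and that BeautifulSoup returns `class` as a list because it treats it as a multi-valued attribute. Stable ids, classes and `data-*` attributes like the ones above are also the kind of hooks that would make the templates "scraping-friendly".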
Deliverable:
BeautifulSoup. As an example, see this tutorial.
Next steps: