tilburgsciencehub / music-to-scrape

A fictitious music streaming service with a real website and API so you can learn how to scrape!
https://music-to-scrape.org

audit scraping tutorial (and audit the HTML) #23

Open hannesdatta opened 1 year ago

hannesdatta commented 1 year ago

Background:

We've built our site so others can learn how to scrape. But we've never actually tried scraping it ourselves!

The purpose of this task is to build a "scraping tutorial" for the site, BUT ALSO revise our HTML templates to make the site "scraping-friendly".

We need to ensure that we cover a range of "identifiers" to get data from the site. This should be

Further, we need to ensure students can extract information (1) from the text content of HTML elements and (2) from attribute values.
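To illustrate the two extraction styles, here is a minimal sketch using only Python's standard library. The markup in `SAMPLE_HTML`, the `song` class, and the `data-artist` attribute are assumptions made up for this example; the real templates on the site may look different.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; not copied from the real site.
SAMPLE_HTML = """
<ul>
  <li><a class="song" href="/song?id=101" data-artist="Artist A">Song One</a></li>
  <li><a class="song" href="/song?id=102" data-artist="Artist B">Song Two</a></li>
</ul>
"""


class SongLinkParser(HTMLParser):
    """Collects the text and selected attribute values of <a class="song"> tags."""

    def __init__(self):
        super().__init__()
        self.songs = []
        self._current = None  # the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("class") == "song":
            # (2) Information stored in attribute values.
            self._current = {
                "href": attr_map.get("href"),
                "artist": attr_map.get("data-artist"),
                "title": "",
            }

    def handle_data(self, data):
        # (1) Information stored in the text between the opening and closing tag.
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.songs.append(self._current)
            self._current = None


parser = SongLinkParser()
parser.feed(SAMPLE_HTML)
songs = parser.songs

for song in songs:
    print(song["title"], song["artist"], song["href"])
```

A tutorial would more likely use a library such as BeautifulSoup for this; the standard-library parser is used here only so the sketch runs without extra installs.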

Deliverable:

Next steps:

fleurlemire commented 1 year ago

Hi Hannes,

This is what I have included so far. I cannot add it here since it is a Jupyter notebook. I will send the right version via email (since it looks like images are not working well in the Colab), but I will add a Google Colab link here too: https://colab.research.google.com/drive/1F64Po-c3weJAm_ZrABAQRBJzXS-Qj5y4?usp=sharing

I did not include the recently played or top 10 songs for users and artists, since I can only scrape the table as a whole, but I have code for that if we want to include it later. I did not include song information yet, since that page caused an error. I can try some things and add that if we want; I guess that one is a little more difficult. I also did not add code to save the results as a pandas DataFrame yet. I can include that if we want to.
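For reference, splitting a scraped table into individual rows and saving them could look roughly like the sketch below. It uses only the standard library, and the table in `SAMPLE_TABLE` with its column names is a made-up assumption, not the actual markup of the top 10 or recently played tables.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical table for illustration only; the real tables may differ.
SAMPLE_TABLE = """
<table>
  <tr><th>Rank</th><th>Song</th><th>Artist</th></tr>
  <tr><td>1</td><td>Song One</td><td>Artist A</td></tr>
  <tr><td>2</td><td>Song Two</td><td>Artist B</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Turns each <tr> into a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = TableParser()
parser.feed(SAMPLE_TABLE)

# The first row holds the column names; the remaining rows hold the data.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]

# Write the records as CSV (to an in-memory buffer here; a file works the same way).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=header)
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)
```

If pandas is available, `pd.DataFrame(records)` would turn the same list of dictionaries into a DataFrame.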

I can also remove things if some parts are already too extensive.

hannesdatta commented 1 year ago

Hi @fleurlemire - please commit your work directly to our GitHub repository for this project. You can create a new folder (say: tutorials) in the root directory. Please let me know.

fleurlemire commented 1 year ago

Hi @hannesdatta, when I try to commit, I get an error message saying permission denied.

hannesdatta commented 1 year ago

You should now have push access. Can you try again?

fleurlemire commented 1 year ago

It is added! @hannesdatta

fleurlemire commented 1 year ago

Hi Hannes, I added some extra information, including how to save the scraped data, and uploaded it.