slurpcode / slurp

One repo to rule them all !!?!?!! 🤓 😎
https://slurp.rtfd.io/
GNU General Public License v3.0

🔪 The Khan scrapers are broken 🔪 #205

Open jbampton opened 4 years ago

jbampton commented 4 years ago

It looks like all the Khan scrapers are broken :(

mirelagrigoras commented 3 years ago

Hello @jbampton. Is this the script that we should fix: https://github.com/slurpcode/slurp/blob/master/scrapers/python/lxml/khan.py?

jbampton commented 3 years ago

Hi @mirelagrigoras !!

Yes, that is the Khan Python script that needs fixing.

But my Khan profile link is now: https://www.khanacademy.org/profile/JohnBampton/

jbampton commented 3 years ago

I think the way the Khan pages are rendered has changed.

It seems they might be using client-side JavaScript to build the web pages dynamically.
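
One rough way to confirm this is to check whether the text we want appears in the raw HTML at all, or only shows up after scripts run. Below is a minimal sketch, not the repo's scraper: `VisibleTextParser` and `is_in_static_html` are hypothetical names, and the two sample pages are made up for illustration. It only uses the standard library, so the page HTML has to be downloaded separately.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text that appears outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def is_in_static_html(html, marker):
    """Return True if `marker` occurs in the page's static, visible text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return marker.lower() in " ".join(parser.chunks).lower()


# Made-up sample pages: one server-rendered, one built by JavaScript.
static_page = "<html><body><div>Energy points earned</div></body></html>"
js_page = (
    '<html><body><div id="app"></div>'
    '<script>render({"label": "Energy points earned"})</script></body></html>'
)
```

If the marker is missing from the static text of the real profile page, an lxml/XPath scraper has nothing to select, which would match what we are seeing.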

jbampton commented 3 years ago

We need to scrape the "Energy points earned" data.
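
Once the profile text is available, pulling the number out could look something like the sketch below. This is only an assumption about the layout: it guesses that the label "Energy points earned" is followed closely by a number with optional thousands separators. `parse_energy_points` is a hypothetical helper, not something that exists in the repo.

```python
import re

# Assumption: the label comes first, then at most a few non-digit
# characters (spaces, a colon), then the number, e.g. "1,234,567".
ENERGY_RE = re.compile(r"Energy points earned\D{0,20}?(\d[\d,]*)", re.IGNORECASE)


def parse_energy_points(text):
    """Return the energy points as an int, or None if the label is absent."""
    match = ENERGY_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return int(match.group(1).replace(",", ""))
```

Returning `None` instead of raising keeps a missing label easy to tell apart from a parse error when the page layout changes again.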

[Image: Screen Shot 2020-08-30 at 11 26 51 pm]

ajakov commented 3 years ago

Yes, the way the pages are rendered has changed. Whoever wants to fix the other scrapers can check out the PHP scraper; it works with the new layout and data fetching.

jbampton commented 2 years ago

Done in Python now

wickedknock commented 1 year ago

Does it still need fixing? I really want to do some web scraping. If so, please add me.

jbampton commented 1 year ago

Yes, they do need fixing.