Open moghya opened 7 years ago
Such as?
@cLupus Thanks for showing interest in this project. I hope you visited http://moghya.me/allitebooks and saw what we're trying to do here.
You can go through http://bookboon.com and try to write a scraper for it.
I'll add many such websites soon. Let me know if you're going to do it, and I'll assign this to you :)
I've had a look at the site, as well as at your repo.
Am I correct in understanding that this issue is concerned with creating a scraper that creates a file similar to data.py?
Yes, you're correct. It's just that we dump the dictionary to JSON and process that JSON.
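To make the flow concrete, here is a minimal sketch of that dump-and-process step. The field names are illustrative assumptions, not taken from data.py itself:

```python
import json

# Hypothetical example: the scraper collects book metadata into a plain
# dictionary (these field names are illustrative, not from data.py).
books = {
    "books": [
        {
            "title": "Example Book",
            "author": "Jane Doe",
            "description": "A short description in English.",
            "download_url": "http://bookboon.com/en/example-book-ebook",
        }
    ]
}

# Dump the dictionary to JSON so it can be processed later.
with open("books.json", "w") as f:
    json.dump(books, f, indent=2)

# Later processing simply loads the JSON back.
with open("books.json") as f:
    data = json.load(f)
```

The scraper and the downstream processing only share the JSON file, so each new site just needs to produce a dictionary with the agreed shape.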
That does sound interesting. I assume the description should be in English. However, the site does offer some additional languages, although not all the descriptions have been translated into them. Is there any plan for localization (or at the very least to grab what's there in the different languages)?
Honestly, I hadn't thought of it. But as you've rightly raised it, we should think about it. What do you propose?
On closer inspection, it seems that only the site itself has been translated, not the titles or the descriptions, so it wouldn't add much value (in the first pass, anyway).
Let's work on English for now, and we'll come up with a solution in the near future.
Another issue is that http://bookboon.com 'locks' their books behind a dropdown and does not offer direct links to them. There are some ways to alleviate this.
Downloading the zip is one option, but maybe intercepting the request that downloads the book will solve our problem. Think of it this way: the scraper won't follow bookboon's flow; it'll work a step ahead. We can work out exactly what happens after filling in the details and, instead of filling them in, send the request that downloads the PDF directly.
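The "send the request directly" idea might look something like the sketch below. The URL pattern and form fields here are pure assumptions; the real ones would have to be found by inspecting the browser's network traffic when a book is downloaded by hand:

```python
import urllib.parse
import urllib.request

def build_download_request(book_slug):
    """Build the POST request the site is assumed to expect.

    Both the URL pattern and the form fields are hypothetical --
    replace them with whatever the browser actually sends.
    """
    url = "http://bookboon.com/en/%s-ebook" % book_slug  # assumed pattern
    form = {
        "email": "user@example.com",  # the details the form normally asks for
        "newsletter": "false",
    }
    data = urllib.parse.urlencode(form).encode()
    return urllib.request.Request(url, data=data, method="POST")

def download_pdf(book_slug, out_path):
    """Skip the dropdown UI and replay the download request directly."""
    req = build_download_request(book_slug)
    with urllib.request.urlopen(req, timeout=30) as resp, \
            open(out_path, "wb") as f:
        f.write(resp.read())
```

The point is that the scraper never drives the form UI at all; once we know what the final request looks like, we can issue it for every book in the catalog.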
Hi there, ladies and gentlemen. What's the status on this issue? @moghya Mind if I hop in? Also, shouldn't the first page be a bit more descriptive? I.e., the vast majority of web pages state somewhere on the homepage what the site is and what it does, rather than burying it in the code.
Let me know what you think!
@EmilLuta maybe you can contribute by working on #3.
The following websites can be scraped: