tobiasBora / scribd-downloader-3

A small Python script that downloads a PDF from a Scribd URL.
GNU General Public License v3.0

/!\ Important: this script no longer works. Scribd's interface changes quite often, and I don't have time to update the script every month. Alternatives now exist, so I don't think I'll continue to maintain it, but if someone wants to send a PR, feel free! The current code is not complicated to modify ;)

This is a very short Python script whose aim is to download a Scribd document as a PDF file.

** Installation

Depending on what you need, there are two ways to install this script: either install the python dependencies yourself and run it, or use the online docker image (slower, but you are sure to have the right firefox/selenium/geckodriver versions).

*** Method 1: classic install with pip

To use this script you first need to make sure that =firefox=, =python3= and the python libraries =selenium=, =fpdf= and =Pillow= are installed. Note that it may be better to set up all of these libraries inside a =virtualenv= to avoid version clashes.

On debian-like systems, you can proceed using something like this:
: sudo apt install firefox python3 python3-pip
: sudo pip3 install selenium
: sudo pip3 install fpdf
: sudo pip3 install Pillow

And if you prefer the =virtualenv= version (make sure you are in an empty folder):
: sudo apt install firefox python3 python3-pip
: pip3 install --user virtualenv
: pip3 install --upgrade virtualenv
: virtualenv -p python3 venv
: source venv/bin/activate
: pip3 install selenium
: pip3 install fpdf
: pip3 install Pillow

Then, download this script:
: git clone https://github.com/tobiasBora/scribd-downloader-3.git

Make sure it's executable:
: chmod +x scribd_downloader_3.py

You also need the latest =geckodriver= release, available at https://github.com/mozilla/geckodriver/releases (make sure you pick the build matching your platform). For example, for v0.19.0 on 64-bit Linux:
: wget https://github.com/mozilla/geckodriver/releases/download/v0.19.0/geckodriver-v0.19.0-linux64.tar.gz
: tar zxvf geckodriver-v0.19.0-linux64.tar.gz

Great, you can now use the script!
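If something fails at this point, a quick check of the prerequisites listed above can help narrow it down. The following is only a minimal sketch and is not part of the script itself: the function names are made up for illustration, and the import names are an assumption about how the pip packages are imported (in particular, =Pillow= is imported as =PIL=).

```python
import importlib.util
import shutil

# Assumed import names for the pip packages above; Pillow is imported as "PIL".
REQUIRED_MODULES = ["selenium", "fpdf", "PIL"]
# Programs expected on PATH according to the install instructions.
REQUIRED_PROGRAMS = ["firefox"]


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_programs(names):
    """Return the program names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    modules = missing_modules(REQUIRED_MODULES)
    programs = missing_programs(REQUIRED_PROGRAMS)
    if modules:
        print("Missing python modules:", ", ".join(modules))
    if programs:
        print("Missing programs:", ", ".join(programs))
    if not modules and not programs:
        print("All prerequisites found.")
```

Run it with the same interpreter (and the same =virtualenv=, if you use one) as the script, since the check only sees the modules available to the interpreter that executes it.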

*** Method 2: docker (only on 64bit systems)

First, make sure you have a recent [[https://docs.docker.com/install/][docker installed]] (the docker packages in distribution repositories are outdated most of the time). Then, just run:
: sudo docker run -it --shm-size 2g -v $(pwd):/host -w /host tobiasbora/scribd-downloader:18.01 bash

It will download the online docker image, mount your local folder in =/host=, and run a bash in this folder (the =--shm-size= option is very important if you don't want firefox to crash). Then, you can simply run this command (don't forget the =xvfb-run=):
: xvfb-run scribd_downloader_3.py

Example:
: xvfb-run scribd_downloader_3.py "https://www.scribd.com/doc/63942746/chopin-nocturne-n-20-partition" out.pdf

** Features

Using the script is pretty easy. Here is the general usage:

: ./scribd_downloader_3.py -p

(NB: if =geckodriver= is in your global =PATH=, you can omit the =-p= flag)

For example, if =geckodriver= is in the same folder as the script, you can run something like:
: ./scribd_downloader_3.py -p . "https://www.scribd.com/document/31698781/Constitution-of-the-Mexican-Mafia-in-Texas" out.pdf
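If you are not sure whether you need the =-p= flag, you can check where =geckodriver= actually lives before running the script. This is a small sketch using only the python standard library; =locate_geckodriver= is a hypothetical helper written for this README, not a function of the script, and it does not claim to reproduce how the script itself resolves the driver.

```python
import shutil


def locate_geckodriver(folder=None):
    """Return the path of an executable named geckodriver, or None.

    `folder` plays the role of the value you would give to -p.
    When it is None, the global PATH is searched instead.
    """
    if folder is None:
        return shutil.which("geckodriver")
    # shutil.which accepts a PATH-style string; a single folder works too.
    return shutil.which("geckodriver", path=folder)


if __name__ == "__main__":
    print("On PATH:", locate_geckodriver())
    print("In current folder:", locate_geckodriver("."))
```

If the first line prints a path, the flag should not be needed; if only the second one does, pass =-p .= as in the example above.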

If you are on a headless server, or if the firefox window that opens bothers you, you may want to install =xvfb= and then run the same command prefixed with =xvfb-run=:
: xvfb-run ./scribd_downloader_3.py -p

The script will now run blindly!

** FAQ

*** It's a bit slow/quick, can I increase/decrease the speed?

By default, this script waits 1s between each "screenshot" to be sure the page is loaded. If you have a slow connection, this may be too short and produce some blank pages; if you have a good connection, it may be too long, and you may want to decrease it to download the document more quickly.

For example, if you want to wait 0.2 seconds between screenshots (I even tried 0s on a 77-page document and it worked without any trouble), just use the option =-w 0.2=:

Example:
: ./scribd_downloader_3.py -p . https://www.scribd.com/document/359303091/A-Repository-of-Unix-History-and-Evolution unix_history.pdf -w 0.2
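To get a feeling for what the =-w= value costs, here is a rough back-of-the-envelope sketch. It assumes one screenshot per page, which this README does not state explicitly, and it only counts the pauses: page loading and PDF generation come on top. The function name is made up for illustration.

```python
def minimum_wait_seconds(pages, wait=1.0):
    """Lower bound on the run time: one pause of `wait` seconds per page.

    Assumes one screenshot per page; loading and PDF writing are not counted.
    """
    if pages < 0 or wait < 0:
        raise ValueError("pages and wait must be non-negative")
    return pages * wait


if __name__ == "__main__":
    # The 77-page document mentioned above, with the default and two shorter waits.
    for wait in (1.0, 0.2, 0.0):
        total = minimum_wait_seconds(77, wait)
        print(f"-w {wait}: at least {total:.1f} s spent waiting")
```

For the 77-page document, the default wait adds at least 77 seconds of pauses, while =-w 0.2= brings that down to about 15 seconds.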

** Troubleshooting

I spent only a few hours writing this script, so it's not supposed to be perfect, but in what I tried it works pretty nicely. However, because scribd changes often, it may stop working, so if you have any trouble, please [[https://github.com/tobiasBora/scribd-downloader-3/issues][file an issue here]] or edit the code (it should not be too complicated, the code is pretty straightforward and simple) and open a pull request.