tobiasBora / scribd-downloader-3

A small python script that downloads PDF from a scribd url.
GNU General Public License v3.0

Geckodriver is not found. #2

Open nobicycle opened 6 years ago

nobicycle commented 6 years ago

Thanks for the script.

I tried many variations to get the script to find the driver:

./scribd_downloader_3.py -p . "https://www.scribd.com/document/31698781/Constitution-of-the-Mexican-Mafia-in-Texas" out.pdf
Scraping url: https://www.scribd.com/document/31698781/Constitution-of-the-Mexican-Mafia-in-Texas
Output: out.pdf
I will start the scraping...
Will load the webdriver for firefox...
/!\ ERROR /!\
Message: Unable to find a matching set of capabilities

The script cannot find the executable 'geckodriver'.
Please, download it from the page:
  https://github.com/mozilla/geckodriver/releases
Extract it, and run again this script, by adding the
path of extraction via the tag '-p'.
E.g: ./scribd_downloader_3.py -p geckodriver-v0.19.0-linux64 "https://www.scribd.com/document/31698781/Constitution-of-the-Mexican-Mafia-in-Texas" out.pdf



>  $ ls -l
> total 9356
> -rw-r--r-- 1 me me    1320 Feb  1 23:16 Dockerfile
> -rwxrwxr-x 1 me me 7194178 Nov  1 03:15 geckodriver
> -rw-r--r-- 1 me me    1430 Feb  1 23:27 geckodriver.log
> drwx------ 2 me me    4096 Feb  1 23:26 geckodriver-v0.19.1-linux64
> -rw-r--r-- 1 me me 2301226 Nov  1 03:15 geckodriver-v0.19.1-linux64.tar.gz
> -rw-r--r-- 1 me me   35149 Feb  1 23:16 LICENSE
> -rw-r--r-- 1 me me    4483 Feb  1 23:16 README.org
> -rw-r--r-- 1 me me      21 Feb  1 23:16 requirements.txt
> -rwxr-xr-x 1 me me   10738 Feb  1 23:16 scribd_downloader_3.py
> drwxr-xr-x 2 me me    4096 Feb  1 23:27 scribd_downloader_tmp
> 

The subdirectory contains the geckodriver executable.

tobiasBora commented 6 years ago

Thank you for the bug report.

Despite what the error message says, the problem is not that the script cannot find the geckodriver file (sorry, when I wrote it I assumed that would be the most common problem, but I need to make the message more precise). The real error is "Message: Unable to find a matching set of capabilities", which basically means that there is a version mismatch between the components. So one thing to check is that you have a 64-bit system (you downloaded the 64-bit geckodriver), the latest version of Selenium, and a Firefox newer than 53.
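The checks listed above can be gathered in one place. The following is only a minimal sketch: the function names are hypothetical and not part of scribd_downloader_3.py, the threshold 53 is the one quoted above, and it assumes that `firefox --version` prints text of the form "Mozilla Firefox 58.0".

```python
import platform
import re
import shutil
import subprocess

MIN_FIREFOX_MAJOR = 53  # threshold quoted in the comment above


def parse_firefox_major(version_output):
    """Extract the major version from text such as 'Mozilla Firefox 58.0'."""
    match = re.search(r"Firefox\s+(\d+)", version_output)
    return int(match.group(1)) if match else None


def firefox_is_recent_enough(version_output, minimum=MIN_FIREFOX_MAJOR):
    """Return True if the major version is strictly newer than `minimum`."""
    major = parse_firefox_major(version_output)
    return major is not None and major > minimum


def report():
    """Print the facts that matter when hunting a version mismatch."""
    print("machine architecture:", platform.machine())

    firefox = shutil.which("firefox")
    if firefox is None:
        print("firefox: not found in PATH")
    else:
        result = subprocess.run([firefox, "--version"], stdout=subprocess.PIPE,
                                universal_newlines=True, timeout=30)
        verdict = "ok" if firefox_is_recent_enough(result.stdout) else "too old or unknown"
        print("firefox:", result.stdout.strip(), "->", verdict)

    print("geckodriver:", shutil.which("geckodriver") or "not found in PATH")

    try:
        import selenium
        print("selenium:", selenium.__version__)
    except ImportError:
        print("selenium: not installed")
```

Calling `report()` prints the architecture, the Firefox version with a verdict, the geckodriver location, and the installed Selenium version, which should be enough to see which piece is out of step.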

To simplify this "find the right versions" game, I wrote a script that sets up a local virtual environment, installs the latest Python libraries, downloads the latest Firefox and geckodriver binaries, and sets the PATH temporarily, so that it does not touch the main system. Could you please try it and tell me whether it works for you?

mkdir scribd_downloader_packaged && cd scribd_downloader_packaged
wget https://gist.githubusercontent.com/tobiasBora/20560a360fc9fc0512f6084a39edb377/raw/651833fe0b9d2e49b80ce00a063958670888a67e/set_up_local_scribd_download.sh
bash set_up_local_scribd_download.sh
bash set_up_local_scribd_download.sh # I think running it once should be enough but I did not test it
scribd_downloader_3.py https://www.scribd.com/doc/63942746/chopin-nocturne-n-20-partition chopin.pdf

I hope it works. Meanwhile, I'll try to find a way to package it in a more straightforward manner (AppImage, Nix...). Thank you!
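For illustration, the "set the PATH temporarily" idea can be sketched in Python as follows. This is not what set_up_local_scribd_download.sh actually contains; the helper name is hypothetical, and the subdirectory names (firefox, scribd-downloader-3, venv/bin) are assumptions about the layout the script creates under its download directory.

```python
import os

# Assumed layout of the local download directory (hypothetical, for illustration).
LOCAL_SUBDIRS = ("", "firefox", "scribd-downloader-3", os.path.join("venv", "bin"))


def build_local_env(base_dir, environ=None):
    """Return a copy of the environment whose PATH starts with the local tools.

    The environment passed in (or os.environ) is left untouched, so nothing
    changes for the rest of the system.
    """
    env = dict(os.environ if environ is None else environ)
    local_dirs = [os.path.abspath(os.path.join(base_dir, sub)) for sub in LOCAL_SUBDIRS]
    inherited = env.get("PATH", "")
    parts = local_dirs + ([inherited] if inherited else [])
    env["PATH"] = os.pathsep.join(parts)
    return env
```

The returned dictionary could then be passed as `env=` to `subprocess.run` when launching the downloader, so that the local Firefox and geckodriver are found first while the parent shell keeps its original PATH.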

nobicycle commented 6 years ago

Thanks for the new script. I still get the same error message ("The script cannot find the executable 'geckodriver'."). Here is the output:


[me@linux Scribd]$ mkdir scribd_downloader_packaged && cd scribd_downloader_packaged
[me@linux scribd_downloader_packaged]$ wget https://gist.githubusercontent.com/tobiasBora/20560a360fc9fc0512f6084a39edb377/raw/651833fe0b9d2e49b80ce00a063958670888a67e/set_up_local_scribd_download.sh
--2018-02-11 23:56:18--  https://gist.githubusercontent.com/tobiasBora/20560a360fc9fc0512f6084a39edb377/raw/651833fe0b9d2e49b80ce00a063958670888a67e/set_up_local_scribd_download.sh
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3279 (3.2K) [text/plain]
Saving to: ‘set_up_local_scribd_download.sh’

set_up_local_scribd 100%[===================>]   3.20K  --.-KB/s    in 0.02s   

2018-02-11 23:56:22 (162 KB/s) - ‘set_up_local_scribd_download.sh’ saved [3279/3279]

[me@linux scribd_downloader_packaged]$ bash set_up_local_scribd_download.sh
Requirement already satisfied: virtualenv in /usr/lib/python3.6/site-packages
Requirement already up-to-date: virtualenv in /usr/lib/python3.6/site-packages
Running virtualenv with interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /home/archives/Internet/Scraping/Scribd/scribd_downloader_packaged/download_everything/venv/bin/python3
Also creating executable in /home/archives/Internet/Scraping/Scribd/scribd_downloader_packaged/download_everything/venv/bin/python
Installing setuptools, pip, wheel...done.
Collecting selenium
  Downloading selenium-3.9.0-py2.py3-none-any.whl (942kB)
    100% |████████████████████████████████| 952kB 619kB/s 
Installing collected packages: selenium
Successfully installed selenium-3.9.0
Collecting fpdf
  Using cached fpdf-1.7.2.tar.gz
Building wheels for collected packages: fpdf
  Running setup.py bdist_wheel for fpdf ... done
  Stored in directory: /home/me/.cache/pip/wheels/c9/22/63/16731bdbcccd4a91f5f9e9bea98b1e51855a678f2c6510ae76
Successfully built fpdf
Installing collected packages: fpdf
Successfully installed fpdf-1.7.2
Collecting Pillow
  Using cached Pillow-5.0.0-cp36-cp36m-manylinux1_x86_64.whl
Installing collected packages: Pillow
Successfully installed Pillow-5.0.0
Cloning into 'scribd-downloader-3'...
remote: Counting objects: 42, done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 42 (delta 12), reused 31 (delta 8), pack-reused 7
Unpacking objects: 100% (42/42), done.
--2018-02-11 23:57:01--  https://ftp.mozilla.org/pub/firefox/releases/58.0/linux-x86_64/en-GB/firefox-58.0.tar.bz2
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving ftp.mozilla.org (ftp.mozilla.org)... 54.192.151.87
Connecting to ftp.mozilla.org (ftp.mozilla.org)|54.192.151.87|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 53962282 (51M) [application/x-tar]
Saving to: ‘firefox-58.0.tar.bz2’

firefox-58.0.tar.bz 100%[===================>]  51.46M   645KB/s    in 71s     

2018-02-11 23:58:13 (742 KB/s) - ‘firefox-58.0.tar.bz2’ saved [53962282/53962282]

firefox/
firefox/libnss3.so
firefox/libmozsandbox.so
firefox/firefox.sig
firefox/platform.ini
firefox/icudt59l.dat
firefox/libnspr4.so
firefox/libplc4.so
firefox/defaults/
firefox/defaults/pref/
firefox/defaults/pref/channel-prefs.js
firefox/gtk2/
firefox/gtk2/libmozgtk.so
firefox/updater
firefox/libsmime3.so
firefox/plugin-container.sig
firefox/libmozgtk.so
firefox/plugin-container
firefox/libmozavutil.so
firefox/libnssdbm3.chk
firefox/icons/
firefox/icons/updater.png
firefox/libnssckbi.so
firefox/libnssdbm3.so
firefox/gmp-clearkey/
firefox/gmp-clearkey/0.1/
firefox/gmp-clearkey/0.1/libclearkey.so.sig
firefox/gmp-clearkey/0.1/manifest.json
firefox/gmp-clearkey/0.1/libclearkey.so
firefox/Throbber-small.gif
firefox/browser/
firefox/browser/features/
firefox/browser/features/formautofill@mozilla.org.xpi
firefox/browser/features/firefox@getpocket.com.xpi
firefox/browser/features/followonsearch@mozilla.com.xpi
firefox/browser/features/onboarding@mozilla.org.xpi
firefox/browser/features/webcompat@mozilla.org.xpi
firefox/browser/features/activity-stream@mozilla.org.xpi
firefox/browser/features/screenshots@mozilla.org.xpi
firefox/browser/features/shield-recipe-client@mozilla.org.xpi
firefox/browser/features/aushelper@mozilla.org.xpi
firefox/browser/icons/
firefox/browser/icons/mozicon128.png
firefox/browser/crashreporter-override.ini
firefox/browser/blocklist.xml
firefox/browser/omni.ja
firefox/browser/extensions/
firefox/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}.xpi
firefox/browser/chrome.manifest
firefox/browser/chrome/
firefox/browser/chrome/icons/
firefox/browser/chrome/icons/default/
firefox/browser/chrome/icons/default/default48.png
firefox/browser/chrome/icons/default/default32.png
firefox/browser/chrome/icons/default/default16.png
firefox/dependentlibs.list
firefox/libmozavcodec.so
firefox/liblgpllibs.so
firefox/update-settings.ini
firefox/libmozsqlite3.so
firefox/updater.ini
firefox/firefox-bin.sig
firefox/application.ini
firefox/omni.ja
firefox/libsoftokn3.so
firefox/firefox-bin
firefox/libsoftokn3.chk
firefox/libfreeblpriv3.so
firefox/crashreporter.ini
firefox/libfreeblpriv3.chk
firefox/firefox
firefox/libssl3.so
firefox/removed-files
firefox/libxul.so.sig
firefox/libnssutil3.so
firefox/precomplete
firefox/fonts/
firefox/fonts/EmojiOneMozilla.ttf
firefox/chrome.manifest
firefox/minidump-analyzer
firefox/libxul.so
firefox/libplds4.so
firefox/crashreporter
firefox/pingsender
--2018-02-11 23:58:20--  https://github.com/mozilla/geckodriver/releases/download/v0.19.1/geckodriver-v0.19.1-linux64.tar.gz
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving github.com (github.com)... 192.30.255.113, 192.30.255.112
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/25354393/e31e4c22-be6f-11e7-9bc7-dedc3490a7fd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20180211%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20180211T155824Z&X-Amz-Expires=300&X-Amz-Signature=63a2679b40ef14ac1f44ba80b480b96a22ca0f36803e8785ac7c3b80418dddee&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dgeckodriver-v0.19.1-linux64.tar.gz&response-content-type=application%2Foctet-stream [following]
--2018-02-11 23:58:24--  https://github-production-release-asset-2e65be.s3.amazonaws.com/25354393/e31e4c22-be6f-11e7-9bc7-dedc3490a7fd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20180211%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20180211T155824Z&X-Amz-Expires=300&X-Amz-Signature=63a2679b40ef14ac1f44ba80b480b96a22ca0f36803e8785ac7c3b80418dddee&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dgeckodriver-v0.19.1-linux64.tar.gz&response-content-type=application%2Foctet-stream
Resolving github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)... 52.216.65.128
Connecting to github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)|52.216.65.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2301226 (2.2M) [application/octet-stream]
Saving to: ‘geckodriver-v0.19.1-linux64.tar.gz’

geckodriver-v0.19.1 100%[===================>]   2.19M   620KB/s    in 3.6s    

2018-02-11 23:58:29 (620 KB/s) - ‘geckodriver-v0.19.1-linux64.tar.gz’ saved [2301226/2301226]

geckodriver
/home/archives/Internet/Scraping/Scribd/scribd_downloader_packaged/download_everything:/home/archives/Internet/Scraping/Scribd/scribd_downloader_packaged/download_everything/firefox:/home/archives/Internet/Scraping/Scribd/scribd_downloader_packaged/download_everything/scribd-downloader-3:/home/archives/Internet/Scraping/Scribd/scribd_downloader_packaged/download_everything/venv/bin:/home/me/bin/scripts:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
======================================
Great, you downloaded everything.
Now (and later), just run the following to load the path:
$ bash set_up_local_scribd_download.sh
And then use the scribd downloader like this:
$ scribd_downloader_3.py <your url> <your output.pdf>
E.g:
$ scribd_downloader_3.py https://www.scribd.com/doc/63942746/chopin-nocturne-n-20-partition chopin.pdf
If if fails, then please report here:
  https://github.com/tobiasBora/scribd-downloader-3/issues/
Enjoy !

[me@linux download_everything]$ scribd_downloader_3.py https://www.scribd.com/doc/63942746/chopin-nocturne-n-20-partition chopin.pdf
Scraping url: https://www.scribd.com/doc/63942746/chopin-nocturne-n-20-partition
Output: chopin.pdf
I will start the scraping...
Will load the webdriver for firefox...
/!\ ERROR /!\
Message: Unable to find a matching set of capabilities

The script cannot find the executable 'geckodriver'.
Please, download it from the page:
  https://github.com/mozilla/geckodriver/releases
Extract it, and run again this script, by adding the
path of extraction via the tag '-p'.
E.g: ./scribd_downloader_3.py -p geckodriver-v0.19.0-linux64 "https://www.scribd.com/document/31698781/Constitution-of-the-Mexican-Mafia-in-Texas" out.pdf
[me@linux download_everything]$

tobiasBora commented 6 years ago

Hmm, that is strange... Could you please try the AppImage I made, which basically packs everything, including Firefox, Python, and geckodriver: https://github.com/tobiasBora/scribd-downloader-3/releases/tag/v18.01 .

nobicycle commented 6 years ago

Hi,

I obtained "No such file or directory"

$ firejail ./Scribd_Downloader_3-x86_64.AppImage
Reading profile /etc/firejail/default.profile
Reading profile /etc/firejail/disable-common.inc
Reading profile /etc/firejail/disable-passwdmgr.inc
Reading profile /etc/firejail/disable-programs.inc

Note: you can use --noprofile to disable default.profile

Parent pid 21025, child pid 21026
Warning: /sbin directory link was not blacklisted
Warning: /usr/sbin directory link was not blacklisted
Child process initialized in 51.99 ms
/bin/bash: ./Scribd_Downloader_3-x86_64.AppImage: No such file or directory

Parent is shutting down, bye...
[me@linux Scribd]$ firejail ./Scribd_Downloader_3-x86_64.AppImage --help
Reading profile /etc/firejail/default.profile
Reading profile /etc/firejail/disable-common.inc
Reading profile /etc/firejail/disable-passwdmgr.inc
Reading profile /etc/firejail/disable-programs.inc

Note: you can use --noprofile to disable default.profile

Parent pid 21035, child pid 21036
Warning: /sbin directory link was not blacklisted
Warning: /usr/sbin directory link was not blacklisted
Child process initialized in 38.00 ms
/bin/bash: ./Scribd_Downloader_3-x86_64.AppImage: No such file or directory

Parent is shutting down, bye...

tobiasBora commented 6 years ago

Hmm… Are you on a 64-bit or a 32-bit system? Could you please send me the output of:

$ uname -a
$ ldd ./Scribd_Downloader_3-x86_64.AppImage
$ file ./Scribd_Downloader_3-x86_64.AppImage

Thank you,
Tobias Bora

nobicycle commented 6 years ago

[me@linux Scribd]$ uname -a
Linux linux.local 4.14.19-1-MANJARO #1 SMP PREEMPT Tue Feb 13 16:36:24 UTC 2018 x86_64 GNU/Linux
[me@linux Scribd]$ ldd ./Scribd_Downloader_3-x86_64.AppImage
	not a dynamic executable
[me@linux Scribd]$ file ./Scribd_Downloader_3-x86_64.AppImage
./Scribd_Downloader_3-x86_64.AppImage: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, stripped

tobiasBora commented 6 years ago

I guess that the problem is the interpreter, again... I'll try to find a way to statically link everything, including the interpreter. Meanwhile, check whether your system has a library named ld-linux-x86-64.so.2 somewhere:

locate ld-linux-x86-64.so.2

Also make sure that there is no file /lib64/ld-linux-x86-64.so.2 (I guess that is the case, otherwise you would not get this error).

Then you can try to link the ld-linux library you found into the /lib64 directory, even if it's a bit dirty:

sudo mkdir -p /lib64  # -p: no error if the directory already exists
sudo ln -s <your ld-linux....so library> /lib64/ld-linux-x86-64.so.2

I think that should be enough to make it work. Meanwhile, I'll try to find a way to also pack the interpreter, maybe by using some Nix-related tooling...
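To test the interpreter hypothesis before creating any symlink, one can read the interpreter path that the executable requests and check whether it exists. The sketch below is only an illustration: the function names are hypothetical, and it handles 64-bit little-endian ELF files only, which matches the `file` output above. It reads the `PT_INTERP` program header, the same field that `file` reports as "interpreter".

```python
import mmap
import os
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def read_elf_interpreter(image):
    """Return the interpreter path requested by an ELF image, or None.

    `image` is any bytes-like object that supports slicing (bytes or mmap).
    Only 64-bit little-endian ELF is handled in this sketch.
    """
    if len(image) < 64 or image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled here")

    (e_phoff,) = struct.unpack_from("<Q", image, 32)            # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)  # entry size, entry count

    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", image, base)
        (p_filesz,) = struct.unpack_from("<Q", image, base + 32)
        if p_type == PT_INTERP:
            raw = image[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\x00").decode()
    return None


def missing_interpreter(path):
    """Return the requested interpreter path if it does not exist, else None."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as image:
            interpreter = read_elf_interpreter(image)
    if interpreter is not None and not os.path.exists(interpreter):
        return interpreter
    return None
```

If `missing_interpreter("./Scribd_Downloader_3-x86_64.AppImage")` returns a path, the loader named in the binary is absent and the symlink workaround above is worth trying; if it returns `None`, the cause of "No such file or directory" is probably somewhere else.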