vctfence / scrapbee

Mozilla Public License 2.0

Download scrapbee incomplete #27

Closed raiwer closed 3 years ago

raiwer commented 5 years ago

[info] ScrapBee version = 1.8.2
[info] browser = Firefox 60.7.2
[info] platform = Linux x86_64 Opensuse 42.3

The download does not complete. I get only scrapbee_backend, with 2237936 bytes, even after waiting 10 minutes. I created install.sh and scrapbee_backend.json manually. After starting, I get the message below, and the page I wanted is not downloaded.

[info] start backend service on port 9900.
[error] backend disconnected due to an error: An unexpected error occurred

vctfence commented 5 years ago

Hi, please re-download the backend. If it still fails, please download the backend here: https://github.com/vctfence/scrapbee_backend. Choose the one for your platform; on Linux and Mac, rename the downloaded backend to "scrapbee_backend". Place the backend into ~/Download/scrapbee, and run install.sh again.

raiwer commented 5 years ago

The download worked. The downloaded scrapbee_backend is identical to the one I downloaded before; the diff command shows no difference. My self-created install.sh:

#!/bin/bash

chmod +x scrapbee_backend
dest="${HOME}/.mozilla/native-messaging-hosts"
if [ ! -d "$dest" ]; then
    mkdir -p "$dest"
fi
cp scrapbee_backend.json "$dest"
echo done

It does what it has to do and copies my self-created file scrapbee_backend.json to the directory mentioned above:

{
  "allowed_extensions": [ "scrapbee@scrapbee.org" ],
  "description": "Scrapbee backend",
  "name": "scrapbee_backend",
  "path": "FIREFOX-DOWNLOAD-DIRECTORY/scrapbee/scrapbee_backend",
  "type": "stdio"
}

After restarting Firefox:

[info] ScrapBee version = 1.8.3
[info] browser = Firefox 60.7.2
[info] platform = Linux x86_64
[info] start backend service on port 9900.
[error] backend disconnected due to an error: An unexpected error occurred

After trying to download a file:

[error] rdf have not been loaded

After creating an RDF file:

<?xml version="1.0"?>

Again the result is:

[error] rdf have not been loaded

After changing the port to 9375:

[info] ScrapBee version = 1.8.3
[info] browser = Firefox 60.7.2
[info] platform = Linux x86_64
[info] start backend service on port 9375.
[error] backend disconnected due to an error: An unexpected error occurred

vctfence commented 5 years ago

Hi, ScrapBee can download all three files (scrapbee_backend, scrapbee_backend.json, install.sh) by itself. If you create the JSON file manually, please replace FIREFOX-DOWNLOAD-DIRECTORY with the real path.
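For anyone writing the manifest by hand, here is a minimal sketch of generating it with the real path filled in. The helper name `write_manifest` and its two arguments are hypothetical and not part of ScrapBee; the manifest fields are the ones quoted earlier in this thread. On Linux, Firefox expects the "path" value in a native messaging manifest to be absolute, so the sketch rejects anything else.

```shell
#!/bin/bash
# Sketch only: write_manifest is a hypothetical helper, not shipped with ScrapBee.
# Usage: write_manifest /absolute/path/to/scrapbee_backend "$HOME/.mozilla/native-messaging-hosts"
write_manifest() {
    local backend="$1"   # absolute path to the scrapbee_backend binary
    local dest="$2"      # native messaging hosts directory

    # Reject relative paths and unexpanded placeholders.
    case "$backend" in
        /*) ;;
        *) echo "backend path must be absolute: $backend" >&2; return 1 ;;
    esac
    if [ ! -f "$backend" ]; then
        echo "backend not found: $backend" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1
    chmod +x "$backend" || return 1

    # Assumes the path contains no double quotes or backslashes,
    # which would need escaping inside JSON.
    cat > "$dest/scrapbee_backend.json" <<EOF
{
  "allowed_extensions": ["scrapbee@scrapbee.org"],
  "description": "Scrapbee backend",
  "name": "scrapbee_backend",
  "path": "$backend",
  "type": "stdio"
}
EOF
}
```

Restart Firefox after writing the file so that the extension reconnects to the backend.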

raiwer commented 5 years ago

As mentioned above, the installation of ScrapBee did not finish, and the .json file and install.sh were not downloaded.

Replacing FIREFOX-DOWNLOAD-DIRECTORY with the real directory makes ScrapBee run now. Thank you very much. However, some pages are still not downloaded completely.
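A quick way to catch a leftover placeholder like this is to read the "path" value back out of the installed manifest and check that it points at an executable file. This is only a sketch: `check_manifest` is a hypothetical helper, and the sed expression assumes the manifest has a single "path" key with a plain string value, as in the file shown earlier.

```shell
#!/bin/bash
# Sketch only: check_manifest is a hypothetical helper, not shipped with ScrapBee.
# Usage: check_manifest "$HOME/.mozilla/native-messaging-hosts/scrapbee_backend.json"
check_manifest() {
    local manifest="$1"
    local backend

    if [ ! -f "$manifest" ]; then
        echo "manifest not found: $manifest" >&2
        return 1
    fi

    # Extract the value of the "path" key (works for one-line and multi-line JSON).
    backend=$(sed -n 's/.*"path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$manifest" | head -n 1)

    case "$backend" in
        /*) ;;
        *) echo "path is not absolute: '$backend'" >&2; return 1 ;;
    esac
    if [ ! -x "$backend" ]; then
        echo "backend missing or not executable: $backend" >&2
        return 1
    fi
    echo "ok: $backend"
}
```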

Capturing the following page does not finish even after 10 minutes: https://www.nissomanie.de/kykladen/milos-1/vani/

All sources are saved except:

https://image.j...2074/image.jpg 360c37d771cbc80287a8eaed95bec376.jpeg buffered
index.css index.css buffered
HTML index.html index.html buffered

Log:

[error] error process css null: The URI is malformed.
[error] error process css null: The URI is malformed.
[error] error process css null: The URI is malformed.

The capture never finishes for: http://www.kykladenfieber.de/Inselberichte/Ios-Manganari-2015

All sources are saved except:

http://www.kykl...background.jpg 79cdbd0aeb02055a2ebdad59455d1c7b buffered
index.css index.css buffered
HTML index.html index.html buffered

Log:

[error] error process css null: The URI is malformed.
[error] error process css null: The URI is malformed.
[error] error process css null: The URI is malformed.

The capture saved the complete page for: http://www.kreta-welt.de/threads/5276-Naxos-und-Donoussa?highlight=gwg%27s+Reise-Impressionen

On the same website, the capture does not finish and leaves many GIFs buffered: http://www.kreta-welt.de/threads/4399-gwg-s-Reise-Impressionen?highlight=gwg%27s+Reise-Impressionen

http://www.kret...ksite_yigg.gif 15667efa309fd0fc659d6f0ef0e8481f.gif buffered
http://www.kret...ite_google.gif 2223460a6e5ab5f16efea768413a14ef.gif buffered
http://www.kret..._delicious.gif d627e5cb6b74810ddf576ef29e3453e6.gif buffered
http://www.kret...llapse_40b.png d1121e19218bad64d1689541f590b7ea.png buffered
http://www.kret...and=1564859311 c063d2746551921edf503fb65a989426.gif buffered
http://www.kret...de/favicon.ico favicon.ico buffered
CSS index.css index.css buffered
HTML index.html index.html buffered

Log:

[error] error process css null: The URI is malformed.
[error] error process css null: The URI is malformed.
[error] error process css null: The URI is malformed.

vctfence commented 5 years ago

Hi, I can capture the first page successfully. I cannot access the second page, which shows "The requested URL was rejected", so I could not test it. The third works for me. Maybe it is just some kind of network problem on your side.

Kerenok commented 5 years ago

This issue now seems to discuss the same problem as another issue, https://github.com/vctfence/scrapbee/issues/17 : the capture of a page with numerous images never finishing.

I still see the problem with the page https://www.babelio.com/livres-/nature-writing/932 on different types of networks (fiber, Wi-Fi, 4G).

raiwer commented 5 years ago

Hi,

I had a similar problem with Scrapbook. If I tried to download too many pages from a website, it did not work properly. My solution was to pause it for a few seconds and then restart it. Maybe it makes sense to pause the download for a few seconds after 50 or 100 pictures. Thank you very much for your work. Günter

vctfence commented 5 years ago

Hi @raiwer, I just want to confirm: did you find the same problem with Scrapbook?

Kerenok commented 5 years ago

I also tried Scrapbook on the page causing Scrapbee to fail : the download is complete with over 100 images correctly saved.

raiwer commented 5 years ago

On the website mentioned above, http://www.kreta-welt.de/threads/4399-gwg-s-Reise-Impressionen, I can choose in the user control center how many threads are shown on one page. If I choose 20, the ScrapBee download finishes normally. With 30 or 40, ScrapBee does not finish successfully. So I think it makes sense to pause the downloads for a few seconds after 200 downloads, because the successful download has 212 elements.
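To illustrate the idea of pausing after a fixed number of downloads, here is a rough sketch. It is not how ScrapBee works internally; `run_in_batches`, the batch size, the pause length, and the file urls.txt are all hypothetical. The function reads one item per line from standard input, runs the given command on each item, and sleeps after every full batch.

```shell
#!/bin/bash
# Sketch only: run_in_batches is a hypothetical helper, not ScrapBee code.
# Usage: run_in_batches BATCH_SIZE PAUSE_SECONDS COMMAND [ARGS...] < items.txt
run_in_batches() {
    local batch_size="$1"
    local pause="$2"
    shift 2
    local count=0
    local item

    while IFS= read -r item; do
        # Keep the command from consuming the remaining items on stdin.
        "$@" "$item" < /dev/null
        count=$((count + 1))
        # Pause after every full batch.
        if [ $((count % batch_size)) -eq 0 ]; then
            sleep "$pause"
        fi
    done
}
```

For example, `run_in_batches 200 5 curl -sSO < urls.txt` would fetch the URLs listed in urls.txt and wait 5 seconds after every 200 of them.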

raiwer commented 5 years ago

Sorry, it should be posts instead of threads. Excuse my bad English. Günter

vctfence commented 4 years ago

Please update to 1.9.7 to see if it helps for you guys, thanks.