asajeffrey closed this issue 6 years ago
I'd like to try this one :)
Go for it!
is this still open?
@pduzinki are you still working on this, or can @pawan92 give it a shot?
Sorry, didn't really have time recently. @pawan92 can give it a shot, sure :)
I'll give it a shot! I hope to have something in by the end of the week.
Apologies for the delay. I tried following the webservice steps, but I'm having some issues installing with virtualenv. I'm running on a Mac.
Well, it should work without virtualenv, I usually recommend virtualenv just to avoid installing stuff globally.
hmm ok let me try that out
Is wayback something separate we need to install?
wayback should be installed when you run pip install git+https://github.com/ikreymer/pywb.git
I did that and it still says command not found.
It's probably installed somewhere that isn't on your PATH. Try giving the full path to the command.
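If it helps, here is a small Python sketch for tracking the command down. It assumes the script is named wayback (as in the pip command above) and only checks the usual places pip puts console scripts; the helper name find_command is made up for illustration, and the list of directories is not exhaustive.

```python
import os
import shutil
import site
import sysconfig


def find_command(name="wayback"):
    """Return candidate full paths for a pip-installed console script."""
    candidates = []
    # First see whether the command is already reachable through PATH.
    on_path = shutil.which(name)
    if on_path:
        candidates.append(on_path)
    # Then check the script directory of this interpreter and of the
    # per-user install (pip install --user). Assumed locations only.
    script_dirs = [
        sysconfig.get_path("scripts"),
        os.path.join(site.getuserbase(), "bin"),
    ]
    for directory in script_dirs:
        path = os.path.join(directory, name)
        if os.path.isfile(path) and path not in candidates:
            candidates.append(path)
    return candidates


if __name__ == "__main__":
    for path in find_command():
        print(path)
```

Run it with the same Python that pip installed pywb into; any path it prints can be used in place of the bare wayback command.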
Can I work on this?
Go for it!
Using the instructions, I tried to play the existing archive shown in the example, but it isn't working. The terminal output is:
$ proxychains ~/servo/mach run -r --certificate-path proxy-certs/pywb-ca.pem https://www.wbez.org/
ProxyChains-3.1 (http://proxychains.sf.net)
|DNS-request| www.wbez.org
|S-chain|-<>-127.0.0.1:9050-<--timeout
|DNS-response|: www.wbez.org does not exist
and this is the servo window that opened:
The same thing happens when I try to record https://www.example.com or https://github.com
That's odd. What output do you get in the window running wayback? It should be something like:
$ wayback --proxy WBEZ --port 8321
[Errno 2] No such file or directory: './config.yaml'
2018-09-05 10:11:34,780: [INFO]: Proxy enabled for collection "WBEZ"
2018-09-05 10:11:34,845: [INFO]: Starting Gevent Server on 8321
127.0.0.1 - - [2018-09-05 10:11:51] "POST /WBEZ/resource/postreq?matchType=exact&url=https%3A%2F%2Fwww.wbez.org%2F&closest=now HTTP/1.1" 200 16439 0.003355
::ffff:127.0.0.1 - - [2018-09-05 10:11:52] "CONNECT 34.205.97.32:443 HTTP/1.0" 200 94 0.328500
127.0.0.1 - - [2018-09-05 10:11:52] "POST /WBEZ/resource/postreq?matchType=exact&url=https%3A%2F%2Fwww.googletagmanager.com%2Fgtm.js%3Fid%3DGTM-57WD36%26gtm_auth%3DtmHQZJnpm8lhBlUlUxcf9A%26gtm_preview%3Denv-1%26gtm_cookies_win%3Dx&closest=now HTTP/1.1" 200 84206 0.003478
Dir collections/WBEZ/indexes/ unchanged
127.0.0.1 - - [2018-09-05 10:11:52] "POST /WBEZ/resource/postreq?matchType=exact&url=https%3A%2F%2Fwww.wbez.org%2Fcss%2Fcpm-1515786996172.css&closest=now HTTP/1.1" 200 53965 0.000919
Dir collections/WBEZ/indexes/ unchanged
...
@asajeffrey
(venv) cubetastic@(my hostname):~$ wayback --proxy Example --live --proxy-record --autoindex --port 8321
[Errno 2] No such file or directory: './config.yaml'
2018-09-05 21:57:48,566: [INFO]: Proxy recording into collection "Example"
2018-09-05 21:57:48,571: [INFO]: Auto-Indexing Enabled on "/home/cubetastic/collections", checking every 30 secs
2018-09-05 21:57:48,655: [INFO]: Starting Gevent Server on 8321
2018-09-05 21:57:48,656: [INFO]: Checking Collection: Example
2018-09-05 21:57:48,656: [INFO]: Checking Collection: GitHub
2018-09-05 21:58:18,673: [INFO]: Checking Collection: Example
2018-09-05 21:58:18,674: [INFO]: Checking Collection: GitHub
and
cubetastic@(my hostname):~$ proxychains ~/servo/mach run -r --certificate-path proxy-certs/pywb-ca.pem https://example.com/
ProxyChains-3.1 (http://proxychains.sf.net)
|DNS-request| example.com
|S-chain|-<>-127.0.0.1:9050-<--timeout
|DNS-response|: example.com does not exist
@asajeffrey Do you have any idea what is causing the error?
@cubetastic33 sorry about the delay getting back to you.
After a bit of digging, I can replicate this error by removing the proxychains.conf file, so I suspect what is going on is that for some reason proxychains isn't picking up the conf file. Are you running this command from inside the servo-warc-tests directory?
@asajeffrey No. As you can see in the output I have shown above, both of them are executed from the home directory, but the first one is inside a virtual env.
Ah, you need to run the commands from the servo-warc-tests directory, so that proxychains will find its config file.
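For what it's worth, the timeout on 127.0.0.1:9050 in the output above is what you would expect if proxychains fell back to some other configuration. As far as I know, ProxyChains 3.1 looks for ./proxychains.conf first, then $HOME/.proxychains/proxychains.conf, then /etc/proxychains.conf; treat that order as an assumption and check the README for your version. A rough Python sketch (which_proxychains_conf is a made-up helper) that reports which file would be picked up from a given directory:

```python
import os


def which_proxychains_conf(cwd=None, home=None, etc_dir="/etc"):
    """Return the first proxychains.conf found in the assumed search order, or None."""
    cwd = cwd if cwd is not None else os.getcwd()
    home = home if home is not None else os.path.expanduser("~")
    # Assumed lookup order: current directory, then per-user, then system-wide.
    search_order = [
        os.path.join(cwd, "proxychains.conf"),
        os.path.join(home, ".proxychains", "proxychains.conf"),
        os.path.join(etc_dir, "proxychains.conf"),
    ]
    for path in search_order:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(which_proxychains_conf())
```

Run from the servo-warc-tests clone, this should print the proxychains.conf in that directory; run from the home directory, it prints whichever fallback exists, or None.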
@asajeffrey I'm sorry, but where exactly would the directory you're talking about be?
The clone of the servo-warc-tests repo.
@asajeffrey Great! I recorded instagram.com. I did get an error though:
ERROR 2018-09-07T15:47:58Z: script::dom::bindings::error: Error at https://staticxx.facebook.com/connect/xd_arbiter/r/0P3pVtbsZok.js?version=42#channel=f187fab468e0166&origin=https%3A%2F%2Fwww.instagram.com:60:2139 /https?/.exec(...) is null
but that just seems to be a JS error because of instagram. Now what should I do?
You can ignore that error. Once you've recorded the archive, add it to the ARCHIVES file, then create a PR.
@asajeffrey I don't really get it. The ARCHIVES file just seems to be a list of all the completed sites. Don't I have to show my recorded archive as well? Also, 360.cn is in the ARCHIVES file, but the issue is still open and it is still unchecked in the main issue (#37).
Yes, add an entry
Instagram: https://instagram.com/
to the ARCHIVES file, so the archive will be tested each night. (Assuming you recorded https://instagram.com/ in an archive called Instagram.)
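A minimal sketch of that step in Python, assuming ARCHIVES is a plain text file with one "Name: URL" entry per line (the format of the entry above) and that the script is run from the servo-warc-tests clone. add_archive_entry is a hypothetical helper, and editing the file by hand works just as well.

```python
def add_archive_entry(name, url, path="ARCHIVES"):
    """Append 'name: url' to the ARCHIVES file unless it is already listed."""
    entry = f"{name}: {url}"
    try:
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        lines = []
    if entry in (line.strip() for line in lines):
        return False  # already present, nothing to do
    lines.append(entry)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return True


# Example call (writes to ./ARCHIVES):
# add_archive_entry("Instagram", "https://instagram.com/")
```

The function returns False when the entry is already there, so running it twice does not duplicate the line.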
@asajeffrey Who will be testing the archive each night?
It's run as part of our nightly testing; the results are at https://servo.org/dashboards/
@asajeffrey Then what exactly is my role in it? I mean - if all I had to do was to add that one line in ARCHIVES, anybody could do it! So - what was the reason I had to record it myself? Sorry for asking so many questions - I just didn't understand...
The nightly job records the performance against the recorded archive. Recording the archive itself is still a manual process.
@asajeffrey so, doesn't that mean my Archive should be included somewhere in this repository? However, I'm just adding a line to the ARCHIVES file!
@asajeffrey How do you close this issue?