Open DHKaplan opened 11 months ago
Hey @DHKaplan, you need to enter the exact URL prefix you want when running warcit. For instance:
warcit http://www.wticalumni.com/ my-local-folder
The prefix could be anything, for instance something like:
warcit 'http://mydomain.com/query?q=' my-local-folder
Because the tool accepts any prefix like this, you have to give the exact URL prefix you want, including any trailing characters.
@despens The folder that contains my HTML is www.wticalumni.com
and the command I am using is warcit https://www.wticalumni.com ./www.wticalumni.com/
I get no pages found. When I open the gz file in an ASCII editor I get:
WARC/1.0
WARC-Date: 2023-05-10T18:36:00Z
WARC-Source-URI: file://./www.wticalumni.com/events.htm
WARC-Creation-Date: 2023-08-26T17:00:45Z
WARC-Type: resource
WARC-Record-ID: <urn:uuid:e1377996-9417-4ddb-8af8-19dc44972209>
WARC-Target-URI: https://www.wticalumni.comevents.htm
WARC-Payload-Digest: sha1:AP4CVEJE4OHSPK24OURQRPDOHKP2LWOA
WARC-Block-Digest: sha1:AP4CVEJE4OHSPK24OURQRPDOHKP2LWOA
Content-Type: text/html
Content-Length: 10002
Note that the Source-URI line is WARC-Source-URI: file://./www.wticalumni.com/events.htm
while the Target-URI line is WARC-Target-URI: https://www.wticalumni.comevents.htm
There is no slash between the domain name and the file name in the Target-URI.
I really appreciate your reply, but I can't see what I am doing wrong.
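For anyone who wants to run the same check without opening the file in an editor, here is a minimal Python sketch that lists the WARC-Target-URI headers of a gzipped WARC using only the standard library. The function name `list_target_uris` is hypothetical and not part of warcit; the sketch assumes the headers are plain text lines like the ones quoted above.

```python
import gzip


def list_target_uris(stream):
    """Yield every WARC-Target-URI value found in a gzipped WARC.

    `stream` may be a file name or a binary file object.
    This is a naive line scan, not a real WARC parser: it would also
    match a payload line that happens to start with the header name.
    """
    with gzip.open(stream, "rb") as fh:
        for raw in fh:
            if raw.startswith(b"WARC-Target-URI:"):
                # Split on the first colon only, since the URI contains colons too.
                value = raw.split(b":", 1)[1].strip()
                yield value.decode("utf-8", "replace")
```

Calling `list(list_target_uris("some-file.warc.gz"))` (the file name is a placeholder) should return the target URIs in order, which makes a missing slash after the domain name easy to spot.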
Hi @DHKaplan, you just need to include the trailing / character in the URL prefix in the command:
warcit https://www.wticalumni.com/ ./www.wticalumni.com/
                                 ^
                                 |
                                 important (the slash after .com)
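The headers quoted earlier are consistent with the prefix and the relative file path being glued together as-is. The sketch below only illustrates that assumption; `target_uri` is a hypothetical stand-in, not warcit's actual code.

```python
def target_uri(prefix: str, rel_path: str) -> str:
    """Hypothetical stand-in: build a target URI by plain concatenation."""
    return prefix + rel_path


# Prefix without a trailing slash: domain and file name run together.
print(target_uri("https://www.wticalumni.com", "events.htm"))

# Prefix with the trailing slash: the URI comes out as intended.
print(target_uri("https://www.wticalumni.com/", "events.htm"))

# A query-style prefix, where automatically inserting a slash would be wrong.
print(target_uri("http://mydomain.com/query?q=", "events.htm"))
```

Under this assumption, the third case shows why a slash cannot simply be added automatically: a prefix ending in `?q=` has to be used exactly as typed.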
I needed a small WARC file for testing, so I took a regular wget download, picked a few files that link to each other, and used warcit to create the WARC file. When I opened it in ReplayWeb.page, no pages were visible. I opened the WARC file in an ASCII editor and found that the "/" was not being inserted after the domain name. Please see https://forum.webrecorder.net/t/warcit-not-putting-a-before-the-file-name/413 for more information.