Open mkragelj opened 5 years ago
Hi,
I am trying to harvest this site: https://podcrto.si. As a national library, we harvest several domains to preserve their content. I tried Heritrix 1.14.4 and 3.4, both without success.
I'm getting this:
[code] [status] [seed] [redirect]
-50 NOTCRAWLED https://podcrto.si/
200 CRAWLED https://e-uprava.gov.si/
and
LONGEST#2:
Queue si,podcrto, (p3)
  2 items
  wakes in: 13m45s77ms
  last enqueued: https://podcrto.si/robots.txt
  last peeked: https://podcrto.si/robots.txt
  total expended: 15 (total budget: -1)
  active balance: 2985
  last(avg) cost: 1(1)
  totalScheduled fetchSuccesses fetchFailures fetchDisregards fetchResponses robotsDenials successBytes totalBytes fetchNonResponses lastSuccessTime
  3 1 0 0 1 0 54 54 16 2019-05-23T07:14:29.825Z
  SimplePrecedenceProvider
  3
Can anyone help or explain what could be the reason for this? Thank you in advance.
Best, Matjaž
Update: it works with WCT 2.01.
Matjaž
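For anyone reading the queue report quoted above, the counter row is easier to follow once each name is paired with its value. This is a minimal sketch, assuming the ten counter names and the ten values line up in order as they appear in the report; `pair_counters` is a hypothetical helper written for this thread, not part of Heritrix.

```python
# Counter names and values copied verbatim from the queue report in this thread.
HEADER = (
    "totalScheduled fetchSuccesses fetchFailures fetchDisregards "
    "fetchResponses robotsDenials successBytes totalBytes "
    "fetchNonResponses lastSuccessTime"
)
VALUES = "3 1 0 0 1 0 54 54 16 2019-05-23T07:14:29.825Z"


def pair_counters(header: str, values: str) -> dict:
    """Pair whitespace-separated counter names with their values, in order.

    Hypothetical helper: it assumes the report lists names and values
    in the same order and in equal number.
    """
    names = header.split()
    fields = values.split()
    if len(names) != len(fields):
        raise ValueError(f"{len(names)} names but {len(fields)} values")
    # Convert plain integers; leave anything else (the timestamp) as text.
    return {n: int(v) if v.isdigit() else v for n, v in zip(names, fields)}


stats = pair_counters(HEADER, VALUES)
for name, value in stats.items():
    print(f"{name:>18}: {value}")
```

Read this way, the row shows `fetchResponses` at 1 and `fetchNonResponses` at 16 for the `si,podcrto,` queue.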