machawk1 / wail

:whale2: Web Archiving Integration Layer: One-Click User Instigated Preservation
https://matkelly.com/wail
MIT License

Use Heritrix API to get list of jobs rather than referencing file system hierarchy #140

Open machawk1 opened 9 years ago

machawk1 commented 9 years ago

Remnants of the file system approach are in class Heritrix > getListOfJobs, which uses the glob API.

machawk1 commented 9 years ago

Use the Rescan Jobs Directory API function from https://webarchive.jira.com/wiki/display/Heritrix/Heritrix+3.x+API+Guide#Heritrix3.xAPIGuide-REST.

machawk1 commented 9 years ago

Running

curl -v -d "action=rescan" -k -u admin:admin --anyauth --location -H "Accept: application/xml" https://localhost:8443/engine

returns content like:

<?xml version="1.0" standalone='yes'?>

<engine>
  <heritrixVersion>3.2.0</heritrixVersion>
  <heapReport>
    <usedBytes>69126696</usedBytes>
    <totalBytes>134217728</totalBytes>
    <maxBytes>238551040</maxBytes>
  </heapReport>
  <jobsDir>/Applications/WAIL.app/bundledApps/heritrix-3.2.0/jobs</jobsDir>
  <jobsDirUrl>https://localhost:8443/engine/jobsdir/</jobsDirUrl>
  <availableActions>
    <value>rescan</value>
    <value>add</value>
    <value>create</value>
  </availableActions>
  <jobs>
    <value>
      <shortName>1444351035</shortName>
      <url>https://localhost:8443/engine/job/1444351035</url>
      <isProfile>false</isProfile>
      <launchCount>1</launchCount>
      <lastLaunch>2015-10-09T00:37:33.862Z</lastLaunch>
      <hasApplicationContext>false</hasApplicationContext>
      <statusDescription>Unbuilt</statusDescription>
      <isLaunchInfoPartial>false</isLaunchInfoPartial>
      <primaryConfig>/Applications/WAIL.app/bundledApps/heritrix-3.2.0/jobs/1444351035/crawler-beans.cxml</primaryConfig>
      <primaryConfigUrl>https://localhost:8443/engine/jobdir/crawler-beans.cxml</primaryConfigUrl>
      <key>1444351035</key>
    </value>
  </jobs>
</engine>
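
A rough sketch of how getListOfJobs could use this response instead of glob, using only the Python standard library. The function names parse_job_names and rescan_and_list_jobs are hypothetical, not existing WAIL code. The engine URL and admin:admin credentials are taken from the curl call above. Disabling certificate verification mirrors curl's -k and assumes Heritrix is serving a self-signed certificate on localhost. Registering both the digest and basic handlers is an approximation of --anyauth.

```python
import ssl
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def parse_job_names(engine_xml):
    """Return the <shortName> of every job listed in an <engine> response."""
    root = ET.fromstring(engine_xml)
    return [job.findtext("shortName") for job in root.findall("./jobs/value")]


def rescan_and_list_jobs(engine_url="https://localhost:8443/engine",
                         user="admin", password="admin"):
    """POST action=rescan to the engine and return the job short names."""
    # Equivalent of curl -k: skip certificate checks (assumes a local,
    # self-signed Heritrix instance; do not use against remote hosts).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    # Approximation of curl --anyauth: offer both digest and basic auth.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, engine_url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=ctx),
        urllib.request.HTTPDigestAuthHandler(password_mgr),
        urllib.request.HTTPBasicAuthHandler(password_mgr),
    )

    # Supplying data makes this a POST, like curl -d "action=rescan".
    request = urllib.request.Request(
        engine_url,
        data=urllib.parse.urlencode({"action": "rescan"}).encode("ascii"),
        headers={"Accept": "application/xml"},
    )
    with opener.open(request) as response:
        return parse_job_names(response.read())
```

Keeping the parsing separate from the HTTP call means the XML handling can be checked against a saved response like the one above without a running Heritrix instance.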