oduwsdl / ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
MIT License

Enhancement: Add everything from a WARC under one pin instead of many #830

Open machawk1 opened 6 months ago

machawk1 commented 6 months ago

In #810, @ProximaNova suggested the following (summarized):

Could you provide an option for "ipwb index" that adds everything from a .warc under one pin instead of many pins? Use case: for those who want as few CIDs in their pinset as possible. It wouldn't matter to those who don't care about having hundreds or thousands of pins. For now, I did this:

  1. edit the .cdxj with vim to keep just the CIDs, one per line, and save to a different file (don't overwrite the CDXJ)
  2. run cat f1.cdxj.cid | xargs -d "\n" sh -c 'for args do ipfs files cp /ipfs/$args /a/f1cid/$args && ipfs pin rm $args; done' _
  3. run ipfs files ls --long /a, then pin the CID it lists for f1cid
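
The three manual steps above could be scripted roughly as follows. This is only a sketch, not something ipwb provides: the function names extract_cids and pin_under_one_dir are made up for illustration, and the CID extraction assumes each CDXJ record carries a locator of the form "urn:ipfs/<header CID>/<payload CID>". The ipfs calls assume a running daemon and that each CID is currently pinned.

```shell
#!/bin/sh
# Sketch only: gather the per-record pins from an ipwb CDXJ under one
# pinned MFS directory. Function names here are hypothetical.

# Step 1: read CDXJ on stdin, print each CID once, one per line.
# Assumption: locators look like "urn:ipfs/<header CID>/<payload CID>".
extract_cids() {
  grep -o 'urn:ipfs/[A-Za-z0-9/]*' \
    | sed 's|^urn:ipfs/||' \
    | tr '/' '\n' \
    | sed '/^$/d' \
    | sort -u
}

# Steps 2 and 3: copy every CID into an MFS directory, drop its own pin,
# then pin the directory itself.
#   $1 = file with one CID per line
#   $2 = MFS directory, e.g. /a/f1cid
pin_under_one_dir() {
  ipfs files mkdir -p "$2" || return 1
  while IFS= read -r cid; do
    # Only unpin a CID once it has been copied into the directory.
    ipfs files cp "/ipfs/$cid" "$2/$cid" && ipfs pin rm "$cid"
  done < "$1"
  # "ipfs files stat --hash" prints the CID of the MFS directory.
  ipfs pin add "$(ipfs files stat --hash "$2")"
}
```

After sourcing the script, it might be used as: extract_cids < f1.cdxj > f1.cdxj.cid, followed by pin_under_one_dir f1.cdxj.cid /a/f1cid. The order mirrors the manual steps; a more cautious variant would pin the directory first and remove the individual pins afterwards.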