meltmedia / nutch

Mirror of Apache Nutch
Apache License 2.0

Support locally mounted AFP-based directories #1

Open ctrimble opened 11 years ago

ctrimble commented 11 years ago

To index files on our internal share, it would be nice to mount the share in read-only mode and then plug the mounted directory into Nutch using a custom Protocol that translates afp:// URLs into file:// URLs. This should let Nutch send the proper URLs to Elasticsearch without our having to implement afp:// in Java.
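A minimal sketch of the afp:// to file:// translation described above, assuming a single share mounted at one local directory. The class name `AfpToFileUrl`, the host `fileserver.example.com`, the share prefix `/share` and the mount point `/mnt/share` are all hypothetical. This is a standalone helper, not an implementation of Nutch's `Protocol` extension point; a real plugin would still have to wrap something like it.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/**
 * Hypothetical helper (not part of Nutch): rewrites an afp:// URL into the
 * file:// URL of the same file on a locally mounted, read-only copy of the share.
 */
public final class AfpToFileUrl {
    private final String host;      // AFP server host, e.g. "fileserver.example.com" (assumed)
    private final String sharePath; // path prefix of the share in afp:// URLs, e.g. "/share" (assumed)
    private final Path mountDir;    // where that share is mounted locally, e.g. "/mnt/share" (assumed)

    public AfpToFileUrl(String host, String sharePath, Path mountDir) {
        this.host = host.toLowerCase(Locale.ROOT);
        // Drop trailing slashes so "/share/" and "/share" behave the same.
        String p = sharePath;
        while (p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        this.sharePath = p;
        this.mountDir = mountDir;
    }

    /** Returns the file:// URI for afpUrl, or throws if the URL is outside the mapped share. */
    public URI translate(URI afpUrl) {
        if (!"afp".equalsIgnoreCase(afpUrl.getScheme()) || afpUrl.getHost() == null
                || !host.equals(afpUrl.getHost().toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException("not an afp:// URL for " + host + ": " + afpUrl);
        }
        // normalize() collapses "." and ".." segments; getPath() percent-decodes ("%20" -> " ").
        String path = afpUrl.normalize().getPath();
        if (path == null || !(path.equals(sharePath) || path.startsWith(sharePath + "/"))) {
            throw new IllegalArgumentException("outside mapped share " + sharePath + ": " + afpUrl);
        }
        Path local = mountDir;
        for (String segment : path.substring(sharePath.length()).split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            if (segment.equals("..")) { // defensive: refuse to walk above the mount point
                throw new IllegalArgumentException("path escapes mount: " + afpUrl);
            }
            local = local.resolve(segment);
        }
        // Path.toUri() re-encodes characters such as spaces back into %XX form.
        return local.toUri();
    }
}
```

For example, with the hypothetical values above, `afp://fileserver.example.com/share/docs/Q1%20report.pdf` maps to the file URI for `/mnt/share/docs/Q1 report.pdf`. The idea is that only fetching goes through the translated URL, so the document can still be indexed under its original afp:// URL.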

ctrimble commented 11 years ago

The steps to use this feature should look something like:

  1. Mount the afp:// share on the local file system in read-only mode.
  2. Configure the afp:// protocol to map the host and path in each afp:// URL to the corresponding mount directory on the local file system.
  3. Place the afp:// URL in the urls/seed.txt file.
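Step 2 could be driven by a single configuration value. The sketch below assumes a comma-separated `host/path=/local/dir` format, for example `fileserver.example.com/share=/mnt/share`; that format, the class name `AfpMountTable` and every host and path shown are hypothetical, since Nutch defines no such setting. When several mappings match, the longest path prefix wins so a more specific mount takes precedence.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical table of afp:// host/path prefixes mapped to local mount directories. */
public final class AfpMountTable {
    private static final class Mapping {
        final String host;   // lower-cased AFP host
        final String prefix; // share path without trailing slash ("" maps the whole host)
        final Path mountDir; // local mount point for that prefix

        Mapping(String host, String prefix, Path mountDir) {
            this.host = host;
            this.prefix = prefix;
            this.mountDir = mountDir;
        }

        boolean matches(String h, String path) {
            return host.equals(h) && (path.equals(prefix) || path.startsWith(prefix + "/"));
        }
    }

    private final List<Mapping> mappings = new ArrayList<>();

    /** Parses an assumed "host/path=/local/dir, ..." value. */
    public static AfpMountTable parse(String value) {
        AfpMountTable table = new AfpMountTable();
        for (String raw : value.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            int eq = entry.indexOf('=');
            int slash = entry.indexOf('/');
            if (eq < 0 || slash < 0 || slash > eq) {
                throw new IllegalArgumentException("bad mapping: " + entry);
            }
            String host = entry.substring(0, slash).toLowerCase(Locale.ROOT);
            String prefix = entry.substring(slash, eq).trim();
            while (prefix.endsWith("/")) {
                prefix = prefix.substring(0, prefix.length() - 1);
            }
            table.mappings.add(new Mapping(host, prefix, Paths.get(entry.substring(eq + 1).trim())));
        }
        // Longest prefix first, so the most specific mount is tried before broader ones.
        table.mappings.sort(Comparator.comparingInt((Mapping m) -> m.prefix.length()).reversed());
        return table;
    }

    /** Returns the file:// URI for an afp:// seed URL, or empty if no mapping covers it. */
    public Optional<URI> toFileUri(URI afpUrl) {
        if (!"afp".equalsIgnoreCase(afpUrl.getScheme()) || afpUrl.getHost() == null) {
            return Optional.empty();
        }
        URI norm = afpUrl.normalize();
        String host = norm.getHost().toLowerCase(Locale.ROOT);
        String path = (norm.getPath() == null || norm.getPath().isEmpty()) ? "/" : norm.getPath();
        for (Mapping m : mappings) {
            if (!m.matches(host, path)) {
                continue;
            }
            Path local = m.mountDir;
            for (String segment : path.substring(m.prefix.length()).split("/")) {
                if (segment.isEmpty()) {
                    continue;
                }
                if (segment.equals("..")) { // refuse to escape the mount point
                    return Optional.empty();
                }
                local = local.resolve(segment);
            }
            return Optional.of(local.toUri());
        }
        return Optional.empty();
    }
}
```

With that value in place, step 3 is just a line such as `afp://fileserver.example.com/share/` in `urls/seed.txt`, which this table would resolve to the `/mnt/share` mount.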