Currently the uuid of a file is generated from the sha of its filename. It would be better if the sha also included the date of modification, since some files will have the same name. The title is currently the filename, but the directory path often includes useful metadata, so the directory path (minus the context data_dir) would be a better title for the metadata record.
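A minimal sketch of what that could look like, not the crawler's actual code. The class name RecordIds, the methods titleFor and uuidFor, the choice of SHA-1, and the "|" separator are all assumptions made for illustration. It hashes the path relative to data_dir together with the modification time, so two files with the same name in different directories (or the same file at different times) get different uuids.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical helper: names and hashing scheme are illustrative assumptions.
public class RecordIds {

    /** Title = file path relative to data_dir, normalised to forward slashes. */
    public static String titleFor(Path dataDir, Path file) {
        return dataDir.relativize(file).toString().replace('\\', '/');
    }

    /** UUID derived from a SHA-1 over "relative path | last-modified millis". */
    public static UUID uuidFor(Path dataDir, Path file, long lastModifiedMillis) {
        String key = titleFor(dataDir, file) + "|" + lastModifiedMillis;
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha.digest(key.getBytes(StandardCharsets.UTF_8));
            // nameUUIDFromBytes builds a name-based (type 3) UUID from the digest bytes.
            return UUID.nameUUIDFromBytes(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

In real use the modification time would come from Files.getLastModifiedTime(file).toMillis(). One consequence of this design is that the uuid changes whenever the file is modified, so a modified file looks like a new record unless the old uuid is tracked somewhere.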
Ultimately it might be better to run the crawler as a thread that monitors the context data_dir using Java 7 watchdir and:
caches file state in a JCS instance (stashing the cache on the filesystem when it is stopped and restoring it when it starts); another persistence layer such as an H2 db could also be used, but is probably too heavyweight for this in terms of config
updates or generates metadata records according to the changes returned by watchdir
Open question: can watchdir handle file move operations correctly?
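A rough sketch of the monitoring thread, assuming "watchdir" means the java.nio.file WatchService API introduced in Java 7. DataDirWatcher, Action, actionFor and handle are hypothetical names, and the JCS caching step is left out. On the move question: the standard event kinds are only ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE and OVERFLOW, so a move would most likely be reported as a delete plus a create rather than as a single event, and the crawler would have to pair them up itself.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

// Hypothetical sketch: watches a single directory (registration is not recursive).
public class DataDirWatcher implements Runnable {

    public enum Action { UPSERT_RECORD, DELETE_RECORD, RESCAN }

    private final Path dataDir;

    public DataDirWatcher(Path dataDir) {
        this.dataDir = dataDir;
    }

    /** Maps a watch event kind to what the crawler should do with the metadata record. */
    public static Action actionFor(WatchEvent.Kind<?> kind) {
        if (kind == StandardWatchEventKinds.ENTRY_CREATE
                || kind == StandardWatchEventKinds.ENTRY_MODIFY) {
            return Action.UPSERT_RECORD;
        }
        if (kind == StandardWatchEventKinds.ENTRY_DELETE) {
            return Action.DELETE_RECORD;
        }
        // OVERFLOW: events may have been lost, so fall back to a full rescan.
        return Action.RESCAN;
    }

    @Override
    public void run() {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dataDir.register(watcher,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY,
                    StandardWatchEventKinds.ENTRY_DELETE);
            while (!Thread.currentThread().isInterrupted()) {
                WatchKey key = watcher.take(); // blocks until events are available
                for (WatchEvent<?> event : key.pollEvents()) {
                    Object ctx = event.context(); // relative Path, or null on OVERFLOW
                    Path changed = (ctx instanceof Path) ? dataDir.resolve((Path) ctx) : dataDir;
                    handle(actionFor(event.kind()), changed);
                }
                if (!key.reset()) {
                    break; // directory is no longer accessible
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // normal shutdown path
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    /** Placeholder: a real crawler would update the cache and metadata record here. */
    protected void handle(Action action, Path changed) {
        System.out.println(action + " " + changed);
    }
}
```

It could be started with new Thread(new DataDirWatcher(dataDir)).start() and stopped by interrupting the thread. Subdirectories would each need their own register call, since a registration covers only the directory itself.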