Closed GoogleCodeExporter closed 8 years ago
I will document this.
In practice PPSS will often process a file or a directory. PPSS can however
process any string text as an item when using the -f (text file input) option
such as URLs.
Deriving the job_log file name from such input would one way or the other cause
problems with file names due to forbidden characters, too long file names etc.
It is just bound to go wrong. Therefore, the file name of the log file is
derived from an MD5 hash of the item that is processed. The hash is taken of
the item's name, not of its contents if the item is a file.
I think that it is not a problem because I don't think that the file name
matters. What matters is whether items failed to process. You can find such
log files with grep, and then you can see which items failed, albeit
not through their file name but through their content. Not totally ideal, but
this is how it works.
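A minimal sketch of the naming scheme described above, not the actual PPSS code: the function name job_log_name is hypothetical, and GNU coreutils md5sum is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper, not taken from PPSS: derive a log file name from
# the item's name (not its contents) via an MD5 hash.
# Assumes GNU coreutils md5sum is installed.
job_log_name() {
    # printf avoids the trailing newline that echo would add to the hashed string.
    printf '%s' "$1" | md5sum | cut -d ' ' -f 1
}

# Any string works as an item, including one with characters that are
# awkward in file names.
job_log_name "http://example.com/some page?id=1"
```

Because the hash is deterministic, recomputing it for a known item gives the name of the log file that belongs to that item.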
Original comment by Louwrentius
on 9 Aug 2011 at 7:33
Yes I realise that it's possible to grep the output files, but it imposes a
requirement on the "command" to put the filename being processed into the
output stream. I think that a script like ppss should find it trivial to have
an option to strip the path and extension from the $ITEM, strip any spaces or
punctuation, and include the resulting shorter string in the job_log output
file name. Or make this an option. I also suspect that almost all usages of
PPSS will have $ITEM names that never include spaces or special characters,
and if PPSS were to find a $ITEM value that did include spaces, it could
revert to using the md5 name for that $ITEM only.
Having a visibly easier way to reconcile the $ITEM filename with its job_log
output file name would add value to a tool like PPSS - so I guess it's
something I can always do myself!
Original comment by oreilly....@gmail.com
on 13 Aug 2011 at 7:13
I will take it into account. PPSS has some filtering function already built in
that is used in other places. I will think about it. I understand that it is
better for usability.
Original comment by Louwrentius
on 17 Aug 2011 at 10:26
I now use sed s/[^[:alnum:]]/_/g to filter ITEMS. I'm aware that this does not
rule out collisions, but it won't cause any most of the time. Especially if you
are processing files, this is not an issue.
The new version is in subversion.
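To illustrate the filter and the kind of collision it allows, here is a small sketch. The function name sanitize_item is hypothetical; only the sed expression comes from the comment above.

```shell
#!/bin/sh
# Hypothetical helper: replace every non-alphanumeric character with "_",
# using the sed expression quoted in the comment.
sanitize_item() {
    printf '%s\n' "$1" | sed 's/[^[:alnum:]]/_/g'
}

sanitize_item "/path/to/file.mp3"   # prints _path_to_file_mp3
sanitize_item "a b.txt"             # prints a_b_txt
sanitize_item "a_b.txt"             # prints a_b_txt as well: a collision
```

Two items collide only when they differ solely in their non-alphanumeric characters, which is uncommon for ordinary file names.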
Original comment by Louwrentius
on 23 Aug 2011 at 10:11
Oh by the way, there is a new -m / --md5 option that allows you to use MD5 if
you really want to be sure that there can't be any collisions.
Original comment by Louwrentius
on 23 Aug 2011 at 10:12
Original comment by Louwrentius
on 25 Dec 2011 at 4:45
Original issue reported on code.google.com by
oreilly....@gmail.com
on 5 Aug 2011 at 9:57