madhuneal / ppss

Automatically exported from code.google.com/p/ppss

job_log file names #52

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

(this is more a question than a problem)

1. Run PPSS with a command that invokes a script per $ITEM, where that script
produces item-specific output.
2. Look in the job_log directory
3. See the filenames in the job_log directory

What is the expected output? What do you see instead?
I expected the file names in the job_log directory to relate in some way to
the name of the file for the item being processed. Instead, the file names are
hex strings like '8717e67da99e3d6ee9105ac7c08d4063'. The contents of these
files show the file name being processed, but the file name pattern does not
help in quickly seeing which log file belongs to which processed item.

What version of the product are you using? On what operating system?
2.85 on RHEL 5 (x64) virtualised

Please provide any additional information below.
New user. Liked the idea of PPSS.

Original issue reported on code.google.com by oreilly....@gmail.com on 5 Aug 2011 at 9:57

GoogleCodeExporter commented 9 years ago
I will document this. 

In practice PPSS will often process a file or a directory. However, when using
the -f (text file input) option, PPSS can process any text string as an item,
such as a URL.

Deriving the job_log file name directly from such input would, one way or
another, cause problems: forbidden characters, file names that are too long,
etc. It is just bound to go wrong. Therefore, the file name of the log file is
derived from an MD5 hash of the item that is processed. That is, a hash of the
item's name, not of its contents if the item is a file.
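A minimal sketch of the naming scheme described above, assuming bash and GNU coreutils md5sum. The function name md5_log_name is hypothetical, and hashing the item without a trailing newline is an assumption; PPSS may feed the string to its hash slightly differently, which would produce different hex names.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not PPSS code): derive a job_log file name from the
# item *name*, not from the file contents, by hashing it with MD5.
# Assumption: the item is hashed without a trailing newline.
md5_log_name() {
    local item="$1"
    # md5sum prints "<hash>  -"; keep only the hash field.
    printf '%s' "$item" | md5sum | cut -d ' ' -f 1
}

# Any string works, including URLs with characters that are illegal
# or awkward in file names.
md5_log_name "http://example.com/some page?id=1&x=2"
```

Whatever the input, the result is always 32 hex characters, which is what makes it safe to use as a file name regardless of length or special characters.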

I don't think this is a problem, because I don't think the file name
matters. What matters is which items failed to process. Their logs can be
found with grep, and from the contents you can then see which items failed,
albeit not through the log's file name. Not totally ideal, but this is how it
works.
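A small sketch of the grep approach, assuming the job logs are plain text files in one directory. The function name find_matching_logs is hypothetical, and "FAILURE" in the usage line is only an illustrative pattern; use whatever marker the logs you are inspecting actually contain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list job_log files whose contents match a pattern.
# Assumption: the pattern must match what the logs really contain.
find_matching_logs() {
    local log_dir="$1" pattern="$2"
    # -l prints only the names of matching files; -- guards against
    # patterns that start with a dash.
    grep -l -- "$pattern" "$log_dir"/*
}

# Example (illustrative pattern): find_matching_logs job_log FAILURE
```

Because the log names are hashes, the next step is to open each listed file to see which item it belongs to.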

Original comment by Louwrentius on 9 Aug 2011 at 7:33

GoogleCodeExporter commented 9 years ago
Yes, I realise that it's possible to grep the output files, but that imposes a
requirement on the "command" to put the file name being processed into the
output stream. I think a script like PPSS should find it trivial to offer an
option to strip the path and extension from the $ITEM, strip any spaces or
punctuation, and include the resulting shorter string in the job_log output
file name. I also suspect that almost all usages of PPSS will have $ITEM names
that never include spaces or special characters; if PPSS did find an $ITEM
value that included spaces, it could revert to using the MD5 name for that
$ITEM only.
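A rough sketch of the scheme proposed above, assuming bash and GNU coreutils. This is a hypothetical illustration of the suggestion, not PPSS code; the function name short_log_name is made up. It strips the directory and last extension, keeps only letters and digits, and falls back to an MD5 name when the item contains a space.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the proposed naming scheme (not PPSS code).
short_log_name() {
    local item="$1" name
    if [[ "$item" == *" "* ]]; then
        # Fallback for items with spaces: hash the full item string.
        printf '%s' "$item" | md5sum | cut -d ' ' -f 1
        return
    fi
    name="${item##*/}"   # strip the path (like basename)
    name="${name%.*}"    # strip the last extension, if any
    # Delete every remaining character that is not alphanumeric.
    printf '%s' "$name" | tr -cd '[:alnum:]'
}

short_log_name /data/in/report_2011.csv   # prints report2011
```

Note that this simple version has edge cases, for example a dotfile such as ".bashrc" reduces to an empty name.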

Having a visibly easier way to reconcile the $ITEM filename with its job_log 
output file name would add value to a tool like PPSS - so I guess it's 
something I can always do myself!

Original comment by oreilly....@gmail.com on 13 Aug 2011 at 7:13

GoogleCodeExporter commented 9 years ago
I will take it into account. PPSS already has a built-in filtering function
that is used in other places. I will think about it. I understand that it
would be better for usability.

Original comment by Louwrentius on 17 Aug 2011 at 10:26

GoogleCodeExporter commented 9 years ago
I now use sed 's/[^[:alnum:]]/_/g' to filter ITEMS. I'm aware that this does
not rule out collisions, but they won't occur most of the time. This is even
less of an issue if you are processing files.
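To make the trade-off concrete, here is a small sketch wrapping the sed expression from the comment above in a bash function. Only the sed expression comes from the thread; the function name filter_item is hypothetical. Every non-alphanumeric character becomes an underscore, which keeps names readable but lets distinct items map to the same name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the sed filter quoted above: replace every
# character that is not a letter or digit with an underscore.
filter_item() {
    printf '%s\n' "$1" | sed 's/[^[:alnum:]]/_/g'
}

filter_item "/data/in/report 2011.csv"   # prints _data_in_report_2011_csv

# Distinct items can map to the same name (a collision):
filter_item "a b"   # prints a_b
filter_item "a-b"   # prints a_b
```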

The new version is in subversion.

Original comment by Louwrentius on 23 Aug 2011 at 10:11

GoogleCodeExporter commented 9 years ago
Oh by the way, there is a new -m / --md5 option that allows you to use MD5 if 
you really want to be sure that there can't be any collisions.
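A hypothetical sketch of how a switch like -m / --md5 can select between the two naming schemes, assuming bash and GNU coreutils. The variable use_md5 and the function job_log_name are illustrative names, not PPSS internals, and the option parsing is deliberately minimal.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: choose between readable and MD5 job_log names.
use_md5=false

job_log_name() {
    local item="$1"
    if [ "$use_md5" = true ]; then
        # MD5 of the item name: accidental collisions are vanishingly unlikely.
        printf '%s' "$item" | md5sum | cut -d ' ' -f 1
    else
        # Readable name; distinct items may collide (e.g. "a b" and "a-b").
        printf '%s\n' "$item" | sed 's/[^[:alnum:]]/_/g'
    fi
}

# Minimal option parsing for the sketch.
for arg in "$@"; do
    case "$arg" in
        -m|--md5) use_md5=true ;;
    esac
done
```

The design choice is readability by default, with MD5 as an opt-in for users who need a guarantee against collisions.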

Original comment by Louwrentius on 23 Aug 2011 at 10:12

GoogleCodeExporter commented 9 years ago

Original comment by Louwrentius on 23 Aug 2011 at 10:12

GoogleCodeExporter commented 9 years ago

Original comment by Louwrentius on 25 Dec 2011 at 4:45