ukwa / ukwa-manage

Shepherding our web archives from crawl to access.
Apache License 2.0

MrJob submitter should allow STDIN, skip blank lines in filenames #95

Open anjackson opened 2 years ago

anjackson commented 2 years ago

Just failed to run a job because the input file list had an empty line in it, which got translated into a hadoop -lsr /, which is extremely wrong as well as very slow.

When processing items, empty/whitespace-only entries should be skipped.
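A minimal sketch of the filtering described above, assuming the submitter reads one path per line from either a file or STDIN. The function names `clean_input_paths` and `read_input_paths`, and the use of `-` to mean STDIN, are hypothetical and not taken from the ukwa-manage code:

```python
import sys
from typing import Iterable, Iterator, List, Optional


def clean_input_paths(lines: Iterable[str]) -> Iterator[str]:
    """Yield stripped entries, dropping empty and whitespace-only lines."""
    for line in lines:
        entry = line.strip()
        if entry:  # an empty entry must never reach the Hadoop listing step
            yield entry


def read_input_paths(source: Optional[str] = None) -> List[str]:
    """Read the path list from a file, or from STDIN when source is None or '-'.

    The '-' convention is an assumption made for this sketch.
    """
    if source is None or source == "-":
        return list(clean_input_paths(sys.stdin))
    with open(source, "r") as f:
        return list(clean_input_paths(f))
```

With this in place, a list such as `["/a/b.warc.gz\n", "\n", "  \n"]` would yield only `/a/b.warc.gz`, so a stray blank line can no longer turn into a listing of the filesystem root.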

anjackson commented 2 years ago

Can be bundled with #96 as both tasks involve how we handle inputs.