we used to let the job maker move data to a temp bucket and then delete it once all the batch jobs were done. this flips that around: now the job maker just enumerates the time tiles as jobs, and each worker cleans up the data it finds in its time tile when it runs. this should schedule a lot more jobs, and has the benefit of not wiping the data until we are sure the result has been uploaded to s3. previously we were somehow getting into a state where a worker would wake up and not see the data the job maker said would be there.
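
a rough sketch of the new flow, to make the ordering concrete. this is not the actual code: `InMemoryStore`, `enumerate_jobs`, `run_worker` and the `<tile>/` key layout are all made-up names for illustration, and the in-memory store just stands in for the real buckets. the point is only that the job maker touches no data, and the worker deletes its inputs strictly after the result upload has returned.

```python
from typing import Callable, Dict, List


class InMemoryStore:
    """Hypothetical stand-in for a bucket: maps object keys to bytes."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def list_keys(self, prefix: str) -> List[str]:
        return sorted(k for k in self.objects if k.startswith(prefix))

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)


def enumerate_jobs(time_tiles: List[str]) -> List[dict]:
    """Job maker: one job per time tile. No data is moved or deleted here."""
    return [{"tile": tile} for tile in time_tiles]


def run_worker(
    job: dict,
    source: InMemoryStore,
    results: InMemoryStore,
    process: Callable[[List[bytes]], bytes],
) -> bool:
    """Worker: process whatever is in the tile, upload, then clean up.

    Returns False if the tile was empty, True if a result was uploaded.
    """
    tile = job["tile"]
    # Assumed layout: every object for a tile lives under "<tile>/".
    keys = source.list_keys(f"{tile}/")
    if not keys:
        return False  # nothing in this tile, so nothing to clean up

    output = process([source.get(k) for k in keys])

    # Upload first. If this raises, we never reach the deletes below,
    # so the inputs are still there for a retry.
    results.put(f"{tile}/result", output)

    for k in keys:
        source.delete(k)
    return True
```

with this shape an empty tile is just a no-op job rather than an error, which is why enumerating every tile up front is cheap even though it schedules more jobs.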