Running segway identify for the whole human genome with default limits produces thousands of jobs. Log files that are generated once per job therefore end up by the thousands in a single directory, which causes performance problems for some filesystems and utilities.
We should split the log files into multiple subdirectories when there are more than 1000 of them, perhaps with subdirectories named 0, 1, 2, etc. I think it would be inconvenient to have this splitting when there are fewer than 1000 files, but applying it only above the threshold would unfortunately make the behavior inconsistent, since the layout would depend on the number of jobs.
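
A minimal sketch of the proposed layout, assuming jobs are numbered from 0 and each job writes one log file. The function name `log_path`, the `job<N>.log` filename pattern, and the `max_per_dir` parameter are hypothetical and not taken from Segway's code; only the 1000-file threshold and the 0, 1, 2 subdirectory names come from the proposal above.

```python
from pathlib import Path

# Threshold from the proposal: split only when there are more than 1000 files.
MAX_FILES_PER_DIR = 1000


def log_path(log_root, job_index, num_jobs, max_per_dir=MAX_FILES_PER_DIR):
    """Return where the log file for one job would go (hypothetical helper).

    With num_jobs <= max_per_dir, logs stay directly in log_root.
    Otherwise they are spread over subdirectories named 0, 1, 2, ...,
    each holding at most max_per_dir files.
    """
    root = Path(log_root)
    filename = f"job{job_index}.log"  # hypothetical naming scheme

    if num_jobs <= max_per_dir:
        # Small run: keep the flat layout.
        return root / filename

    # Large run: integer division picks the bucket (0, 1, 2, ...).
    return root / str(job_index // max_per_dir) / filename
```

For example, `log_path("log", 2500, 3000)` would place the file under `log/2/`, while `log_path("log", 5, 10)` keeps it directly in `log/`. Dropping the `num_jobs <= max_per_dir` branch would make the layout consistent for every run, at the cost of an extra `0` directory for small runs.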
Original report (BitBucket issue) by Michael Hoffman (Bitbucket: hoffman, GitHub: michaelmhoffman).