Open sylvestre opened 10 months ago
I would like to fix this. Please assign this issue to me. Thanks.
@4t8dd sure, go ahead :) While @allaboutevemirolive opened a PR, they are currently not working on it according to https://github.com/uutils/coreutils/pull/5738#issuecomment-1870965171:
I have an exam coming up in the first month of 2024. Please feel free to create a new PR and close this one. If no new PR is issued, I will investigate this matter in my spare time. :)
OK, thanks for your reply. I will try it. @allaboutevemirolive if you get a solution, let me know.
Here are my thoughts on this problem:
The issue is in `coreutils/src/uu/sort/src/merge.rs`, specifically in this line:

```rust
std::fs::copy(file_path, &copy_path).map_err(|error| SortError::OpenTmpFileFailed { error })?;
```

...because in the bash script above, we pass the `-m` flag to merge the files.
I archived the text below because it's completely unrelated (the original and destination paths are different). However, the note in it could be useful in the future.

You can try looking at `cp.rs` to see how copying is done. I can't test it now, but I will when I have time.
That's the first point.

My second thought is that by only addressing the problem above, you can also solve the next problem, the `Too many open files` error, since `sort` throws two errors here:

sort: failed to open temporary file: Too many open files

Another thing: my thoughts above might be wrong, since increasing `ulimit` also works, but that is not a clean way to address this issue.
@allaboutevemirolive Thanks for sharing.

I haven't started reading the code yet, but I do have some findings about `ulimit`:

I ran some tests on Linux and macOS, and compared the `sort` from each. I found some interesting things.

macOS: `ulimit -n 14` works. In my tests, 14 is the smallest value that works for batch size = 7. But when I set the batch size to 700000, it also works. This is crazy! In fact, if I set n >= 14, the batch size can be any number and it will always work.
Linux: the default open-file limit here is 25. With the batch size set to 7, n must be at least 10, or I get:

sort: --batch-size argument ‘7’ too large
sort: maximum --batch-size argument with current rlimit is 6

This is interesting. If I set the batch size to 6, then n can be 9. So there is a formula here: n = batch_size + 3. When I try large batch sizes, the behaviour is deterministic and follows this formula strictly. It not only reports the error but also reports the maximum batch size, like:

sort: --batch-size argument ‘700000’ too large
sort: maximum --batch-size argument with current rlimit is 397

Here I set n to 400 and the batch size to 700000.

For uutils coreutils on Linux and macOS, the behaviour is the same on both: n must be at least 16, and once n >= 16 the batch size can be any number at all. This is very bad.
```bash
#!/bin/bash
mkdir -p in
for i in $(seq 5); do
  echo $i >in/$i
done
(seq 6 && echo 6) >exp
echo 6 >out
(
  exec 3<&- 4<&- 5<&- 6<&- 7</dev/null 8<&7 9<&7 &&
  ulimit -n 16 &&
  lsof -p $$ | wc -l &&
  sort -n -m --batch-size=7000000 -o out out in/1 in/2 in/3 in/4 in/5 out
)
```
So Linux `sort` should be the expected behaviour, even though I don't yet know why the formula works on Linux. The behaviour should be deterministic anyway. I will read the code to find out the tricks.
I went back to the source code that creates the tempfile for the copying operation, and read the tempfile documentation. I think the current tempfile wrapper in coreutils has a resource leak (because it uses the tempdir operation to create a temporary file, not the tempfile operation), so the temp file isn't closed properly after it goes out of scope.

That's why `sort` emits the `Too many open files` error at runtime:

sort: failed to open temporary file: Too many open files

I'm not sure why a tempfile is needed in the first place to perform merging or copying. Maybe, if you have time, you can try tweaking the code so it uses proper tempfile operations.
Quote from tempfile docs:
tempfile will (almost) never fail to cleanup temporary resources.
However TempDir and NamedTempFile will fail if their destructors don’t run.
This is because tempfile relies on the OS to cleanup the underlying file, while TempDir and NamedTempFile rely on rust destructors to do so.
Destructors may fail to run if the process exits through an unhandled signal interrupt (like SIGINT), or if the instance is declared statically (like with [lazy_static](https://github.com/rust-lang-nursery/lazy-static.rs/issues/62)), among other possible reasons.
I think I figured this out. In GNU sort's help text:

merge at most NMERGE inputs at once;
for more use temp files

But here, we use more temp files.

If we want to align with GNU, more code would need to change. `sort` on macOS behaves differently again, but our `sort` behaves the same on both platforms.

So my solution here is: leave the `ulimit` and batch-size tuning to the user; we just need to tell them whether it is safe to proceed, which is currently missing:

n = batch_size + 3

I guess the 3 already-open files are stdin, stdout, and stderr.
Hello there. After investigating the test `tests/sort/sort-merge-fdlimit.sh`, it appears that the error arises from `replace_output_file_in_input_files` in `merge.rs`.

We should try chunking the iterator of files first, then call this function on each chunk, to ensure at least one temp file can be created per chunk.
With our implementation, this returns:

sort: failed to open temporary file: Too many open files

GNU sort works without issue. This case is tested by GNU in `tests/sort/sort-merge-fdlimit.sh`.