vigna / webgraph-rs

A Rust port of the WebGraph framework
Apache License 2.0

LLP does not remove temporary labels_XX.bin files when done #101

Closed: zacchiro closed this 5 months ago

zacchiro commented 5 months ago

After running this:

time RUST_MIN_STACK=100000000 TMPDIR=`pwd`/tmp/ webgraph llp -g "-1,-2,-3,-4,-5,0-0" -j 72 /poolswh/softwareheritage/vlorentz/datasets/2023-09-06-recompressed/compressed/graph-bfs-simplified llp.perm &> llp.log

I noticed that the temporary files were not cleaned up from the local temp dir:

$ du -sh tmp 
1,5T    tmp
$ ls -l tmp 
total 1600507005
-rw-rw-r--+ 1 szacchiroli tss 272972530072 apr 11 17:27 labels_0.bin
-rw-rw-r--+ 1 szacchiroli tss 272972530072 apr 11 20:29 labels_1.bin
-rw-rw-r--+ 1 szacchiroli tss 272972530072 apr 11 23:24 labels_2.bin
-rw-rw-r--+ 1 szacchiroli tss 272972530072 apr 12 02:52 labels_3.bin
-rw-rw-r--+ 1 szacchiroli tss 272972530072 apr 12 06:59 labels_4.bin
-rw-rw-r--+ 1 szacchiroli tss 272972530072 apr 12 15:15 labels_5.bin
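As a quick check on the listing, the six labels files alone account for the size du reports. This assumes du -h uses binary (1024-based) units, which is the GNU default:

```latex
6 \times 272\,972\,530\,072\ \text{B} = 1\,637\,835\,180\,432\ \text{B}
\approx \frac{1\,637\,835\,180\,432}{2^{40}}\ \text{TiB} \approx 1.49\ \text{TiB}
```

du -h rounds this up to one decimal place and prints it as 1,5T, where the comma is the locale's decimal separator.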

@progval, have you ever encountered this? I'm surprised we didn't notice it before… (maybe because you independently clean up the same dir in the swh-graph pipeline?)

progval commented 5 months ago

I don't; I just don't usually look in my tmp directory.

zacchiro commented 5 months ago

Yeah, confirmed: this can be reproduced trivially even with ./pipeline.sh tests/data/cnr-2000, which leaves 11 labels_*.bin files in /tmp, so it does not even depend on TMPDIR. Indeed, there is no code removing them in src/algo/llp/mod.rs at all. I'll file a PR about this.
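The actual fix is not shown in this thread. A common Rust way to guarantee cleanup is an RAII guard whose Drop impl deletes the scratch files. The following is a minimal, std-only sketch of that approach. ScratchDir, labels_path and the per-process directory name are hypothetical illustrations. They are not webgraph-rs APIs, and this is not necessarily how the PR addressed the issue.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical guard (not part of webgraph-rs): owns a scratch directory
/// and deletes it, together with everything inside, when dropped. Drop also
/// runs while unwinding from a panic (with the default panic=unwind), so the
/// files are removed even if the computation fails midway.
struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    /// Creates (if needed) the subdirectory `name` under `parent`.
    fn new(parent: &Path, name: &str) -> io::Result<Self> {
        let path = parent.join(name);
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    /// Path of the labels file for iteration `i`, mirroring labels_XX.bin.
    fn labels_path(&self, i: usize) -> PathBuf {
        self.path.join(format!("labels_{i}.bin"))
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Errors cannot be propagated out of drop; cleanup is best effort.
        let _ = fs::remove_dir_all(&self.path);
    }
}

fn main() -> io::Result<()> {
    // std::env::temp_dir() returns $TMPDIR on Unix if set, otherwise /tmp.
    let parent = std::env::temp_dir();
    let removed_path;
    {
        let name = format!("llp-sketch-{}", std::process::id());
        let scratch = ScratchDir::new(&parent, &name)?;
        for i in 0..3 {
            let mut f = File::create(scratch.labels_path(i))?;
            f.write_all(&[0u8; 16])?; // stand-in for per-iteration labels
        }
        assert!(scratch.labels_path(2).exists());
        removed_path = scratch.path().to_path_buf();
    } // `scratch` is dropped here: directory and labels_*.bin files are deleted
    println!("scratch directory removed: {}", !removed_path.exists());
    Ok(())
}
```

A ready-made alternative is the TempDir type from the tempfile crate, which has the same delete-on-drop behaviour. The std-only version above just avoids adding a dependency for the sketch.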