Closed tfrederiksen closed 4 months ago
We are starting to have a very big repo. How many of these files are actually used in the coming tests? If anything, we could retain the input file and only those files that will be tested. Then later we can add other files when needed. Agreed, it isn't optimal, but there are limits to the amount of bandwidth we can pull from this repo...
The total addition here would be about 3MB. I can delete the largest files, `POTCAR` (identical in all these runs) and `vasprun.xml`, which would save 278K and 94-350K, respectively, per directory.
But for future implementations and tests, wouldn't it be handy to have the whole output for such small test systems with different settings?
In this PR I see this size distribution:
$ du -hs tests/sisl/io/*
44K tests/sisl/io/dftb
6.2M tests/sisl/io/gulp
1.9M tests/sisl/io/orca
26M tests/sisl/io/siesta
85M tests/sisl/io/tbtrans
19M tests/sisl/io/vasp
352K tests/sisl/io/wannier90
What is "big" for this kind of repo?
> The total addition here would be about 3MB. I can delete the largest files, `POTCAR` (identical in all these runs) and `vasprun.xml`, which would save 278K and 94-350K, respectively, per directory.
> But for future implementations and tests, wouldn't it be handy to have the whole output for such small test systems with different settings?
It would, but we are heading for more and more tests, so I am a bit more inclined to be proactive rather than causing problems down the road. If anything, we can partially solve this by creating two PRs, one with the entire content and one with the reduced content.
The other thing is that the tests are not for users to look into; they should be used for testing. I haven't been consistent with this previously, but I can see that the bandwidth gets used when we are running many tests. So my idea (going forward) would be to:
For the past month we've had ~175 test runs, each running 2 (now 3) jobs. This amounts to (let's say 150 MB for the repo) about 51 GB of bandwidth. Now that is only for the tests; there is also documentation and wheels creation.
All of these things scale quite drastically when using the CI for debugging.
Currently the billing tells me that I have used 22.41 GB out of 50 GB for a month's bandwidth (I am paying $5 per month for this). There are 18 days before the quota is reset.
Free CI is only free up to a certain limit...
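As a rough check of the estimate above, the arithmetic can be sketched in Python. The function name is hypothetical, and the sketch assumes that every job fetches the full ~150 MB repo exactly once and that 1 GB is counted as 1024 MB; real CI traffic and the billing units may differ.

```python
def clone_traffic_gib(runs: int, jobs_per_run: int, repo_mib: float) -> float:
    """Traffic in GiB, assuming each job downloads the whole repo once."""
    return runs * jobs_per_run * repo_mib / 1024


# Numbers quoted in the thread: ~175 runs per month, ~150 MB repo.
two_jobs = clone_traffic_gib(175, 2, 150)
three_jobs = clone_traffic_gib(175, 3, 150)

print(f"{two_jobs:.1f} GiB with 2 jobs, {three_jobs:.1f} GiB with 3 jobs")
# prints "51.3 GiB with 2 jobs, 76.9 GiB with 3 jobs"
```

Under these assumptions the traffic grows linearly with both the repo size and the number of jobs, so the third job alone adds roughly 50%.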
OK, I see. So it is more the "current size" (rather than the repo history) that matters?
We could focus on the bigger files, some of them in ascii could be zipped.
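To illustrate the compression idea, here is a minimal sketch using Python's standard `gzip` module. The helper name and the sample file are hypothetical, and whether a given test can read the compressed file directly is not something this sketch checks.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_file(path: Path) -> Path:
    """Write a gzip-compressed copy next to `path` and return its location."""
    gz_path = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


# Hypothetical stand-in for a repetitive ASCII output file.
workdir = Path(tempfile.mkdtemp())
sample = workdir / "sample_output.txt"
sample.write_text("0.000000 0.000000 0.000000\n" * 10000)

compressed = gzip_file(sample)
print(sample.stat().st_size, "->", compressed.stat().st_size, "bytes")
```

How much is saved depends strongly on the content; highly repetitive text like this sample compresses far better than typical output would.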
Btw, there is one file that is by far the biggest: `tests/sisl/io/tbtrans/1_graphene_all.TBT.nc` (84MB), which samples 100 k-points and 400 energy points. Maybe that one could be reconsidered?
Can one use symbolic links? One could then keep a single POTCAR somewhere?
Exactly, there are many files that need revision. I didn't think of this initially (I was the only one submitting stuff). But as PRs are getting more frequent (GOOD!!!), it becomes more important.
I haven't had time to reduce them, but I would really like to. :)
I would still prefer not to have the POTCAR, or we could have another repo that contains the shared files (with the same file-structure). It now gets complicated...
Symlinks seem like the way to go
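A minimal sketch of the symlink idea, assuming a single shared `POTCAR` and per-run directories; the directory names, file contents, and helper function are hypothetical placeholders. Relative link targets are used so the links stay valid wherever the repo is checked out, though symlink support on checkout can vary by platform.

```python
import os
import tempfile
from pathlib import Path


def link_shared_file(shared: Path, run_dirs: list[Path]) -> list[Path]:
    """Replace each per-directory copy with a relative symlink to `shared`."""
    links = []
    for run_dir in run_dirs:
        link = run_dir / shared.name
        if link.exists() or link.is_symlink():
            link.unlink()  # drop the duplicated copy
        link.symlink_to(os.path.relpath(shared, start=run_dir))
        links.append(link)
    return links


# Throwaway layout standing in for the test directories.
root = Path(tempfile.mkdtemp())
shared = root / "shared" / "POTCAR"
shared.parent.mkdir()
shared.write_text("placeholder pseudopotential data\n")

run_dirs = [root / "run_a", root / "run_b"]
for d in run_dirs:
    d.mkdir()
    (d / "POTCAR").write_text("placeholder pseudopotential data\n")

links = link_shared_file(shared, run_dirs)
print([os.readlink(link) for link in links])
```

Each run directory then stores only a short link instead of a full copy of the file.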
I've compressed the `nitric_oxide` runs to (close to) the minimum (now 6.9MB, mainly the six CHG* files). What about now?
Sorry for the late return.
Could you possibly add the `README.md` files to complete this? Then I will merge, and move things over in the new branch.
> Could you possibly add the `README.md` files to complete this? Then I will merge, and move things over in the new branch.
I'm not sure exactly what you would write. The directories are "complete", nothing special to be fetched.
Something like this:
Each folder should have a small document describing what it contains, i.e. system information, which version of the code it was run with, and which additional files are required (and where to locate them).
What about now?
Ok, let's do this.
When I rework things, I will:
I think in terms of POTCARs etc. we should strive to document how to create them in the README files. E.g. write something like:
cat C/POTCAR O/POTCAR > POTCARS
so it is very clear.
Once I get to the VASP things, I'll probably create a PR and request your review ;)
Added some outputs containing lm- and atom-resolved density of states (PDOS) via the `LORBIT=10` tag.