VASP lm- and atom-resolved density of states

tfrederiksen commented 5 months ago

Added some outputs containing lm- and atom-resolved density of states (PDOS) via the LORBIT=10 tag.

zerothi commented 5 months ago

The repo is getting very big. How many of these files are actually used in the coming tests? If anything, we could retain the input files and only those outputs that will be tested, and add other files later when needed. Agreed, it isn't optimal, but there are limits to the amount of bandwidth we can pull from this repo...

tfrederiksen commented 5 months ago

The total addition here would be about 3MB. I can delete the largest files, POTCAR (identical in all these runs) and vasprun.xml, which would save 278K and 94-350K, respectively, per directory.

But for future implementations and tests, wouldn't it be handy to have the whole output for such small test systems with different settings?

tfrederiksen commented 5 months ago

In this PR I see this size distribution:

$ du -hs tests/sisl/io/*
44K     tests/sisl/io/dftb
6.2M    tests/sisl/io/gulp
1.9M    tests/sisl/io/orca
26M     tests/sisl/io/siesta
85M     tests/sisl/io/tbtrans
19M     tests/sisl/io/vasp
352K    tests/sisl/io/wannier90

What is "big" for this kind of repo?

zerothi commented 5 months ago

The total addition here would be about 3MB. I can delete the largest files, POTCAR (identical in all these runs) and vasprun.xml, which would save 278K and 94-350K, respectively, per directory.

But for future implementations and tests, wouldn't it be handy to have the whole output for such small test systems with different settings?

It would, but we are heading for more and more tests, so I am a bit more inclined to be proactive rather than causing problems down the road. If anything, we can partially solve this by creating two PRs, one with the entire content, and one with the reduced content.

The other thing is that the tests are not for users to look into; they should be used for testing. I haven't been consistent with this previously, but when we are running many tests I can see that the bandwidth gets used. So my idea (going forward) would be to:

include the necessary input files (preferably omitting the pseudopotentials, but having some links/description on where to get them)
include only the outputs that are used.

For the past month we've had ~175 test runs, each running 2 (now 3) jobs. Assuming, let's say, 150 MB for the repo, this amounts to about 51 GB of bandwidth. That is only for the tests; there is also documentation and wheel creation.
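
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming that every job downloads the full repository exactly once and that 1 GB is counted as 1024 MB (which is how the 51 GB figure comes out); the function name is made up for illustration.

```python
def monthly_bandwidth_gb(runs: int, jobs_per_run: int, repo_mb: float) -> float:
    """Estimate CI download traffic in GB (1 GB = 1024 MB).

    Assumes each job fetches the whole repository once and nothing is cached.
    """
    return runs * jobs_per_run * repo_mb / 1024


# Numbers quoted in the comment: ~175 runs, 150 MB repo.
print(f"{monthly_bandwidth_gb(175, 2, 150):.1f} GB")  # 2 jobs per run -> 51.3 GB
print(f"{monthly_bandwidth_gb(175, 3, 150):.1f} GB")  # 3 jobs per run -> 76.9 GB
```

With three jobs per run the same assumptions give roughly 77 GB, which is already above a 50 GB monthly quota before documentation and wheel builds are counted.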

All of these things scale quite drastically when using the CI for debugging.

Currently the billing tells me that I have used 22.41 GB out of 50 GB of a month's bandwidth (I am paying $5 per month for this). There are 18 days before the quota is reset.

Free CI is only free up to a certain limit...

tfrederiksen commented 5 months ago

OK, I see. So it is more the "current size" (rather than the repo history) that matters?

We could focus on the bigger files, some of them in ascii could be zipped.
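
As a rough illustration of the compression idea, here is a minimal sketch using only the Python standard library. The helper name is hypothetical, and whether a given test or reader can open the resulting `.gz` file directly is an assumption that would have to be checked per file type.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(path: Path) -> Path:
    """Write a gzip-compressed copy next to `path` and return its location.

    The original file is left untouched; removing it is a separate decision.
    """
    target = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

ASCII outputs with many repeated numbers usually compress well, but the actual saving depends on the file.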

Btw, there is one file that is by far the biggest: tests/sisl/io/tbtrans/1_graphene_all.TBT.nc (84MB) which samples 100 k-points and 400 energy points. Maybe that one could be reconsidered?
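
To spot such outliers without going through `du` output by hand, something like the following could be used. It is a sketch under the assumption that it is run against a local checkout; the function name is invented for this example.

```python
from pathlib import Path


def largest_files(root, n=5):
    """Return the `n` largest regular files below `root` as (bytes, path) pairs."""
    sizes = [
        (p.stat().st_size, p)
        for p in Path(root).rglob("*")
        if p.is_file() and not p.is_symlink()  # skip links so shared files count once
    ]
    # Sort on the size only, largest first.
    return sorted(sizes, key=lambda item: item[0], reverse=True)[:n]
```

Calling `largest_files("tests/sisl/io")` from the repository root would then list the files worth reconsidering first.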

tfrederiksen commented 5 months ago

Can one use symbolic links? One could then keep a single POTCAR somewhere?
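
A minimal sketch of what this could look like, assuming a POSIX file system and a layout with one shared file and several run directories (both the layout and the helper name are assumptions for illustration). The link is made relative so that it still resolves when the repository is cloned to a different location.

```python
import os
from pathlib import Path


def link_shared_file(shared: Path, run_dir: Path) -> Path:
    """Create `run_dir/<shared.name>` as a relative symlink pointing at `shared`."""
    link = run_dir / shared.name
    # A relative target is resolved from the directory that holds the link.
    target = os.path.relpath(shared, start=run_dir)
    link.symlink_to(target)
    return link
```

Git records symbolic links as links rather than as copies of the target, so each extra directory would only add a short path string. Checkouts on systems without symlink support may behave differently.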

zerothi commented 5 months ago

Exactly, there are many files that need revision. I didn't think of this initially (I was the only one submitting stuff). But as PRs are getting more frequent (GOOD!!!), it becomes more important.

I haven't had time to reduce them, but I would really like to. :)

I would still prefer not to have the POTCAR, or we could have another repo that contains the shared files (with the same file-structure). It now gets complicated...

tfrederiksen commented 5 months ago

Symlinks seem like the way to go

tfrederiksen commented 5 months ago

I've compressed the nitric_oxide runs to (close to) the minimum (now 6.9MB, mainly the six CHG* files). What about now?

zerothi commented 5 months ago

Sorry for the late reply.

Could you possibly add the README.md files to complete this? Then I will merge, and move things over in the new branch.

tfrederiksen commented 5 months ago

Could you possibly add the README.md files to complete this? Then I will merge, and move things over in the new branch.

I'm not sure exactly what you would write. The directories are "complete", nothing special to be fetched.

zerothi commented 5 months ago

Something like this:

Each folder should have a small document describing what it contains, i.e. system information, which version of the code it was run with, and which additional files are required (and where to locate them).

tfrederiksen commented 4 months ago

What about now?

zerothi commented 4 months ago

OK, let's do this.

When I rework things, I will:

remove the symlinks
remove all POTCARs (I have read that there is a strict limit on the maximum size of the repo, and I don't want to revisit things later; this just has to work)

I think in terms of POTCARs etc. we should strive to document how to create them in the README files. E.g. write something like:

cat C/POTCAR O/POTCAR > POTCARS

so it is very clear.

Once I get to the VASP things, I'll probably create a PR and request your review ;)

zerothi / sisl-files

VASP lm- and atom-resolved density of states #13