irreducible-representations / irrep

GNU General Public License v3.0
59 stars 30 forks source link

(VASP) Unfolding issue with irrep-1.9.1 #73

Closed bmondal94 closed 1 month ago

bmondal94 commented 2 months ago

Dear Admin,

I am using irrep-1.9.1 with banduppy to unfold the band structure from VASP WAVECAR file. However, I am encountering a few issues:

  1. The unfolding process is significantly slower compared to irrep-1.8.3.
  2. I receive a non-integer warning for each state: WARNING - non-integer number of states: 411.4891663397381.
  3. Consequently, the unfolding results are incorrect, with all the Bloch weights accumulating at the top bands.

Do you have any suggestions to address these issues?

Best regards,
Badal

unfolded_bandstructure_

stepan-tsirkin commented 2 months ago

Hi @bmondal94 , really looks like something's broken. Just to be clear, is it true that with irrep-1.8.3 things work as expected?

second, is it only with VASP, or similar problem arises with other code (e.g. QE or abinit)

@MIraola , any ideas?

bmondal94 commented 2 months ago

Dear Admins,

  1. Yes. irrep-1.8.3 works perfectly as expected. All my tested unfolded band structures came out as expected.
  2. I have only tested with VASP so far. Unfortunately, at the moment I do not have access to QE or Abinit. So, I can only confirm that the problem is with VASP.
stepan-tsirkin commented 2 months ago

Hi @bmondal94 ,

As I understand, there is no advantage in using irrep-1.9 or newer with banduppy for now, right?

In that case I would suggest that in banduppy for now you you fix in requirements that irrep<=1.8.3, and also when running banduppy raise a warning if the installed version of irrep happens to be newer.

After we resolve this compatibility and performance problem, we remove this restriction in banduppy.

MIraola commented 2 months ago

Hi Badal,

thank you for your report. Unfortunately, this week I will not be able to fix the problem because I am very busy with preparing some documents. I will look at the problem at the beginning of next week. Meanwhile, please install irrep version <= 1.8.3 as Stepan suggested.

Sorry for the inconvenience.

Best, Mikel I.

El jue, 11 jul 2024 a las 14:29, Stepan Tsirkin @.***>) escribió:

Hi @bmondal94 https://github.com/bmondal94 ,

As I understand, there is no advantage in using irrep-1.9 or newer with banduppy for now, right?

In that case I would suggest that in banduppy for now you you fix in requirements that irrep<=1.8.3, and also when running banduppy raise a warning if the installed version of irrep happens to be newer.

After we resolve this compatibility and performance problem, we remove this restriction in banduppy.

— Reply to this email directly, view it on GitHub https://github.com/stepan-tsirkin/irrep/issues/73#issuecomment-2222808213, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJUEUPHSFU2BFLOEI4RF2JDZLZ3BLAVCNFSM6AAAAABKQIV3OOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMRSHAYDQMRRGM . You are receiving this because you were mentioned.Message ID: @.***>

stepan-tsirkin commented 2 months ago

@bmondal94 , regarding the issue that the code became slower. Could you run the unfolding with

python -m cProfile script.py

both with irrep-1.9.1 and 1.8.3, and attach the outputs here, to identify the new bottlenecks

stepan-tsirkin commented 2 months ago

I think I know where the issue hides.

In this line of Kpoint.__init__()

https://github.com/MIraola/irrep/blob/a2d50b9b2daa794044576d9bbeac2b72a9f2c5f2/irrep/kpoint.py#L201-L202

the traces are calculated (which are not actually needed for unfolding). But in addition self.Energy is substituted by Energy_mean

https://github.com/MIraola/irrep/blob/a2d50b9b2daa794044576d9bbeac2b72a9f2c5f2/irrep/kpoint.py#L600-L602

which is returned by calculate_traces

https://github.com/MIraola/irrep/blob/a2d50b9b2daa794044576d9bbeac2b72a9f2c5f2/irrep/kpoint.py#L615

Now, when Kpoint.unfold() calls for self.get_rhi_spin here

https://github.com/MIraola/irrep/blob/a2d50b9b2daa794044576d9bbeac2b72a9f2c5f2/irrep/kpoint.py#L310 The energies are still assumed not-averaged

So, in these borders

https://github.com/MIraola/irrep/blob/a2d50b9b2daa794044576d9bbeac2b72a9f2c5f2/irrep/kpoint.py#L366-L372

no degeneracies are detected, because the degenerate states are already grouped.

Now, for simplicity assume that each band is doubly-degenerate, i.e. we read energies as [E1,E1, E2,E2,E3E3,...] and wavefunctions [WF1_1, WF1_2, WF2_1, WF2_2, ... ]

Now after averaging the Energies become [E1,E2,E3], but the wavefunctions do not change. So, __eval_rho_spin considers the pairs (energy,wavefunction) as

[ (E1, WF1_1) , (E2, WF1_1) , (E3, WF2_1) , (E4, WF2_2), ...] 

So, a higher energy is attributed to each wavefunction, therefore all unfolding weights tend to be at the top of the figure.

This issue is not VASP-related, it should be generall for codes, as long as there are some degeneracies in the bands.

So, question - how to solve it? I think, we should keep both Energy_raw, and Energy_mean.

On the other hand, what I do not like in the fact that traces are calculated in Kpoint.__init__() is that they are calculated once for a fixed threshold, and cannot be recalculated .

So, what is the point of the getter

https://github.com/MIraola/irrep/blob/a2d50b9b2daa794044576d9bbeac2b72a9f2c5f2/irrep/kpoint.py#L320 if anyway, it may be called only once, with one threshold ?

Also, I recently learned that those getters are useless, they can be replaced by @functools.lru_cache decorator

stepan-tsirkin commented 2 months ago

@bmondal94 ,

I tried to fix the problem on this branch, could you try it:

https://github.com/stepan-tsirkin/irrep/tree/fix-issue-73

bmondal94 commented 2 months ago

Dear Stepan,

First of all, I like the explanation on why the Bloch weights were appearing on the top bands.

The fix-issue-73 branch is working as expected. In terms of unfolding time, it is also fast similar to irrep-1.8.3. However, I want to point out a few points:

  1. I noticed that there have been modifications made to variable names in latest irrep. For example, in the Kpoints module's unfold() function, self.K has been changed to self.k. This breaks BandUPpy, as we were previously accessing the unfolded k-points using .K. I have updated our code, but I want to highlight that changing the names of especially the self parameters can break the irrep-dependent software if they access these self parameters for processing. With this update, the unfolding is working, but I need to re-check all other irrep self parameters we are accessing in BandUPpy to be sure.

  2. With the fix-issue-73 branch the Bloch weights in the unfolded band structure have slightly changed (in comparison to irrep-1.8.3) in floating-point precision, although mostly in the 4th-8th decimal places, which may not be significant.

Best regards, Badal

stepan-tsirkin commented 2 months ago

Hi, If I am correct, we access the Kpoint.K attribute only here in banduppy

https://github.com/band-unfolding/banduppy/blob/67728db444fbc2e8dddd6a6bcb95116d277248a6/banduppy/src/unfolding.py#L163

             if is_round(KP.K - self.kpointsSBZ[key, :3], prec = 1e-6):

I suggest that for now in irrep we keep a wrapper property K to keep compatibility:

https://github.com/stepan-tsirkin/irrep/blob/4c592e7e90abe39c61f1d6bfc65c169c81206e4e/irrep/kpoint.py#L226-L246

also the k_close_mod1 function is added to Kpoint, that can de accessed by future version of banduppy directly. This way we both avoid using .K/.k properties, and the import of the auxiliary function is_round from irrep.

@bmondal94 , could you list all the properties/functions in irrep that are accessed directly by banduppy. We may put there a warning which should read smth like "ACCESSED DIRECTLY BY BANDUPPY. PLEASE DO NOT CHANGE WITHOUT NOTIFYING ITS DEVELOPERS"

Well, maybe it is better to put it as a comment, rather than in a docstring?

Also, a good idea is to add a short test (to irrep tests) which uses banduppy, so that any compatibility-breaking change will not be unnoticed.

bmondal94 commented 2 months ago

Hi @stepan-tsirkin, Just to let you know, I will be quite busy this week. I may not be able to get back to you guys with the requirements you mentioned. I will try my best to come back to you with a positive response ASAP.

MIraola commented 1 month ago

Hi,

1- I think it's a good idea to label the attributes that banduppy accesses directly, as Stepan suggested. I think that the section of "Attributes" in the docstring of the class is the correct place to do it, since if we add a comment in a line of the code then we have to check all instances of the attribute before changing the name.

2- Originally, we used to calculate the traces only for a single energy threshold. This is convenient if we want to get rid of wave functions; in order to recalculate the traces, we need to save the wave functions. I will tackle this point once we implement this branch in the code (probably, I will do it by creating a flag save_wf, since this possibility of recalculating the traces is more likely to be used when BandStructure or Kpoints are called in a jupyter-notebook environment).

stepan-tsirkin commented 1 month ago

I think that the section of "Attributes" in the docstring of the class is the correct place to do it,

:+1:

probably, I will do it by creating a flag save_wf

:+1: yes, it is better to control it with an explicit tag like that

bmondal94 commented 1 month ago

Dear @stepan-tsirkin @MIraola , Thanks a lot for looking into the issues. Here is the collections of irrep functions that banduppy uses:

Notes:

  1. Update require in Kpoint.k_close_mod1(self, kpt, prec=1e-6) return. ==> return is_round(self.k - kpt, prec = prec)

FYI: is_round() is called within Kpoint.unfold as well which can be replaced with k_close_mod1() functions.

  1. There is a print statement in gvectors.calc_gvectors() which is called by bandstructure.BandStructure. How about a bit more information about what it is printing? If the print statement is kept there we can do like -
    if N % 10 == 0:
    print(f'cycle-{N:>3d}: No. of plane waves = {len(igall):>10d}')
  2. Suggestion (not significant): I noticed self.RecLattice is calculated when spacegroup.SpaceGroup is called. Within bandstructure.BandStructure self.RecLattice is again calculated despite spacegroup.SpaceGroup is called earlier within bandstructure.BandStructure. Recalculating self.RecLattice can be avoided here.

===============================================

class Kpoint:
    @property
    def K(self):
        """Getter for the redfuced coordinates of the k-point
        needed to keep compatibility with banduppy<=0.3.3.

        ACCESSED DIRECTLY BY BANDUPPY<=0.3.3. 
        BANDUPPY>=0.3.4 NO LONGER USES IT.
        - banduppy.src.unfolding._BandUnfolding._gather_generated_ab_calculated_kpts()
        DO NOT CHANGE UNLESS NECESSARY.
        PLEASE NOTIFY THE DEVELOPERS IF ANY CHANGES ARE MADE.
        """
        return self.k

    def k_close_mod1(self, kpt, prec=1e-6):
        """
        Check if the k-point is close to another k-point modulo 1. (in reduced coordinates)

        ACCESSED DIRECTLY BY BANDUPPY>=0.3.4. 
        - banduppy.src.unfolding._BandUnfolding._gather_generated_ab_calculated_kpts()
        DO NOT CHANGE UNLESS NECESSARY.
        PLEASE NOTIFY THE DEVELOPERS IF ANY CHANGES ARE MADE.

        Parameters
        ----------
        kpt : array
            Coordinates of the k-point to compare.
        prec : float, default=1e-6
            Threshold to consider the k-points as equal.
        """
        return is_round(self.k - kpt, prec = prec)

class BandStructure:
    def __init__(...):
        """
        Parses files and organizes info about the whole band structure in 
        attributes. Contains methods to calculate and write traces (and irreps), 
        for the separation of the band structure in terms of a symmetry operation 
        and for the calculation of the Zak phase and wannier charge centers.

        'self.kpoints' IS ACCESSED DIRECTLY BY BANDUPPY>=0.3.4. 
        - banduppy.src.unfolding._BandUnfolding._unfold_bandstructure()
        DO NOT CHANGE UNLESS NECESSARY.
        PLEASE NOTIFY THE DEVELOPERS IF ANY CHANGES ARE MADE.
        """
        .
        .
        .
        kp = Kpoint(...)
        self.kpoints.append(kp)

    def KPOINTSline(self, kpred=None, supercell=None, breakTHRESH=0.1):
            """
            Calculate cumulative length along a k-path in cartesian coordinate.

            o k_vec = u.b1 + v.b2 + w.b3
            o e.g. (u,v,w)=(1/2, 0, 0) -> k_vec==(kx, ky, kz)
            o [kx ky kz] = [u v w].[[b11 b12 b13], [b21 b22 b23], [b31 b32 b33]]

            ACCESSED DIRECTLY BY BANDUPPY>=0.3.4. 
            - banduppy.src.unfolding._BandUnfolding._unfold_bandstructure()
            DO NOT CHANGE UNLESS NECESSARY.
            PLEASE NOTIFY THE DEVELOPERS IF ANY CHANGES ARE MADE.

            Parameters
            ----------
            kpred : list, default=None
                Each element contains the direct coordinates of a k-point in the
                attribute `kpoints`.
            supercell : array, shape=(3,3), default=None
                Describes how the lattice vectors of the (super)cell used in the 
                calculation are expressed in the basis vectors of the primitive 
                cell. 
            breakTHRESH : float, default=0.1
                If the distance between two neighboring k-points in the path is 
                larger than `break_thresh` break continuity in k-path. Set breakTHRESH 
                to a large value if the unfolded kpoints line is continuous.
                The default is 0.1.

            Returns
            -------
            K : array
                Each element is the cumulative distance along the path up to a 
                k-point. The first element is 0, so that the number of elements
                matches the number of k-points in the path.
            """
            if kpred is None:
                kpred = [k.k for k in self.kpoints]

            if supercell is None:
                reciprocal_lattice = self.RecLattice
            else:
                reciprocal_lattice = supercell.T @ self.RecLattice 

            KPcart = np.dot(kpred, reciprocal_lattice) 
            K = np.zeros(KPcart.shape[0])
            k = np.linalg.norm(KPcart[1:, :] - KPcart[:-1, :], axis=1)
            k[k > breakTHRESH] = 0.0
            K[1:] = np.cumsum(k)
            return K
MIraola commented 1 month ago

Thank you for this info @bmondal94 . Definitely, it is very useful. I will implement the required changes immediately.

bmondal94 commented 1 month ago

@MIraola One more addition to earlier post. There is a 'warning' msg printed during unfolding:

Neither refUC nor shiftUC were specified in CLI and searchcell was False.
Taking 3x3 identity matrix as refUC and shiftUC=(0,0,0). 
If you want to calculate the transformation to conventional cell, run IrRep with -searchcell ???

The msg is originating from irrep.spacegroup.SpaceGroup.determine_basis_transf(). This is probably not of any concerns but does it make sense to suppress this msg during unfolding? Otherwise, this may cause confusion for banduppy users.

MIraola commented 1 month ago

Hello @bmondal94 ,

I have implemented the required changes in the branch "compatibility_banduppy" of my fork. I would be grateful if you could test if it works with banduppy.

In particular, motivated by your last comment, I have implemented a choice of verbosity for the prints. Could you check if it still prints the 'Neither refUC nor shiftUC were specified...' message when you call to BandStructure or SpaceGroup in Banduppy? It should not, unless you explicitly pass the argument v>=1 to these classes.

bmondal94 commented 1 month ago

Hi @MIraola , Great. But....a few points.

  1. In irrep.bandstructure.BandStrucre.KPOINTSline we are getting wrong conversion. It should be

    def KPOINTSline(self, kpred=None, supercell=None, breakTHRESH=0.1):
    if kpred is None:
        kpred = [k.k for k in self.kpoints]
    
    if supercell is None:
        reciprocal_lattice = self.RecLattice
    else:
        reciprocal_lattice = supercell.T @ self.RecLattice 
    
    # Convert to cartesian using primitive cell reciprocal lattice
    KPcart = np.dot(kpred, reciprocal_lattice) 
    K = np.zeros(KPcart.shape[0])
    ...
    return K

    In the current implementation KPcart is still using the Supercell reciprocal lattice.

  2. Nice verbosity implementation. The warning msg is not printed any more. Although we still have one msg. Here is the output I am getting when calling irrep BandStructure module from banduppy:

    Efermi: 0.0000 eV

    This msg is coming from line 260 in irrep.bandstructure.py.

  3. Note: latest irreptables is not installing. As long as version remains the same it is not upgrading when upgrading or force-reinstall irrep.

  4. Probably one can also pass verbosity tag in irrep.spacegroup.SpaceGroup line 537

  5. There is a new deprecation warning from spglib-2.5.0:

    .../python3.12/site-packages/spglib/spglib.py:115: DeprecationWarning: dict interface (SpglibDataset['rotations']) is deprecated.Use attribute interface ({self.__class__.__name__}.{key}) instead
MIraola commented 1 month ago

Hi @bmondal94 ,

  1. I have changed KPOINTSline accordingly.
  2. I have set the printing of Efermi to verbosity level 1. Originally I left it for all verbosity levels because I think it is helpful in every case, but maybe it is a bit annoying if IrRep is used as a package.
  3. The tables are distributed together with IrRep, but they form a separate project. So they have to be updated separately. Nevertheless, we have not changed the tables in a while. I guess that this was meant to use the tables in other codes without the need to add IrRep as a dependency; @stepan-tsirkin , shall we distribute the tables exclusively with IrRep?
  4. Fixed.
  5. Fixed. It was due to a modification in spglib v2.5.0 (most recent one). It didn't show up in my case because my installation was running an older version of spglib.

Let me know if any other point comes to your mind.

stepan-tsirkin commented 1 month ago

I originally packed irreptables as a separate package just to avoid uploading them (the data) every time that we update irrep. I was foreseeing that this package would be changed very rarely, and it happened to be the case.

If their interface needs to be updated - just increment their version and specify it as a minimum in the requirements of irrep.

So far I do not know if tables are used anywhere outside irrep.

So, if you think that in future development the irreptables module will be updated more often, consider packaging it into irrep directly.

MIraola commented 1 month ago

I see, thanks. Then I will just update the version number of tables together with the version number of the code in the final commit of this issue, once Badal gives the ok.

bmondal94 commented 1 month ago

Hi @MIraola , @stepan-tsirkin Perfect... Great effort... Thanks... Banduppy test passed successfully... Looking forward to new release...

MIraola commented 1 month ago

Thank you for your indications. I think we can close the issue now, and release v1.9.3.