nomad-coe / electronic-parsers

Apache License 2.0
18 stars 7 forks source link

eDMFT parser #144

Closed JosePizarro3 closed 1 year ago

JosePizarro3 commented 1 year ago

Hi @ladinesa ,

Can you check on the initial version of this parser? I still have to work on a few things, but you can maybe take a look? Let me know what you prefer, if waiting or do a first revision now.

There are also some small changes in the metainfo and fixes in other parsers. Furthermore, there are some bugs in nomad I will take care before merging anything.

TODO:

JosePizarro3 commented 1 year ago

Some minor points for initial review. I will look at it in full once you make progress on your todos. Are the changes in the other parsers part of the mr?

Ok thanks. About the other parsers, these are minor changes to adapt to metainfo renaming, as well as clean up.

I have a branch in NOMAD, but I didn't finish it up. Now I have to define a new workflow (the one in mainfile_keys called DMFT_MaxEnt) for putting together the two SinglePoints I am defining here, so I will keep working on it.

https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/compare/develop...1450-embeddeddmft-parser-and-revisit-dmft-metainfo

JosePizarro3 commented 1 year ago

Ok, thank you very much for the initial comments. I will finish it up, and let you know.

One thing though, is that some testing is failing when encountering different types of quantities, e.g., int or np.int32. In the metainfo, it happens often that we have some quantities with the python native type, others with the numpy one. Just in case you think this should be treated more carefully or not.

ladinesa commented 1 year ago

Ok, thank you very much for the initial comments. I will finish it up, and let you know.

One thing though, is that some testing is failing when encountering different types of quantities, e.g., int or np.int32. In the metainfo, it happens often that we have some quantities with the python native type, others with the numpy one. Just in case you think this should be treated more carefully or not.

I thought that this has already been fixed before i.e. one can use python natives on np typed quantities. Anyways, I think we should do the conversion ourselves. For me, it is always safer to use python types for scalars and use numpy only for arrays.

JosePizarro3 commented 1 year ago

Ok, thank you very much for the initial comments. I will finish it up, and let you know. One thing though, is that some testing is failing when encountering different types of quantities, e.g., int or np.int32. In the metainfo, it happens often that we have some quantities with the python native type, others with the numpy one. Just in case you think this should be treated more carefully or not.

I thought that this has already been fixed before i.e. one can use python natives on np typed quantities. Anyways, I think we should do the conversion ourselves. For me, it is always safer to use python types for scalars and use numpy only for arrays.

The error 👇🏻

 self = nomad.datamodel.metainfo.simulation.calculation.Atomic.n_orbitals:Quantity
obj = Charges(kind, n_atoms), value = 5

    def __set__(self, obj, value):
        obj.m_mod_count += 1

        if value is None:
            obj.__dict__.pop(self.name, None)
            return

        # Handle pint quantities. Conversion is done automatically between
        # units. Notice that currently converting from float to int or vice
        # versa is not allowed for primitive types.
        if isinstance(value, pint.Quantity):
            if self.unit is None:
                raise TypeError(f'The quantity {self} does not have a unit, but value {value} has.')
            if self.type in MTypes.int:
                raise TypeError(
                    f'Cannot save data with unit conversion into the quantity {self} '
                    'with integer data type due to possible precision loss.')
            value = value.to(self.unit).magnitude

        if self._list:
            if not isinstance(value, list):
                if hasattr(value, 'tolist'):
                    value = value.tolist()
                else:
                    raise TypeError(f'The value {value} for quantity {self} has no shape {self.shape}')

            if any(v is not None and type(v) != self._type for v in value):
                raise TypeError(
                    f'The value {value} with type {type(value)} for quantity {self} is not of type {self.type}')

        elif type(value) != self._type:
>           raise TypeError(
                f'The value {value} with type {type(value)} for quantity {self} is not of type {self.type}')
E           TypeError: The value 5 with type <class 'numpy.int32'> for quantity nomad.datamodel.metainfo.simulation.calculation.Atomic.n_orbitals:Quantity is not of type <class 'int'>

In any case, I was using all time numpy for any numerical value (np.int32, np.float64, etc), as it seems to be the rule in the metainfo.

ladinesa commented 1 year ago

Ok, thank you very much for the initial comments. I will finish it up, and let you know. One thing though, is that some testing is failing when encountering different types of quantities, e.g., int or np.int32. In the metainfo, it happens often that we have some quantities with the python native type, others with the numpy one. Just in case you think this should be treated more carefully or not.

I thought that this has already been fixed before i.e. one can use python natives on np typed quantities. Anyways, I think we should do the conversion ourselves. For me, it is always safer to use python types for scalars and use numpy only for arrays.

The error 👇🏻

 self = nomad.datamodel.metainfo.simulation.calculation.Atomic.n_orbitals:Quantity
obj = Charges(kind, n_atoms), value = 5

    def __set__(self, obj, value):
        obj.m_mod_count += 1

        if value is None:
            obj.__dict__.pop(self.name, None)
            return

        # Handle pint quantities. Conversion is done automatically between
        # units. Notice that currently converting from float to int or vice
        # versa is not allowed for primitive types.
        if isinstance(value, pint.Quantity):
            if self.unit is None:
                raise TypeError(f'The quantity {self} does not have a unit, but value {value} has.')
            if self.type in MTypes.int:
                raise TypeError(
                    f'Cannot save data with unit conversion into the quantity {self} '
                    'with integer data type due to possible precision loss.')
            value = value.to(self.unit).magnitude

        if self._list:
            if not isinstance(value, list):
                if hasattr(value, 'tolist'):
                    value = value.tolist()
                else:
                    raise TypeError(f'The value {value} for quantity {self} has no shape {self.shape}')

            if any(v is not None and type(v) != self._type for v in value):
                raise TypeError(
                    f'The value {value} with type {type(value)} for quantity {self} is not of type {self.type}')

        elif type(value) != self._type:
>           raise TypeError(
                f'The value {value} with type {type(value)} for quantity {self} is not of type {self.type}')
E           TypeError: The value 5 with type <class 'numpy.int32'> for quantity nomad.datamodel.metainfo.simulation.calculation.Atomic.n_orbitals:Quantity is not of type <class 'int'>

In any case, I was using all time numpy for any numerical value (np.int32, np.float64, etc), as it seems to be the rule in the metainfo.

you can also put in the textparser quantity defintion e.g. dtype=np.int32 so it matches the type in the metainfo def. or better, instead of putting the name of the quantity, put the metainfo quantity itself e.g. Quantity(Calculation.pressure, r'pres=(.+)'). The latter will copy the metainfo quantiy info into the textparser quantity

JosePizarro3 commented 1 year ago

Ok @ladinesa , here you have the 2nd version. Now the parser is ready and I pushed the branch in NOMAD, creating also a MR (https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/merge_requests/1440).

A couple of remarks:

Regarding the TODO list, it is pretty much waiting for a reply by the authors, but I guess we can merge anyways and when they reply, I will implement the changes.

JosePizarro3 commented 1 year ago

Good morning @ladinesa , please, check out the changes with your comments. I think you can also check the NOMAD if you feel so (maybe it is interesting for you that I added extract_section in the utils there).

Thanks a lot for the review. And feel free to keep dropping feedback.