nomad-coe / nomad

NOMAD lets you manage and share your materials science data in a way that makes it truly useful to you, your group, and the community.
https://nomad-lab.eu
Apache License 2.0

Parser failing for mixed DFT+DMFT data (most likely because of wannier90) #65

deecadance closed this issue 11 months ago

deecadance commented 1 year ago

Dear developers, I am testing the upload of DFT+DMFT files to NOMAD. The DFT part is done with Quantum ESPRESSO, then wannier90, and finally the DMFT part is done with w2dynamics. I'm having some issues with the parser, I hope this is the right place to request aid.

I made a test run for NiO to play around with the upload. I put all the data in a single folder and tried to upload it via the web browser. When I upload everything together (with or without zipping it), the parser fails for all entries.

So I tried to upload files one by one.

The log error is the following.

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/nomad/processing/data.py", line 1156, in parsing
    parser.parse(self.mainfile_file.os_path, self._parser_results, logger=logger, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/nomad/parsing/parser.py", line 416, in parse
    self.mainfile_parser.parse(mainfile, archive, logger)
  File "/usr/local/lib/python3.9/site-packages/electronicparsers/wannier90/parser.py", line 506, in parse
    self.parse_system(self.archive, self.wout_parser)
  File "/usr/local/lib/python3.9/site-packages/electronicparsers/wannier90/parser.py", line 220, in parse_system
    sec_atoms.positions = structure.get('positions') * ureg.angstrom
  File "/usr/local/lib/python3.9/site-packages/pint/unit.py", line 187, in __mul__
    return self._REGISTRY.Quantity(1, self._units) * other
  File "/usr/local/lib/python3.9/site-packages/pint/quantity.py", line 1249, in __mul__
    return self._mul_div(other, operator.mul)
  File "/usr/local/lib/python3.9/site-packages/pint/quantity.py", line 115, in wrapped
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pint/quantity.py", line 95, in wrapped
    result = f(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pint/quantity.py", line 1215, in _mul_div
    magnitude = magnitude_op(self._magnitude, other_magnitude)
numpy.core._exceptions._UFuncNoLoopError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('int64'), dtype('<U8'))

I'll attach the wannier90 input/output here: NiO_wannier90.zip

The whole folder is 153 MB after compression. If you would like the files to test for yourself, I'm happy to send them, but I cannot post them here.
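A minimal sketch of the likely cause, judging from the traceback (this is an illustration, not the actual parser code): the `.wout` positions appear to have been read as strings, and pint then multiplies an integer magnitude by that string-typed array, which numpy cannot do. The parser-side fix would be to cast to float before attaching units.

```python
import numpy as np

# Positions scraped from a .wout file can arrive as text tokens
# (dtype '<U8'); multiplying such an array by a pint unit triggers the
# "_UFuncNoLoopError ... (int64, <U8)" seen in the traceback above.
raw = np.array([["0.000000", "0.000000", "0.000000"],
                ["1.975000", "1.975000", "1.975000"]])

# Casting to float before attaching units avoids the dtype mismatch:
positions = raw.astype(float)  # dtype float64; safe for `* ureg.angstrom`
```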

lauri-codes commented 1 year ago

Hi @deecadance,

Thanks for your report! We do have a separate repository for the parsers you mention here: https://github.com/nomad-coe/electronic-parsers. But for simplicity, we can continue the discussion in this issue.

The particular wannier90 issue seems to be related to the parsing of the atomic positions and should be fairly easy to solve. In order to track down why all entries in the upload would fail, we would need to look at the entire upload contents. Is it possible for you to create a minimal example that you could share as a link here? This would help us greatly.

@JosePizarro3: Maybe you already have an idea what might be going on?

deecadance commented 1 year ago

Thanks for the quick reply!

It seems that the parsing fails depending on which files I include in the upload.

This combination of files (it is a WeTransfer link, so it will expire in a few days). I tried a few combinations, and they point the blame at NiO.dos (the file with the density of states).

I tried the following tests:

This makes me think that I ran into two different problems that looked like one. The misreading of Wannier90 causes the DMFT parser to fail, while a problem with NiO.dos causes the Quantum ESPRESSO ones to fail.

I hope it helps. Simone

JosePizarro3 commented 1 year ago

Hello @deecadance ,

First of all, thanks a lot for starting to use NOMAD for DMFT πŸ™‚ Despite the bugs you encountered, this is very exciting.

Also, I remember both bugs; one of them (the Wannier90 one you initially shared) is already solved.


For the Wannier90 bug.

You are using the stable version (the normal one, since you clicked the "Open NOMAD" button on the website) instead of the staging or beta version (further down the page, labeled "Beta/Staging"; here is the link). The two versions differ by a few weeks of updates.

In general, I recommend using the Beta by default. Even though it is more prone to errors, some fixes land there sooner than in the stable version. The Wannier90 problem is already solved there, so *.wout and the other Wannier90 files should be parsed correctly on their own.


For the problem of all files at the same time.

I think this issue has already been solved, but the fix is for some reason not yet in Beta. @lauri-codes: in my develop branch I can parse this data without the issue (parsing seems "correct", see below), but I remember this happening some days ago, soon after the update to Python 3.9.

In any case, QE now seems to fail to resolve system.atoms, and hence fails in the normalizations that run after parsing. I can investigate this with @ladinesa and quickly push a patch.


That's all from me for now; again, I will investigate the QE problems.

Thanks again, Simone, and have a nice weekend. Jose

deecadance commented 1 year ago

Thanks both for the quick replies. I will try again next week in the beta branch then.

lauri-codes commented 1 year ago

@JosePizarro3: Thanks for picking this up. Let me know if you need help with the QE issue. Once we have it fixed we can deploy a new version to beta.

ladinesa commented 1 year ago

I managed to fix the QE parser for the system. For w2dynamics, the problem as it stands now is that the archive is too big for the current infrastructure to handle. I have already asked @JosePizarro3 to work on a solution, i.e. not dumping large arrays to the archive.

JosePizarro3 commented 1 year ago

Hello @deecadance ,

I finished optimizing w2dynamics so the archive size is no longer so restrictive. Both @ladinesa's changes and mine are now in the develop version of NOMAD, and I will contact the main coordinator to push them to the Beta version (https://nomad-lab.eu/prod/v1/staging/gui/search/entries); once they are there, you can hit the re-process button and see what comes out.

This should take no more than 1-2 days, so you can come back next Monday to the link I mentioned and reprocess your upload.

If you encounter more errors, don't hesitate to come back to us here. If everything works, we can close this issue once you have tested it.

Thanks a lot.

deecadance commented 1 year ago

Hi! Monday update for you. I tried the upload again with both the beta version and the version you linked. The wannier90 and Quantum ESPRESSO files are parsed successfully, while the .hdf5 file from w2dynamics is not.

If I upload everything (as a .zip file) I get the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/nomad/processing/data.py", line 1153, in parsing
    parser.parse(self.mainfile_file.os_path, self._parser_results, logger=logger, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/nomad/parsing/parser.py", line 396, in parse
    self.mainfile_parser.parse(mainfile, archive, logger)
  File "/usr/local/lib/python3.9/site-packages/electronicparsers/w2dynamics/parser.py", line 428, in parse
    self.parse_scc()
  File "/usr/local/lib/python3.9/site-packages/electronicparsers/w2dynamics/parser.py", line 321, in parse_scc
    value = parameter.get('value')[:]
AttributeError: 'NoneType' object has no attribute 'get'

If I upload the .hdf5 file by itself I get no error.
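A None-safe rewrite of the failing access (a sketch only; the `parameters`/`parameter` names follow the traceback, but the helper itself is hypothetical) would return None instead of raising when an expected group is missing:

```python
def read_parameter_value(parameters, name):
    """Return parameters[name]['value'][:] if present, else None.

    Sketch of a defensive version of the failing line in
    electronicparsers/w2dynamics/parser.py: `parameter` can be None
    when the expected HDF5 group is absent from the file.
    """
    parameter = parameters.get(name) if parameters is not None else None
    if parameter is None:
        return None
    value = parameter.get("value")
    return value[:] if value is not None else None

# Dict stand-ins for HDF5 groups:
params = {"beta": {"value": [37.0]}}
```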

In principle things could be uploaded as separate entries. The reason to upload everything together is that the DMFT calculation is done for a real material (NiO in this case, five Ni-d orbitals near the Fermi energy). Since it's a DFT+DMFT calculation I figured that one would like to have the DMFT result together with DFT. Is that the intended use?

Cheers, Simone

JosePizarro3 commented 1 year ago

Hi Simone,

Thanks again. Indeed, I get the same error when I play a bit with the order of the files in the folder. Let us investigate it and get back to you soon.

Sorry for the inconvenience.

In principle things could be uploaded as separate entries. The reason to upload everything together is that the DMFT calculation is done for a real material (NiO in this case, five Ni-d orbitals near the Fermi energy). Since it's a DFT+DMFT calculation I figured that one would like to have the DMFT result together with DFT. Is that the intended use?

You are right. The intended use is to upload all the data at once, rather than separating each type of calculation one by one. In fact, my initial thought on how an ideal upload folder for DFT+Wannier+DMFT should look is something like:

β”œβ”€β”€ vasprun.xml (or any other DFT code being used)
β”œβ”€β”€ extra DFT files
β”œβ”€β”€ wannier.wout
β”œβ”€β”€ extra Wannier90 files
β”œβ”€β”€ dmft1 hdf5 file (one set of parameters)
β”œβ”€β”€ dmft2 hdf5 file (another set of parameters)
... and all the dmft files / subfolders you want

I think this structure is easy for the parsers to pick up (besides this error that has now popped up).

So, let me get back to you soon with the patch. All the best, Jose

JosePizarro3 commented 1 year ago

Hi @deecadance ,

I was not able to reproduce the error in the beta version. I named you co-author of the data I uploaded to NOMAD so you can see it (you can check this on your uploads page, for upload_id=I2bPTfOpT8u3wVUK9W4Gvg). I simply dragged the files you shared with us into the upload page.

I also have a couple of screenshots, but cannot share them here.

Let me know if it is working for you now.

deecadance commented 1 year ago

Thanks. So, for the data structure I think you're on the right track. Personally I do a DFT + Wannier calculation to get the Hamiltonian, fix U and t, and create a series of subfolders for different filling (number of electrons) and, inside those, a different subfolder for each temperature.

I still get the error, but I also have a plausible idea on why I get an error and you don't. The files I sent you before were a "minimal example" for you to reproduce the previous error (so I removed files that were not causing problems to reduce confusion). I was now trying to upload the whole folder, so maybe some of the files I previously excluded now cause problems? They're not files that are meant to be parsed, but they're still useful stuff like the Hamiltonian in real/reciprocal space that would be helpful if someone tries to reproduce the results.

I'll send you the exact zipped file I am uploading: https://we.tl/t-7WCs49p4UT Could you try to upload this one and see if you get the error?

Thanks! Simone

JosePizarro3 commented 1 year ago

Hi Simone,

Ok, now the bug is clear, but I managed to push a fix :partying_face:

FYI, I was rewriting the hdf5 file, hence the error (which was essentially that the parameters read from the hdf5 file didn't exist...). Sorry about that. After the beta is updated, the upload should work (I tried these files locally and it works).

All the best, Jose

P.S.: if you are interested, and while waiting, we can further talk by email (jose.pizarro@physik.hu-berlin.de) about how to define workflows in NOMAD-lab. This is working and it will be interesting for you if you want to connect all the steps in your calculations and visualize this in a very nice way :slightly_smiling_face: Feel free to write me an email if you want to know more.

deecadance commented 1 year ago

Hurray! Thanks a lot.

deecadance commented 1 year ago

Another week, another error!

I tried to upload a much bigger batch of DMFT calculations (different pressures, different temperatures).

Congratulations, most of them are processed just fine! A minority however is not (34 entries out of about 300). It looks like the hdf5 file is the problem. It's odd, since all these calculations are done in the same way. Also, my own post-processing routines work, so the file is not corrupted.

Can you try this one out? I'm sending the file down here https://we.tl/t-2tnEmZVhGX

Have a nice weekend, Simone

JosePizarro3 commented 1 year ago

Hi Simone,

Thanks again for keeping up the testing. I will fix it quickly next week.

From what I saw, the hdf5 file now contains, for each iteration step, a couple of keys ('ubar' and 'jbar') that cause issues. I will include them in the parsing and improve the error handling for potential future issues like this one.
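The "improve error handling" part could look roughly like this (a sketch, not the actual parser code; the key names 'ubar' and 'jbar' come from the report, the other names are placeholders): log and skip unexpected or malformed entries instead of aborting the whole parse.

```python
import logging

logger = logging.getLogger("w2dynamics_sketch")

# Keys this sketch knows how to handle; 'ubar' and 'jbar' are the new
# per-iteration keys mentioned above, the others are placeholders.
KNOWN_KEYS = {"siw", "giw", "occ", "ubar", "jbar"}

def parse_iteration(iteration):
    """Collect key -> value, skipping unknown or malformed entries."""
    parsed = {}
    for key, group in iteration.items():
        if key not in KNOWN_KEYS:
            logger.warning("skipping unknown key %r", key)
            continue
        try:
            parsed[key] = group["value"]
        except (KeyError, TypeError):
            logger.warning("malformed group under %r", key)
    return parsed
```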

Have a nice weekend, Jose

JosePizarro3 commented 1 year ago

Ok, the fix was merged. Thanks again, and once more, it is only a matter of time before these changes are reflected in the Beta.

I also tried all the data I had and everything passes successfully, so let me know if you have any other problematic files.

Best, Jose