nomad-coe / nomad

NOMAD lets you manage and share your materials science data in a way that makes it truly useful to you, your group, and the community.
https://nomad-lab.eu
Apache License 2.0

CuSe sample input crashes current NOMAD 1.2.2 #95

Open behnle opened 4 months ago

behnle commented 4 months ago

When trying to visualize the CuSe FHI-aims GeometryOptimization simulation sample, NOMAD crashes with a Python error:

# docker logs -t nomad_oasis_app
...
2024-02-06T17:59:09.434522018Z ERROR    nomad.app            2024-02-06T17:59:09 unexpected exception in API
2024-02-06T17:59:09.434565627Z   - exception: Traceback (most recent call last):
2024-02-06T17:59:09.434568472Z       File "/usr/local/lib/python3.9/site-packages/anyio/streams/memory.py", line 94, in receive
2024-02-06T17:59:09.434570467Z         return self.receive_nowait()
2024-02-06T17:59:09.434572529Z       File "/usr/local/lib/python3.9/site-packages/anyio/streams/memory.py", line 89, in receive_nowait
2024-02-06T17:59:09.434574323Z         raise WouldBlock
2024-02-06T17:59:09.434575906Z     anyio.WouldBlock
2024-02-06T17:59:09.434590590Z     
2024-02-06T17:59:09.434592369Z     During handling of the above exception, another exception occurred:
2024-02-06T17:59:09.434594359Z     
2024-02-06T17:59:09.434595879Z     Traceback (most recent call last):
2024-02-06T17:59:09.434597486Z       File "/usr/local/lib/python3.9/site-packages/starlette/middleware/base.py", line 78, in call_next
2024-02-06T17:59:09.434599132Z         message = await recv_stream.receive()
2024-02-06T17:59:09.434600597Z       File "/usr/local/lib/python3.9/site-packages/anyio/streams/memory.py", line 114, in receive
2024-02-06T17:59:09.434602342Z         raise EndOfStream
2024-02-06T17:59:09.434603864Z     anyio.EndOfStream
2024-02-06T17:59:09.434605343Z     
2024-02-06T17:59:09.434606836Z     During handling of the above exception, another exception occurred:
2024-02-06T17:59:09.434608500Z     
2024-02-06T17:59:09.434609940Z     Traceback (most recent call last):
2024-02-06T17:59:09.434611397Z       File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
2024-02-06T17:59:09.434613029Z         await self.app(scope, receive, _send)
2024-02-06T17:59:09.434614573Z       File "/usr/local/lib/python3.9/site-packages/starlette/middleware/base.py", line 108, in __call__
2024-02-06T17:59:09.434616235Z         response = await self.dispatch_func(request, call_next)
2024-02-06T17:59:09.434617786Z       File "/usr/local/lib/python3.9/site-packages/nomad/app/v1/main.py", line 95, in log_request_time
2024-02-06T17:59:09.434619466Z         return await call_next(request)
2024-02-06T17:59:09.434621016Z       File "/usr/local/lib/python3.9/site-packages/starlette/middleware/base.py", line 84, in call_next
2024-02-06T17:59:09.434622639Z         raise app_exc
2024-02-06T17:59:09.434624079Z       File "/usr/local/lib/python3.9/site-packages/starlette/middleware/base.py", line 70, in coro
2024-02-06T17:59:09.434625670Z         await self.app(scope, receive_or_disconnect, send_no_error)
2024-02-06T17:59:09.434627249Z       File "/usr/local/lib/python3.9/site-packages/starlette/middleware/cors.py", line 92, in __call__
2024-02-06T17:59:09.434628906Z         await self.simple_response(scope, receive, send, request_headers=headers)
2024-02-06T17:59:09.434630502Z       File "/usr/local/lib/python3.9/site-packages/starlette/middleware/cors.py", line 147, in simple_response
2024-02-06T17:59:09.434632206Z         await self.app(scope, receive, send)
2024-02-06T17:59:09.434634386Z       File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
2024-02-06T17:59:09.434636094Z         raise exc
2024-02-06T17:59:09.434637518Z       File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
2024-02-06T17:59:09.434639150Z         await self.app(scope, receive, sender)
2024-02-06T17:59:09.434643144Z       File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
2024-02-06T17:59:09.434644892Z         raise e
2024-02-06T17:59:09.434646395Z       File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
2024-02-06T17:59:09.434648095Z         await self.app(scope, receive, send)
2024-02-06T17:59:09.434649736Z       File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 706, in __call__
2024-02-06T17:59:09.434651382Z         await route.handle(scope, receive, send)
2024-02-06T17:59:09.434652878Z       File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
2024-02-06T17:59:09.434654502Z         await self.app(scope, receive, send)
2024-02-06T17:59:09.434656055Z       File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
2024-02-06T17:59:09.434657704Z         response = await func(request)
2024-02-06T17:59:09.434659254Z       File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
2024-02-06T17:59:09.434660920Z         raw_response = await run_endpoint_function(
2024-02-06T17:59:09.434662521Z       File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
2024-02-06T17:59:09.434665672Z         return await dependant.call(**values)
2024-02-06T17:59:09.434667211Z       File "/usr/local/lib/python3.9/site-packages/nomad/app/v1/routers/entries.py", line 1474, in post_entry_archive_query
2024-02-06T17:59:09.434668958Z         return answer_entry_archive_request(
2024-02-06T17:59:09.434670483Z       File "/usr/local/lib/python3.9/site-packages/nomad/app/v1/routers/entries.py", line 1369, in answer_entry_archive_request
2024-02-06T17:59:09.434672301Z         archive_data = _read_archive(entry_metadata, uploads, required_reader)[
2024-02-06T17:59:09.434673986Z       File "/usr/local/lib/python3.9/site-packages/nomad/app/v1/routers/entries.py", line 841, in _read_archive
2024-02-06T17:59:09.434675711Z         'archive': required_reader.read(archive, entry_id, upload_id),
2024-02-06T17:59:09.434677338Z       File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 274, in read
2024-02-06T17:59:09.434678991Z         result = self._apply_required(self.required, archive_root, dataset)
2024-02-06T17:59:09.434680888Z       File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 617, in _apply_required
2024-02-06T17:59:09.434682559Z         result[prop] = self._apply_required(
2024-02-06T17:59:09.434684120Z       File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 617, in _apply_required
2024-02-06T17:59:09.434685810Z         result[prop] = self._apply_required(
2024-02-06T17:59:09.434687409Z       File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 617, in _apply_required
2024-02-06T17:59:09.434689100Z         result[prop] = self._apply_required(
2024-02-06T17:59:09.434692623Z       File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 610, in _apply_required
2024-02-06T17:59:09.434694337Z         result[prop] = [
2024-02-06T17:59:09.434695807Z       File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 611, in <listcomp>
2024-02-06T17:59:09.434697805Z         self._apply_required(
2024-02-06T17:59:09.434699368Z       File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 571, in _apply_required
2024-02-06T17:59:09.434701072Z         return self._resolve_refs(dataset.definition, archive_item, dataset)
2024-02-06T17:59:09.434702686Z       File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 291, in _resolve_refs
2024-02-06T17:59:09.434704382Z         return to_json(archive[definition.name])
2024-02-06T17:59:09.434705969Z     TypeError: 'float' object is not subscriptable
2024-02-06T17:59:09.434707507Z   - exception_hash: LgwPuknqAyfmv-1-IfcRN0ICTJqZ
2024-02-06T17:59:09.434708993Z   - nomad.app.url: URL('http://app:8000/nomad-oasis/api/v1/entries/mNlbwdx2HB9I6FcjhBsx_ChojPA8/archive/query')
2024-02-06T17:59:09.434710615Z   - nomad.commit: 
2024-02-06T17:59:09.434712118Z   - nomad.deployment: oasis
2024-02-06T17:59:09.434713664Z   - nomad.service: app
2024-02-06T17:59:09.434715207Z   - nomad.version: 1.2.2.dev295+g2e611aff1
2024-02-06T17:59:09.436173594Z [2024-02-06 18:59:09 +0100] [18] [ERROR] Exception in ASGI application
2024-02-06T17:59:09.436184279Z Traceback (most recent call last):
2024-02-06T17:59:09.436186061Z   File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
2024-02-06T17:59:09.436188155Z     result = await app(  # type: ignore[func-returns-value]
2024-02-06T17:59:09.436189706Z   File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
2024-02-06T17:59:09.436191378Z     return await self.app(scope, receive, send)
2024-02-06T17:59:09.436192984Z   File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 271, in __call__
2024-02-06T17:59:09.436194763Z     await super().__call__(scope, receive, send)
2024-02-06T17:59:09.436196429Z   File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 118, in __call__
2024-02-06T17:59:09.436198106Z     await self.middleware_stack(scope, receive, send)
2024-02-06T17:59:09.436199595Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
2024-02-06T17:59:09.436201290Z     raise exc
2024-02-06T17:59:09.436202750Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
2024-02-06T17:59:09.436204441Z     await self.app(scope, receive, _send)
2024-02-06T17:59:09.436206017Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/base.py", line 109, in __call__
2024-02-06T17:59:09.436212341Z     await response(scope, receive, send)
2024-02-06T17:59:09.436213991Z   File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 277, in __call__
2024-02-06T17:59:09.436228014Z     await wrap(partial(self.listen_for_disconnect, receive))
2024-02-06T17:59:09.436229602Z   File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
2024-02-06T17:59:09.436231291Z     raise exceptions[0]
2024-02-06T17:59:09.436232832Z   File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 273, in wrap
2024-02-06T17:59:09.436234540Z     await func()
2024-02-06T17:59:09.436236049Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/base.py", line 134, in stream_response
2024-02-06T17:59:09.436237723Z     return await super().stream_response(send)
2024-02-06T17:59:09.436239224Z   File "/usr/local/lib/python3.9/site-packages/starlette/responses.py", line 262, in stream_response
2024-02-06T17:59:09.436240823Z     async for chunk in self.body_iterator:
2024-02-06T17:59:09.436242312Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/base.py", line 98, in body_stream
2024-02-06T17:59:09.436243986Z     raise app_exc
2024-02-06T17:59:09.436245569Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/base.py", line 70, in coro
2024-02-06T17:59:09.436247628Z     await self.app(scope, receive_or_disconnect, send_no_error)
2024-02-06T17:59:09.436249277Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
2024-02-06T17:59:09.436251076Z     raise exc
2024-02-06T17:59:09.436252897Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
2024-02-06T17:59:09.436254533Z     await self.app(scope, receive, sender)
2024-02-06T17:59:09.436256008Z   File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
2024-02-06T17:59:09.436257706Z     raise e
2024-02-06T17:59:09.436259218Z   File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
2024-02-06T17:59:09.436260942Z     await self.app(scope, receive, send)
2024-02-06T17:59:09.436262505Z   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 706, in __call__
2024-02-06T17:59:09.436264204Z     await route.handle(scope, receive, send)
2024-02-06T17:59:09.436265710Z   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 443, in handle
2024-02-06T17:59:09.436267302Z     await self.app(scope, receive, send)
2024-02-06T17:59:09.436268768Z   File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 271, in __call__
2024-02-06T17:59:09.436270431Z     await super().__call__(scope, receive, send)
2024-02-06T17:59:09.436274544Z   File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 118, in __call__
2024-02-06T17:59:09.436276341Z     await self.middleware_stack(scope, receive, send)
2024-02-06T17:59:09.436277951Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
2024-02-06T17:59:09.436279705Z     raise exc
2024-02-06T17:59:09.436281167Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
2024-02-06T17:59:09.436282796Z     await self.app(scope, receive, _send)
2024-02-06T17:59:09.436284285Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/base.py", line 108, in __call__
2024-02-06T17:59:09.436285970Z     response = await self.dispatch_func(request, call_next)
2024-02-06T17:59:09.436287604Z   File "/usr/local/lib/python3.9/site-packages/nomad/app/v1/main.py", line 95, in log_request_time
2024-02-06T17:59:09.436289303Z     return await call_next(request)
2024-02-06T17:59:09.436290897Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/base.py", line 84, in call_next
2024-02-06T17:59:09.436292601Z     raise app_exc
2024-02-06T17:59:09.436294072Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/base.py", line 70, in coro
2024-02-06T17:59:09.436295678Z     await self.app(scope, receive_or_disconnect, send_no_error)
2024-02-06T17:59:09.436297398Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/cors.py", line 92, in __call__
2024-02-06T17:59:09.436299108Z     await self.simple_response(scope, receive, send, request_headers=headers)
2024-02-06T17:59:09.436300790Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/cors.py", line 147, in simple_response
2024-02-06T17:59:09.436302557Z     await self.app(scope, receive, send)
2024-02-06T17:59:09.436304119Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
2024-02-06T17:59:09.436305858Z     raise exc
2024-02-06T17:59:09.436307292Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
2024-02-06T17:59:09.436308915Z     await self.app(scope, receive, sender)
2024-02-06T17:59:09.436310401Z   File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
2024-02-06T17:59:09.436312059Z     raise e
2024-02-06T17:59:09.436313586Z   File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
2024-02-06T17:59:09.436315287Z     await self.app(scope, receive, send)
2024-02-06T17:59:09.436316884Z   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 706, in __call__
2024-02-06T17:59:09.436318580Z     await route.handle(scope, receive, send)
2024-02-06T17:59:09.436320114Z   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
2024-02-06T17:59:09.436324081Z     await self.app(scope, receive, send)
2024-02-06T17:59:09.436325637Z   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
2024-02-06T17:59:09.436327269Z     response = await func(request)
2024-02-06T17:59:09.436328865Z   File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
2024-02-06T17:59:09.436330577Z     raw_response = await run_endpoint_function(
2024-02-06T17:59:09.436332193Z   File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
2024-02-06T17:59:09.436333935Z     return await dependant.call(**values)
2024-02-06T17:59:09.436335535Z   File "/usr/local/lib/python3.9/site-packages/nomad/app/v1/routers/entries.py", line 1474, in post_entry_archive_query
2024-02-06T17:59:09.436337221Z     return answer_entry_archive_request(
2024-02-06T17:59:09.436338705Z   File "/usr/local/lib/python3.9/site-packages/nomad/app/v1/routers/entries.py", line 1369, in answer_entry_archive_request
2024-02-06T17:59:09.436340423Z     archive_data = _read_archive(entry_metadata, uploads, required_reader)[
2024-02-06T17:59:09.436342091Z   File "/usr/local/lib/python3.9/site-packages/nomad/app/v1/routers/entries.py", line 841, in _read_archive
2024-02-06T17:59:09.436343855Z     'archive': required_reader.read(archive, entry_id, upload_id),
2024-02-06T17:59:09.436345699Z   File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 274, in read
2024-02-06T17:59:09.436347456Z     result = self._apply_required(self.required, archive_root, dataset)
2024-02-06T17:59:09.436349045Z   File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 617, in _apply_required
2024-02-06T17:59:09.436350653Z     result[prop] = self._apply_required(
2024-02-06T17:59:09.436352153Z   File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 617, in _apply_required
2024-02-06T17:59:09.436353856Z     result[prop] = self._apply_required(
2024-02-06T17:59:09.436355439Z   File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 617, in _apply_required
2024-02-06T17:59:09.436357149Z     result[prop] = self._apply_required(
2024-02-06T17:59:09.436358740Z   File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 610, in _apply_required
2024-02-06T17:59:09.436360449Z     result[prop] = [
2024-02-06T17:59:09.436362025Z   File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 611, in <listcomp>
2024-02-06T17:59:09.436363949Z     self._apply_required(
2024-02-06T17:59:09.436365407Z   File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 571, in _apply_required
2024-02-06T17:59:09.436367077Z     return self._resolve_refs(dataset.definition, archive_item, dataset)
2024-02-06T17:59:09.436368724Z   File "/usr/local/lib/python3.9/site-packages/nomad/archive/required.py", line 291, in _resolve_refs
2024-02-06T17:59:09.436372272Z     return to_json(archive[definition.name])
2024-02-06T17:59:09.436373949Z TypeError: 'float' object is not subscriptable

In the GUI, this triggers an internal server error: Unexpected error: "[object Object] (500)". Please try again and let us know, if this error keeps happening. No cell or workflow graph is shown. NOMAD version is 1.2.2.dev295+g2e611aff1.
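The last frame of the traceback, `to_json(archive[definition.name])`, suggests that `archive` was a plain float at that point rather than a mapping. As a minimal sketch of that failure mode (this is not NOMAD's code; `resolve_item` is a hypothetical helper used only for illustration), the same `TypeError` can be reproduced and guarded against like this:

```python
from typing import Any


def resolve_item(archive: Any, name: str) -> Any:
    """Hypothetical helper: subscript `archive` only if it is a mapping."""
    if isinstance(archive, dict):
        return archive[name]
    # A scalar such as a float cannot be subscripted; return it unchanged.
    return archive


# Reproduce the error class seen in the log by subscripting a float.
value = 1.5
try:
    value["energy"]
except TypeError as exc:
    message = str(exc)

print(message)
print(resolve_item({"energy": -1.0}, "energy"))
print(resolve_item(1.5, "energy"))
```

Whether returning the scalar unchanged is the right behaviour for the archive reader is an assumption here; the sketch only shows where the type check would have to sit.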

behnle commented 4 months ago

Totally forgot to mention: Docker is version 25.0.3 on Rocky Linux 9.3:

[root@host nomad]# docker info
Client: Docker Engine - Community
 Version:    25.0.3
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.24.5
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 12
  Running: 7
  Paused: 0
  Stopped: 5
 Images: 14
 Server Version: 25.0.3
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.14.0-362.18.1.el9_3.x86_64
 Operating System: Rocky Linux 9.3 (Blue Onyx)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.56GiB
 Name: u-030-s007
 ID: 17341a6c-20f5-4a76-a0fd-8cf7ecddaf09
 Docker Root Dir: /dockerdata/volumes
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Pepe-Marquez commented 4 months ago

Hi @behnle, thanks for reporting. Can you confirm you are talking about the "Electronic structure code input and output files" example upload? I have just run this upload in our beta deployment, and the entry with the mainfile Cu2Se/2/aims.out took significantly longer to parse than the others. When I eventually returned to that upload, the parsing had succeeded and I could visualize the overview cards. @ladinesa, any idea what could be going on here?

ladinesa commented 4 months ago

I will take a closer look, but I suspect the parsing timed out. However, the error seems to be inconsistent with a timed-out entry. It could also be that the archive size is larger than permitted, causing trouble with the archive reader.

behnle commented 4 months ago

Exactly, that's the one. Let me know if you need additional information for tracking down the problem.

behnle commented 4 months ago

I will take a closer look but I suspect the parsing timed out. However, the error seems to be inconsistent with a timed-out entry. It could also be that the archive size is larger than permitted causing trouble with the archive reader.

Processing seemed to work fine; there were no errors in the processing log. The problem occurred when I went to the overview page of the sample, and it happens immediately.

There are only two parser warnings

{
  "event": "Energy not reported for an calculation that is part of a geometry optimization",
  "proc": "Entry",
  "process": "process_entry",
  "process_worker_id": "vO0M_ZzSS-KIdrnfYzbUdw",
  "parser": "parsers/fhi-aims",
  "normalizer": "SimulationWorkflowNormalizer",
  "step": "SimulationWorkflowNormalizer",
  "logger": "nomad.processing",
  "timestamp": "2024-02-07 09:18.51",
  "level": "WARNING"
}

but no errors. In case the issue is related to a timeout, which one would it be and where can I adjust it?

ladinesa commented 4 months ago

You can modify the settings by specifying them in the nomad.yaml file. You can have a look at the docs here. For a complete list of config keys, I suggest you look at the code under nomad/config/models.py. For example, you can adjust services.api_timeout or celery.timeout.

behnle commented 4 months ago

I had already skimmed the list of config options and had set services:api_timeout to 6000 seconds:

services:
  #  api_host: 'localhost'
  api_host: <redacted>
  api_port: 443
  api_base_path: '/nomad-oasis'
  api_timeout: 6000
  https: True
  https_upload: True
  admin_user_id: <redacted> # TODO replace  
    #  aitoolkit_enabled: True
  console_log_level: 10
  upload_limit: 100000

Did not change any celery settings, though.
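To make explicit which of the two suggested keys is set, here is a small Python sketch that does a dotted-key lookup on a nested dict. The dict is an assumed in-memory mirror of the `services` section above (loading the actual nomad.yaml is left out), and `get_setting` is a hypothetical helper, not part of NOMAD:

```python
from typing import Any, Optional

# Assumed mirror of the posted `services` section; there is no `celery` section.
config = {
    "services": {
        "api_port": 443,
        "api_base_path": "/nomad-oasis",
        "api_timeout": 6000,
    }
}


def get_setting(cfg: dict, dotted_key: str) -> Optional[Any]:
    """Return the value at a dotted key such as 'services.api_timeout', or None if unset."""
    node: Any = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


for key in ("services.api_timeout", "celery.timeout"):
    print(key, "=", get_setting(config, key))
```

With this input the lookup finds 6000 for `services.api_timeout` and nothing for `celery.timeout`, so the latter would fall back to whatever default NOMAD ships.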

ladinesa commented 4 months ago

Can you please send me the image path? I am having trouble finding it.

lauri-codes commented 4 months ago

Might be related to this: https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1861

ladinesa commented 4 months ago

Might be related to this: https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/issues/1861

Ah yes, I had completely forgotten about this. Thanks, Lauri.

behnle commented 4 months ago

@ladinesa What do you mean by "image path"? The docker image?

[root@host nomad]# docker image ls
REPOSITORY                                                                      TAG       IMAGE ID       CREATED         SIZE
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair                               latest    dae1849135eb   6 weeks ago     1.81GB
...

@lauri-codes Yes, might be related to my issue. Would it help to pull a new docker image if available?

lauri-codes commented 3 months ago

@behnle: Sorry I missed your comment. You can try updating the nomad-fair:latest docker image. If the problem still persists, we need to release a new image with the fix.

behnle commented 3 months ago

@lauri-codes Thanks for the heads-up. I recently pulled the "latest" image:

[root@host nomad]# docker image ls
REPOSITORY                                                                      TAG       IMAGE ID       CREATED         SIZE
nginx                                                                           <none>    e4720093a3c1   6 weeks ago     187MB
nginx                                                                           latest    92b11f67642b   6 weeks ago     187MB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair                               latest    279c097945fe   7 weeks ago     1.88GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair                               <none>    dae1849135eb   3 months ago    1.81GB
python                                                                          latest    e7177b0afd0e   3 months ago    1.02GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/jupyterlab        latest    f1b5e187ee1e   4 months ago    6.39GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/jupyterlab        prod      f1b5e187ee1e   4 months ago    6.39GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/nexus-webtop      latest    548857bf45d9   4 months ago    7.43GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/apmtools-webtop   latest    125e01c59a73   5 months ago    5.29GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/webtop            latest    603c690b7911   5 months ago    1.65GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/ellips-jupyter    latest    4e3e12da664c   5 months ago    6.22GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/xps-jupyter       latest    5bf19c880ab6   5 months ago    5.65GB
nginx                                                                           <none>    a8758716bb6a   5 months ago    187MB
jupyter/datascience-notebook                                                    latest    f78a42f3bc9a   5 months ago    5.92GB
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair                               v1.2.1    cc8dd7c53b3c   6 months ago    1.67GB
rabbitmq                                                                        3.11.5    3ddcc140fe5c   15 months ago   228MB
mongo                                                                           5.0.6     532c84506200   24 months ago   699MB
docker.elastic.co/elasticsearch/elasticsearch                                   7.17.1    515ab4fba870   2 years ago     618MB

With this release, every other attempt on the original sample data succeeds, but some reprocessing runs fail with

{
  "errors": "process failed due to worker lost: Worker exited prematurely: signal 7 (SIGBUS) Job: 19.",
  "event": "process failed",
  "proc": "Entry",
  "process": "process_entry",
  "process_worker_id": "N92Rn87uS9usK6o6O4e9eA",
  "parser": "parsers/exciting",
  "logger": "nomad.processing",
  "exception": "Traceback (most recent call last): File \"/usr/local/lib/python3.9/site-packages/billiard/pool.py\", line 1265, in mark_as_worker_lost raise WorkerLostError( billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 7 (SIGBUS) Job: 19.",
  "timestamp": "2024-03-28 14:12.39",
  "level": "ERROR"
}

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
    raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 7 (SIGBUS) Job: 19.

and the docker-compose log contains error messages like this one

nomad_oasis_worker    | 2024-03-28T13:12:39.988471515Z ERROR    nomad.processing     2024-03-28T13:12:39 detected WorkerLostError
nomad_oasis_worker    | 2024-03-28T13:12:39.988485299Z   - exception: Traceback (most recent call last):
nomad_oasis_worker    | 2024-03-28T13:12:39.988487333Z       File "/usr/local/lib/python3.9/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
nomad_oasis_worker    | 2024-03-28T13:12:39.988489440Z         raise WorkerLostError(
nomad_oasis_worker    | 2024-03-28T13:12:39.988491171Z     billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 7 (SIGBUS) Job: 19.
nomad_oasis_worker    | 2024-03-28T13:12:39.988492930Z   - exception_hash: PyYAluNDeMcTSlMqcpc84b96E4SI
nomad_oasis_worker    | 2024-03-28T13:12:39.988494543Z   - nomad.commit: 
nomad_oasis_worker    | 2024-03-28T13:12:39.988505434Z   - nomad.deployment: oasis
nomad_oasis_worker    | 2024-03-28T13:12:39.988511090Z   - nomad.service: unknown nomad service
nomad_oasis_worker    | 2024-03-28T13:12:39.988513031Z   - nomad.version: 1.2.2.dev357+g15b7cd2e1
nomad_oasis_worker    | 2024-03-28T13:12:39.993245423Z ERROR    nomad.processing     2024-03-28T13:12:39 process failed
nomad_oasis_worker    | 2024-03-28T13:12:39.993288548Z   - exception: Traceback (most recent call last):
nomad_oasis_worker    | 2024-03-28T13:12:39.993291858Z       File "/usr/local/lib/python3.9/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
nomad_oasis_worker    | 2024-03-28T13:12:39.993294068Z         raise WorkerLostError(
nomad_oasis_worker    | 2024-03-28T13:12:39.993295886Z     billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 7 (SIGBUS) Job: 19.
nomad_oasis_worker    | 2024-03-28T13:12:39.993297906Z   - exception_hash: PyYAluNDeMcTSlMqcpc84b96E4SI
nomad_oasis_worker    | 2024-03-28T13:12:39.993300711Z   - nomad.commit: 
nomad_oasis_worker    | 2024-03-28T13:12:39.993302447Z   - nomad.deployment: oasis
nomad_oasis_worker    | 2024-03-28T13:12:39.993304235Z   - nomad.entry_id: pwcCETIYQy1JlgLP5S_E9GHJT7-z
nomad_oasis_worker    | 2024-03-28T13:12:39.993312033Z   - nomad.mainfile: Sn2Se/1/INFO_GS.OUT
nomad_oasis_worker    | 2024-03-28T13:12:39.993332598Z   - nomad.processing.errors: process failed due to worker lost: Worker exited prematurely: signal 7 (SIGBUS) Job: 19.
nomad_oasis_worker    | 2024-03-28T13:12:39.993344413Z   - nomad.processing.logger: nomad.processing
nomad_oasis_worker    | 2024-03-28T13:12:39.993347787Z   - nomad.processing.parser: parsers/exciting
nomad_oasis_worker    | 2024-03-28T13:12:39.993351339Z   - nomad.processing.proc: Entry
nomad_oasis_worker    | 2024-03-28T13:12:39.993354840Z   - nomad.processing.process: process_entry
nomad_oasis_worker    | 2024-03-28T13:12:39.993363288Z   - nomad.processing.process_status: RUNNING
nomad_oasis_worker    | 2024-03-28T13:12:39.993367814Z   - nomad.processing.process_worker_id: N92Rn87uS9usK6o6O4e9eA
nomad_oasis_worker    | 2024-03-28T13:12:39.993371495Z   - nomad.service: unknown nomad service
nomad_oasis_worker    | 2024-03-28T13:12:39.993374973Z   - nomad.upload_id: nivYbhQbRRKyDaAQ1Yor3g
nomad_oasis_worker    | 2024-03-28T13:12:39.993380021Z   - nomad.version: 1.2.2.dev357+g15b7cd2e1
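
To collect these failures across many log lines, the signal number, signal name and job id can be pulled out of the worker-lost message with a regular expression. This is a minimal sketch that assumes the message always has the shape shown above; `parse_worker_lost` is a hypothetical helper, not part of NOMAD, Celery or billiard:

```python
import re
from typing import Optional

# Pattern for the message text as it appears in the log lines above.
WORKER_LOST = re.compile(
    r"Worker exited prematurely: signal (?P<signo>\d+) \((?P<name>\w+)\) Job: (?P<job>\d+)"
)


def parse_worker_lost(line: str) -> Optional[dict]:
    """Return signal number, signal name and job id, or None if the line does not match."""
    match = WORKER_LOST.search(line)
    if match is None:
        return None
    return {
        "signal": int(match["signo"]),
        "name": match["name"],
        "job": int(match["job"]),
    }


line = (
    "billiard.exceptions.WorkerLostError: Worker exited prematurely: "
    "signal 7 (SIGBUS) Job: 19."
)
print(parse_worker_lost(line))
```

Lines that do not contain the message return `None`, so the helper can be applied to the whole docker-compose log and the non-matching lines filtered out.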

In the journal of the server, I found the following potentially related error message:

Mar 28 14:12:38 u-030-s007 systemd-coredump[280150]: [🡕] Process 274915 (python) of user 1000 dumped core.

                                                     Module /usr/local/lib/python3.9/site-packages/quippy_ase.libs/libopenblasp-r0-dcce3d0b.3.20.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/quippy_ase.libs/libopenblasp-r0-dcce3d0b.3.20.so
                                                     Module /usr/local/lib/python3.9/site-packages/quippy/_quippy.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/quippy/_quippy.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/MDAnalysis.libs/libgomp-a34b3233.so.1.0.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/MDAnalysis.libs/libgomp-a34b3233.so.1.0.0
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/stats/mvn.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/stats/mvn.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/stats/statlib.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/stats/statlib.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/special/cython_special.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/special/cython_special.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/interpolate/dfitpack.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/interpolate/dfitpack.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/interpolate/_fitpack.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/interpolate/_fitpack.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/integrate/lsoda.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/integrate/lsoda.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/integrate/_dop.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/integrate/_dop.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/integrate/vode.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/integrate/vode.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/integrate/_quadpack.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/integrate/_quadpack.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/integrate/_odepack.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/integrate/_odepack.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/linalg/_interpolative.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/linalg/_interpolative.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/__nnls.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/__nnls.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/_minpack.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/_minpack.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/_slsqp.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/_slsqp.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/_cobyla.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/_cobyla.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/_lbfgsb.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/_lbfgsb.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/_trlib/_trlib.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/_trlib/_trlib.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/sparse/linalg/eigen/arpack/_arpack.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/sparse/linalg/dsolve/_superlu.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/sparse/linalg/dsolve/_superlu.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/sparse/linalg/isolve/_iterative.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/sparse/linalg/isolve/_iterative.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/minpack2.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/optimize/minpack2.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/spatial/qhull.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/_selector.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/_selector.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5l.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5l.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5o.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5o.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5pl.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5pl.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5fd.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5fd.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5i.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5i.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5g.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5g.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5f.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5f.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5ds.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5ds.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5d.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5d.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/_proxy.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/_proxy.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5a.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5a.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5z.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5z.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5ac.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5ac.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/utils.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/utils.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5s.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5s.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/cluster/_hierarchical_fast.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/cluster/_hierarchical_fast.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_fast_dict.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_fast_dict.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5p.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5p.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/cluster/_k_means_elkan.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/cluster/_k_means_elkan.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/cluster/_k_means_lloyd.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/cluster/_k_means_lloyd.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/cluster/_k_means_minibatch.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/cluster/_k_means_minibatch.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/cluster/_k_means_common.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/cluster/_k_means_common.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5t.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5t.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/tree/_utils.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/tree/_utils.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/tree/_tree.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/tree/_tree.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/tree/_splitter.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/tree/_splitter.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/neighbors/_quad_tree.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/neighbors/_quad_tree.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5r.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5r.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/tree/_criterion.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/tree/_criterion.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/_isotonic.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/_isotonic.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/svm/_libsvm_sparse.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/svm/_libsvm_sparse.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/svm/_libsvm.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/svm/_libsvm.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/_conv.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/_conv.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/linear_model/_sgd_fast.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/linear_model/_sgd_fast.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/linear_model/_cd_fast.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/linear_model/_cd_fast.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_cython_blas.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_cython_blas.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/arrayfuncs.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/arrayfuncs.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/_objects.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/_objects.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/neighbors/_kd_tree.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/neighbors/_kd_tree.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/neighbors/_ball_tree.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/neighbors/_ball_tree.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/metrics/_pairwise_fast.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/metrics/_pairwise_fast.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/defs.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/defs.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/metrics/_dist_metrics.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/metrics/_dist_metrics.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/preprocessing/_csr_polynomial_expansion.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/preprocessing/_csr_polynomial_expansion.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/sparsefuncs_fast.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/sparsefuncs_fast.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/h5.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py.libs/libaec-9c9e97eb.so.0.0.10 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py.libs/libaec-9c9e97eb.so.0.0.10
                                                     Module /usr/local/lib/python3.9/site-packages/h5py.libs/libsz-090daab4.so.2.0.1 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py.libs/libsz-090daab4.so.2.0.1
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_readonly_array_wrapper.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_readonly_array_wrapper.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/h5py.libs/libhdf5_hl-84bfe2a0.so.200.0.1 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py.libs/libhdf5_hl-84bfe2a0.so.200.0.1
                                                     Module /usr/local/lib/python3.9/site-packages/h5py.libs/libhdf5-346dbfc8.so.200.1.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py.libs/libhdf5-346dbfc8.so.200.1.0
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/_errors.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/h5py/_errors.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/manifold/_barnes_hut_tsne.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/manifold/_barnes_hut_tsne.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libz-a147dcb0.so.1.2.3 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libz-a147dcb0.so.1.2.3
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/svm/_liblinear.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/svm/_liblinear.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libcurl-33f5ac06.so.4.6.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libcurl-33f5ac06.so.4.6.0
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libaec-f0d4887b.so.0.0.10 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libaec-f0d4887b.so.0.0.10
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_weight_vector.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_weight_vector.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libhdf5-5d1f23d4.so.103.1.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libhdf5-5d1f23d4.so.103.1.0
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/special/_ellip_harm_2.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/special/_ellip_harm_2.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/linear_model/_sag_fast.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/linear_model/_sag_fast.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/decomposition/_cdnmf_fast.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/decomposition/_cdnmf_fast.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/linalg/cython_lapack.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/linalg/cython_lapack.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/linalg/cython_blas.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/linalg/cython_blas.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_logistic_sigmoid.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_logistic_sigmoid.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/linalg/_flinalg.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/linalg/_flinalg.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_seq_dataset.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_seq_dataset.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/linalg/_flapack.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/linalg/_flapack.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/cluster/_dbscan_inner.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/cluster/_dbscan_inner.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/linalg/_fblas.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/linalg/_fblas.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_random.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_random.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/special/specfun.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/special/specfun.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/metrics/cluster/_expected_mutual_info_fast.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/metrics/cluster/_expected_mutual_info_fast.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libsz-53d02de5.so.2.0.1 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libsz-53d02de5.so.2.0.1
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/manifold/_utils.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/manifold/_utils.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/murmurhash.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/murmurhash.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libhdf5_hl-14f94ac1.so.100.1.2 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libhdf5_hl-14f94ac1.so.100.1.2
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libnetcdf-2ecdc039.so.15.0.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4.libs/libnetcdf-2ecdc039.so.15.0.0
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4/_netCDF4.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/netCDF4/_netCDF4.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy.libs/libgfortran-ed201abd.so.3.0.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy.libs/libgfortran-ed201abd.so.3.0.0
                                                     Module /usr/local/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-085ca80a.3.9.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-085ca80a.3.9.so
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/special/_ufuncs.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/scipy/special/_ufuncs.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/decomposition/_online_lda_fast.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/decomposition/_online_lda_fast.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/neighbors/_partition_nodes.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/neighbors/_partition_nodes.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_typedefs.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_typedefs.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/MDAnalysis/lib/c_distances_openmp.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/MDAnalysis/lib/c_distances_openmp.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_openmp_helpers.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/utils/_openmp_helpers.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/__check_build/_check_build.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/sklearn/__check_build/_check_build.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/numpy/linalg/lapack_lite.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/numpy.libs/libquadmath-96973f99.so.0.0.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/numpy.libs/libquadmath-96973f99.so.0.0.0
                                                     Module /usr/local/lib/python3.9/site-packages/numpy/linalg/_umath_linalg.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/numpy/linalg/_umath_linalg.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/numpy.libs/libgfortran-040039e1.so.5.0.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/numpy.libs/libgfortran-040039e1.so.5.0.0
                                                     Module /usr/local/lib/python3.9/site-packages/numpy.libs/libopenblas64_p-r0-2f7c42d4.3.18.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/numpy.libs/libopenblas64_p-r0-2f7c42d4.3.18.so
                                                     Module /usr/local/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
                                                     Module /usr/local/lib/python3.9/site-packages/Pillow.libs/libXau-154567c4.so.6.0.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/Pillow.libs/libXau-154567c4.so.6.0.0
                                                     Module /usr/local/lib/python3.9/site-packages/Pillow.libs/liblzma-160b9c62.so.5.4.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/Pillow.libs/liblzma-160b9c62.so.5.4.0
                                                     Module /usr/local/lib/python3.9/site-packages/Pillow.libs/libxcb-3e83370d.so.1.1.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/Pillow.libs/libxcb-3e83370d.so.1.1.0
                                                     Module /usr/local/lib/python3.9/site-packages/Pillow.libs/libtiff-b9364ff1.so.6.0.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/Pillow.libs/libtiff-b9364ff1.so.6.0.0
                                                     Module /usr/local/lib/python3.9/site-packages/Pillow.libs/libopenjp2-78c47f58.so.2.5.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/Pillow.libs/libopenjp2-78c47f58.so.2.5.0
                                                     Module /usr/local/lib/python3.9/site-packages/Pillow.libs/libjpeg-16b2c4cf.so.62.3.0 without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/Pillow.libs/libjpeg-16b2c4cf.so.62.3.0
                                                     Module /usr/local/lib/python3.9/site-packages/PIL/_imaging.cpython-39-x86_64-linux-gnu.so without build-id.
                                                     Module /usr/local/lib/python3.9/site-packages/PIL/_imaging.cpython-39-x86_64-linux-gnu.so
                                                     Stack trace of thread 26:
                                                     #0  0x00007f074c1fc88c n/a (/usr/local/lib/libpython3.9.so.1.0 + 0x1af88c)
                                                     #1  0x00007f06d73c9000 n/a (n/a + 0x0)
                                                     ELF object binary architecture: AMD x86-64
Mar 28 14:12:38 u-030-s007 [280209]: Could not parse number of program headers from core file: invalid `Elf' handle
Mar 28 14:12:38 u-030-s007 [280209]: Could not parse number of program headers from core file: invalid `Elf' handle

(uid 1000 is the nomad user)

I have no clue what is going on. The server has 16 GiB RAM so IMHO an OOM event is rather unlikely (but not impossible).
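
One way to confirm or rule out an OOM kill is to read the container's cgroup counters. The following is a minimal sketch, assuming the container uses cgroup v2 and exposes `/sys/fs/cgroup/memory.events`; the helper name `count_oom_kills` is made up for illustration and is not part of NOMAD.

```python
from pathlib import Path


def count_oom_kills(memory_events_text: str) -> int:
    """Return the oom_kill counter from the text of a cgroup v2 memory.events file."""
    for line in memory_events_text.splitlines():
        parts = line.split()
        # Each line has the form "<event name> <count>", e.g. "oom_kill 2".
        if len(parts) == 2 and parts[0] == "oom_kill":
            return int(parts[1])
    return 0


if __name__ == "__main__":
    # Assumed location: cgroup v2 as seen from inside the container.
    events_file = Path("/sys/fs/cgroup/memory.events")
    if events_file.exists():
        print("oom_kill events:", count_oom_kills(events_file.read_text()))
    else:
        print("no cgroup v2 memory.events file found")
```

A non-zero counter would suggest the kernel killed a process in that container for exceeding its memory limit.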

This affects the sample files Cu2Se/2/aims.out and Sn2Se/1/INFO_GS.OUT. All other files from this sample bundle do work.

Edit: NOMAD version is now 1.2.2.dev357+g15b7cd2e1

lauri-codes commented 2 months ago

@behnle: I will try to reproduce the problem and see why the parser is struggling with this example.

lauri-codes commented 2 months ago

I can confirm that at least one particular main file seems to use a very large amount of RAM, ultimately causing the process to be killed. Here is the zip file: int_hse.zip. The problematic file is output_1.

We need to check what is causing the memory usage to blow up in the FHI-aims parser for this file. In general some calculations are very big and will need a lot of RAM to be processed, but this does not look like one to me. @ndaelman-hu, @JosePizarro3: Could you investigate this a bit?
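
To put a number on the memory usage, the parse call could be wrapped in a small measuring helper. This is only a sketch using the standard-library `tracemalloc` module; `fake_parse` is a hypothetical stand-in for the real parser call, and `tracemalloc` only sees allocations made through Python's allocators, so the figure is a lower bound on the process's real footprint.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20


def fake_parse(n_values):
    # Hypothetical stand-in for parsing a main file; it just builds a list.
    return [float(i) for i in range(n_values)]


if __name__ == "__main__":
    data, peak = peak_memory_mib(fake_parse, 1_000_000)
    print(f"parsed {len(data)} values, peak {peak:.1f} MiB")
```

Running this for the problematic file and for a file that parses fine would show whether the difference is in proportion to the file sizes.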

ndaelman-hu commented 2 months ago

> I can confirm that at least one particular main file seems to use a very large amount of RAM, ultimately causing the process to be killed. Here is the zip file: int_hse.zip. The problematic file is output_1.
>
> We need to check what is causing the memory usage to blow up in the FHI-aims parser for this file. In general some calculations are very big and will need a lot of RAM to be processed, but this does not look like one to me. @ndaelman-hu, @JosePizarro3: Could you investigate this a bit?

It's likely this basis set tier checker. I'll see about slimming it down.

ndaelman-hu commented 2 months ago

The issue is FHIAimsOutParser: trying to get its data causes a memory leak. The native tier files are still big (~1 GB in RAM), but they are not the culprit.

Am investigating further.
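
For narrowing down a leak like this, comparing `tracemalloc` snapshots across repeated calls can point at the source line whose allocations keep growing. The sketch below uses a deliberately leaky toy function; `leaky_step` and `growth_between_runs` are hypothetical names for illustration, not the FHIAimsOutParser code.

```python
import tracemalloc

_cache = []  # module-level list that is never cleared: the deliberate "leak"


def leaky_step():
    # Each call keeps another ~100 kB alive by appending to the global list.
    _cache.append(bytearray(100_000))


def growth_between_runs(step, repeats=5, top=3):
    """Return the source lines whose allocated size changed most across repeats."""
    tracemalloc.start()
    step()  # warm-up call so one-time allocations do not count as growth
    before = tracemalloc.take_snapshot()
    for _ in range(repeats):
        step()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to() sorts the differences by absolute size change, largest first.
    return after.compare_to(before, "lineno")[:top]


if __name__ == "__main__":
    for stat in growth_between_runs(leaky_step):
        print(stat)
```

Memory that grows steadily with the number of repeats would point to something holding references, while a one-off jump would more likely be a single large allocation.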