Closed liamhuber closed 10 months ago
Coverage variation | Diff coverage |
---|---|
:white_check_mark: +0.00% (target: -1.00%) | :white_check_mark: ∅ |
You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation
Totals | |
---|---|
Change from base Build 7415131425: | 0.0% |
Covered Lines: | 4543 |
Relevant Lines: | 5033 |
At least one Lammps job in the deepdive is crashing. Here's the tail of the errors:
File /usr/share/miniconda3/envs/my-env/lib/python3.11/site-packages/pyiron_base/jobs/job/runfunction.py:114, in run_job_with_status_created(job)
112 run_job_with_runmode_manually(job=job, _manually_print=True)
113 elif job.server.run_mode.modal:
--> 114 job.run_static()
115 elif job.server.run_mode.srun:
116 run_job_with_runmode_srun(job=job)
File /usr/share/miniconda3/envs/my-env/lib/python3.11/site-packages/pyiron_base/jobs/job/generic.py:765, in GenericJob.run_static(self)
761 def run_static(self):
762 """
763 The run static function is called by run to execute the simulation.
764 """
--> 765 execute_job_with_external_executable(job=self)
File /usr/share/miniconda3/envs/my-env/lib/python3.11/site-packages/pyiron_base/jobs/job/runfunction.py:608, in run_time_decorator.<locals>.wrapper(job)
606 if not state.database.database_is_disabled and job.job_id is not None:
607 job.project.db.item_update({"timestart": datetime.now()}, job.job_id)
--> 608 func(job)
609 job.project.db.item_update(job._runtime(), job.job_id)
610 else:
File /usr/share/miniconda3/envs/my-env/lib/python3.11/site-packages/pyiron_base/jobs/job/runfunction.py:646, in execute_job_with_external_executable(job)
636 out = subprocess.run(
637 executable,
638 cwd=job.project_hdf5.working_directory,
(...)
643 check=True,
644 ).stdout
645 except subprocess.CalledProcessError as e:
--> 646 out, job_crashed = handle_failed_job(job=job, error=e)
648 job._logger.info(
649 "{}, status: {}, output: {}".format(job.job_info_str, job.status, out)
650 )
651 with open(
652 posixpath.join(job.project_hdf5.working_directory, "error.out"), mode="w"
653 ) as f_err:
File /usr/share/miniconda3/envs/my-env/lib/python3.11/site-packages/pyiron_base/jobs/job/runfunction.py:700, in handle_failed_job(job, error)
698 if job.server.run_mode.non_modal:
699 state.database.close_connection()
--> 700 raise RuntimeError("Job aborted")
701 else:
702 return True, out
RuntimeError: Job aborted
Error: Process completed with exit code 1
Same error. It's in this cell, the first one where pyiron_lammps is used:
wf.register("pyiron_atomistics", "pyiron_workflow.node_library.pyiron_atomistics")
wf.register("plotting", "pyiron_workflow.node_library.plotting")
wf = Workflow("with_prebuilt")
wf.structure = wf.create.pyiron_atomistics.Bulk(cubic=True, name="Al")
wf.engine = wf.create.pyiron_atomistics.Lammps(structure=wf.structure)
wf.calc = wf.create.pyiron_atomistics.CalcMd(job=wf.engine)
wf.plot = wf.create.plotting.Scatter(
x=wf.calc.outputs.steps,
y=wf.calc.outputs.temperature
)
out = wf.run()
out.plot__fig
It runs totally fine on my machine, and a quick human comparison shows no difference in the env (doesn't mean it's not there, just not in the packages I peeked at).
I took a closer look, and my graphviz (certainly not an issue, since the visualization cells are fine) and numpy (potentially an issue, but it seems unlikely) packages are out of date in my local env. A more thorough search showed everything else lining up fine. In particular, my local env has the same lammps and pyiron-data, along with pyiron_base, pyiron_atomistics, the h5* packages, and everything else I compared. Very strange.
Nope, updated those two and it's still fine locally.
The head of the error stack is in pyiron_base:
---------------------------------------------------------------------------
Exception encountered at "In [42]":
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
File /usr/share/miniconda3/envs/my-env/lib/python3.11/site-packages/pyiron_base/jobs/job/runfunction.py:636, in execute_job_with_external_executable(job)
635 try:
--> 636 out = subprocess.run(
637 executable,
638 cwd=job.project_hdf5.working_directory,
639 shell=shell,
640 stdout=subprocess.PIPE,
641 stderr=subprocess.STDOUT,
642 universal_newlines=True,
643 check=True,
644 ).stdout
645 except subprocess.CalledProcessError as e:
So it really looks like a lammps executable crash, but why?
CalledProcessError: Command '/usr/share/miniconda3/envs/my-env/share/pyiron/lammps/bin/run_lammps_2020.03.03.sh' returned non-zero exit status 127.
Exit status 127 means the command could not be found. But in my local env, .../share/pyiron/lammps/bin/run_lammps_2020.03.03.sh is there. So what part of the remote env is failing to set this up?
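For reference, here is a minimal sketch of how a 127 surfaces through `subprocess.run(..., check=True)`, using the same keyword arguments as the call in the traceback. The helper `exit_status` and the command name `no_such_command_xyz` are made up for illustration, and the sketch assumes a POSIX shell.

```python
import subprocess


def exit_status(command):
    """Run `command` through the shell; return (returncode, combined output)."""
    try:
        completed = subprocess.run(
            command,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            check=True,
        )
    except subprocess.CalledProcessError as e:
        # Same exception type as at the head of the traceback above
        return e.returncode, e.output
    return completed.returncode, completed.stdout


# Hypothetical command name: a POSIX shell reports 127 when it cannot find
# the command it was asked to run.
code, output = exit_status("no_such_command_xyz")
print(code, output)
```

A shell script exits with the status of its last command, so a wrapper script that exists can still produce 127 if the binary it calls on its final line is not on the PATH. Whether that is what happens on the CI is not established here.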
Even pyiron-data 0.0.24, which was the first one I used, has run_lammps_2020.03.03.sh, so I'm super confused about why the executable is not being found.
I wondered if somehow the resource path is broken, but I modified the notebook to show me the resource paths before crashing and they look totally fine:
Cell In[42], line 19
17 except:
18 from pyiron_base import state
---> 19 raise RuntimeError(f"State configuration:{state.settings.configuration}")
RuntimeError: State configuration:{'user': 'pyiron', 'resource_paths': ['/home/runner/work/pyiron_workflow/pyiron_workflow/tests/static', '/usr/share/miniconda3/envs/my-env/share/pyiron', '/usr/share/miniconda3/envs/my-env/share/iprpy'], 'project_paths': ['/home/runner/work/pyiron_workflow/pyiron_workflow/'], 'connection_timeout': 60, 'sql_connection_string': None, 'sql_table_name': 'jobs_pyiron', 'sql_view_connection_string': None, 'sql_view_table_name': None, 'sql_view_user': None, 'sql_view_user_key': None, 'sql_file': '/home/runner/pyiron.db', 'sql_host': None, 'sql_type': 'SQLite', 'sql_user_key': None, 'sql_database': None, 'project_check_enabled': False, 'disable_database': False, 'credentials_file': None, 'write_work_dir_warnings': True, 'config_file_permissions_warning': True}
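A possible next check, sketched under assumptions: look for the launcher script under each resource path and see whether it exists and is executable. The helper name `find_in_resource_paths` is hypothetical (it is not part of pyiron_base). The resource paths and the script's relative location are copied from the output above.

```python
import os


def find_in_resource_paths(resource_paths, relative_path):
    """Return (full_path, is_file, is_executable) for each resource path."""
    results = []
    for root in resource_paths:
        full_path = os.path.join(root, relative_path)
        results.append(
            (full_path, os.path.isfile(full_path), os.access(full_path, os.X_OK))
        )
    return results


# Resource paths as reported by state.settings.configuration on the CI
resource_paths = [
    "/home/runner/work/pyiron_workflow/pyiron_workflow/tests/static",
    "/usr/share/miniconda3/envs/my-env/share/pyiron",
    "/usr/share/miniconda3/envs/my-env/share/iprpy",
]
for path, is_file, is_executable in find_in_resource_paths(
    resource_paths, "lammps/bin/run_lammps_2020.03.03.sh"
):
    print(path, is_file, is_executable)
```

This only confirms that the script is in place. It says nothing about whether the commands inside the script resolve.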
I couldn't even get things working over in #150 where I specified old versions of stuff.
I updated everything in my local env and found I can still run the deepdive notebook without trouble.
Forget spot-checking suspect packages: I copied the entire environment being installed here and the entire environment on my local machine, and used some python to compare all the packages that are installed in both locations but don't have the same version. There are only a few:
PACKAGE | LOCAL | REMOTE |
---|---|---|
widgetsnbextension | 3.6.1 | 4.0.9 |
jupyterlab | 3.5.3 | 4.0.10 |
graphviz | 9.0.0 | 8.1.0 |
ipywidgets | 7.7.1 | 8.1.1 |
pyiron-data | 0.0.27 | 0.0.26 |
notebook | 6.5.4 | 7.0.6 |
jupyterlab_widgets | 1.1.2 | 3.0.9 |
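A comparison like this can be sketched as follows. This is not the script actually used; it assumes each environment was dumped as `name=version=build` lines (the format `conda list --export` produces). The helper names, the build strings, and the `some-package` entry are placeholders.

```python
def parse_export(text):
    """Parse `name=version=build` lines into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        parts = line.split("=")
        if len(parts) < 2:
            continue  # not a name=version line
        packages[parts[0]] = parts[1]
    return packages


def version_mismatches(local, remote):
    """Packages installed in both environments whose versions differ."""
    return {
        name: (local[name], remote[name])
        for name in sorted(local.keys() & remote.keys())
        if local[name] != remote[name]
    }


# Versions for pyiron-data and graphviz come from the table above; the build
# strings and `some-package` are placeholders.
local_env = parse_export(
    "# local\npyiron-data=0.0.27=build_0\ngraphviz=9.0.0=build_0\nsome-package=1.0=build_0"
)
remote_env = parse_export(
    "# remote\npyiron-data=0.0.26=build_0\ngraphviz=8.1.0=build_0\nsome-package=1.0=build_0"
)
for name, (local_version, remote_version) in version_mismatches(local_env, remote_env).items():
    print(name, local_version, remote_version)
```

Packages that appear in only one environment are ignored here, matching the "installed in both locations" criterion. Diffing the key sets would surface those too.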
For completeness I'm going to bump pyiron-data one more time here, but I'm not optimistic it will actually fix things, since I also had no trouble back when I was on 0.0.26 locally.
Well, that is good but extremely frustrating. I have no idea why pyiron-data works with both 0.0.26 and 0.0.27 locally, but requires 0.0.27 on the CI.
And updating the atomistics notebook to use the new API.
Basically a manual version of what should be automated, as discussed in #148