pyiron / pyiron_base

Core components of the pyiron integrated development environment (IDE) for computational materials science
https://pyiron-base.readthedocs.io
BSD 3-Clause "New" or "Revised" License
19 stars 13 forks source link

Inconsistent Behavior if Space in Job Name #1040

Open pstaerk opened 1 year ago

pstaerk commented 1 year ago

While using pyiron to track LAMMPS jobs, I had by accident submitted some jobs with a space character in their name, e.g. job_1 window1. Without throwing an error, pyiron accepted this, saving the jobs with the space character in the filename for the h5 database file, but internally (as seen in project.job_table()) it replaced the space with a underscore character.

This resulted in h5 database is empty errors when attempting to load the job again, as the load code searched under the name with the underscore, while all files had the space character in them. Maybe it would be a good idea to warn the user if there is a space character in the job name? Or maybe consistently convert the space into a underscore?

jan-janssen commented 1 year ago

Hi @pstaerk, thank you for reporting this issue. I hope it is already fixed with https://github.com/pyiron/pyiron_base/pull/1030 which should be included in the next pyiron version. Still to be sure, can you share a quick example code to reproduce your error?

jan-janssen commented 1 year ago

@samwaseda As we now had multiple people who were surprised about pyiron changing the job name silently, I am wondering if the old way which just raised an error for incompatible job names was more user friendly. Or alternatively we should at least have a warning when the job name is changed to make it less confusing.

jan-janssen commented 1 year ago

I added a warning in https://github.com/pyiron/pyiron_base/pull/1041

pstaerk commented 1 year ago

Ok, thanks! Admittedly, I did not check with the current develop version, rather with the default that gets installed when you use the conda installation procedure shown in the docs.

From what I understand the simplest way to re-create this error is to submit a dummy job of any kind, while giving it a name such as asdf_1 asdf.

The error then occurs if one wants to re-load the job using something like pr.load('asdf_1 asdf'), as pyiron looks for the asdf_1_asdf.h5, while only asdf_1 asdf.h5 exists.

I think I would personally prefer that this throws an error, or alternatively that the load function also transforms all spaces into dashes, which also would make sense to me.

samwaseda commented 1 year ago

I don't think the problem is the fact that pyiron internally changes the name, but the real problem is that the job cannot be correctly loaded, meaning it has to correctly raise an error when there's a space in the job name (and not when the job name is changed)

jan-janssen commented 1 year ago

@samwaseda With the latest development version the following code works:

from pyiron_atomistics import Project
pr = Project("demonstration")
job = pr.create.job.Lammps(job_name="calculation 1")
job.structure = pr.create.structure.ase.bulk("Cu")
job.run()
job = pr.load("calculation 1")
print(job.project_hdf5.file_name)
> '/Users/janssen/pyiron/projects/2023/2023-02-21-jobname-debug/demonstration/calculation_1.h5'
jan-janssen commented 1 year ago

@pstaerk For me this also works with pyiron_base 0.5.31 which is the latest version of pyiron_base on conda which is compatible to pyiron_atomistics. Which version do you use?

samwaseda commented 1 year ago

@samwaseda With the latest development version the following code works:

Then it's even better 😎

samwaseda commented 1 year ago

I said it also in the PR, but in my opinion what's important is that the job name definition and loading can be done consistently, and not so much whether the name is changed or not.

pstaerk commented 1 year ago

I just checked pyiron_base.__version__ which gives me 0.5.32.

I don't really have a good MWE, because I used a modified Lammps executable. But if it seems to work with your newest version then everything seems to be alright?

jan-janssen commented 1 year ago

I don't really have a good MWE, because I used a modified Lammps executable. But if it seems to work with your newest version then everything seems to be alright?

I am a bit surprised - as it worked already with 0.5.31 it might be that I am looking at a different error. If you could share your example that failed, that would be very helpful.

pstaerk commented 1 year ago

My current code is about 500 lines long or so, but I can try to copy/paste the relevant sections here:

job_name = 'wl_window_type 10_other_nr1_f_9'
...
project.create_job("LammpsWL", job_name)
...
job.run()

And then, when I attempted to load:

pr = Project("...")

...
        jobname = f"wl_window_type 10_other_nr1_f_9"
        job = pr[jobname]
        qs_dict[i].append(job['output/gcmc/Q'])

I got the aforementioned error.

Sorry if I can't be more helpful. Sharing the entire huge block of code + the modified executable seems unfeasible.

jan-janssen commented 1 year ago

@pstaerk The difference is job = pr["calculation 1"] does not work while job = pr.load("calculation 1") does. @samwaseda That is the reason why I would like to have the warning to at least inform the users that something changed.

pmrv commented 1 year ago

@pstaerk The difference is job = pr["calculation 1"] does not work while job = pr.load("calculation 1") does. @samwaseda That is the reason why I would like to have the warning to at least inform the users that something changed.

Those two option really ought to give the same result though.

I also support warnings. They are easy to filter in high throughput settings, so that doesn't hurt imo.

samwaseda commented 1 year ago

@pstaerk The difference is job = pr["calculation 1"] does not work while job = pr.load("calculation 1") does. @samwaseda That is the reason why I would like to have the warning to at least inform the users that something changed.

Ah no, in this case __getitem__ must do the conversion properly. Issuing a warning is unrelated here.

samwaseda commented 1 year ago

I still oppose warnings. They should be issued only when the user needs to know something, but in the problems raised so far, the confusion comes from the fact that pyiron didn't do the conversion consistently. I would support state.logger.info or state.logger.debug at most.

pmrv commented 1 year ago

[INFO] is ok with me.

jan-janssen commented 1 year ago

Ah no, in this case __getitem__ must do the conversion properly. Issuing a warning is unrelated here.

The issue is now fixed in https://github.com/pyiron/pyiron_base/pull/1042

jan-janssen commented 1 year ago

I still oppose warnings. They should be issued only when the user needs to know something, but in the problems raised so far, the confusion comes from the fact that pyiron didn't do the conversion consistently. I would support state.logger.info or state.logger.debug at most.

As it is an impact change and I do not know if it works with the GenericMaster and so on I would prefer to have a warning to make it visual that pyiron is implicitly changing the users input. It is now the second time that we have an issue with this change and it is very hard for new pyiron users to identify what is causing the error. This would not be the case with the warning message.

liamhuber commented 1 year ago

As it is an impact change and I do not know if it works with the GenericMaster and so on I would prefer to have a warning to make it visual that pyiron is implicitly changing the users input.

Agreed. Explicit is better than implicit. Here the implicit is handy, but we should at least provide the user a log of what shenanigans we're getting up to.

samwaseda commented 1 year ago

As it is an impact change and I do not know if it works with the GenericMaster and so on I would prefer to have a warning to make it visual that pyiron is implicitly changing the users input. It is now the second time that we have an issue with this change and it is very hard for new pyiron users to identify what is causing the error. This would not be the case with the warning message.

That means we'll issue something like "Warning: We changed your job name and you should be worried because we have no idea what that means", correct? That makes it look extremely unprofessional. In addition, we offer the conversion as a feature, so that people can directly insert e.g. a numpy array without worrying about converting it to a string. Warnings should be issued when the user is doing something in principle wrong, not for doing something we encourage.

pmrv commented 1 year ago

So the issue I see is that someone comes up with two different job names that map to the same "corrected" job name in our current scheme. They are not aware of the conversion because the API hides it. When they submit the second job, in the best case it'll be of the same job type and just appear to reload the first job (which ought to confuse them a lot), in the worst case the job type is different and they are greeted with mysterious stack trace deep within pyiron.

So that's the failure case I'd like to exclude. If a warning on every job conversion is too much, then maybe we can have a check whether a job with the same "corrected" name already exists? I personally don't like that second option because it would incur a filesystem or database lookup on every job creation and that would hurt in high throughput settings, but I'm willing to compromise on this.

jan-janssen commented 1 year ago

So that's the failure case I'd like to exclude. If a warning on every job conversion is too much, then maybe we can have a check whether a job with the same "corrected" name already exists?

I guess the issue with this is that if we execute the same Jupyter notebook again and expect the job to be reloaded it would give an error message while for any other job name it would work just fine.

pmrv commented 1 year ago

In my preferred solution for that case, the job name is corrected, the warning is emitted and then the job is created with the new name. On the second run, it would again correct the name, issue the warning again and then load the job with that name. That should work no? We can even configure the logger so that this "name change warning" is only emitted once per process.

samwaseda commented 1 year ago

Since we offer the possibility of combining variables to make a job name, it's not compatible with a warning, i.e. something like this should not issue a warning:

job = pr.create.job.MyJob((a, b, c, d))

One compromise I can see is to issue a warning if str is set instead of tuple or list and a conversion takes place, i.e.:

job = pr.create.job.MyJob('dots.are.replaced')  # -> warning
job = pr.create.job.MyJob(('dots', 'are', 'replaced'))  # -> no warning, because this is a feature
pmrv commented 1 year ago

One compromise I can see is to issue a warning if str is set instead of tuple or list and a conversion takes place, i.e.:

Perfectly happy with this, since in the second case equal tuples would be mapped to equal str and the failure case I mentioned cannot happen.

pmrv commented 1 year ago

So, this issue still persists for with renamed jobs that contain a dash. The jobs internally renamed themselves, but the working directory and hdf5 file not. Can we not just allow arbitrary job names and stop this?

pmrv commented 1 year ago

Related bug: master jobs don't seem to check if they create a child job with a name longer than the character limit, which then triggers an error in the database code which kills the whole master.

Traceback:

 Traceback (most recent call last):
   File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
     self.dialect.do_execute(
   File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
     cursor.execute(statement, parameters)
 psycopg2.errors.StringDataRightTruncation: value too long for type character varying(50)

 The above exception was the direct cause of the following exception:

 Traceback (most recent call last):
   File "/u/zora/software/pyiron_base/pyiron_base/database/generic.py", line 915, in add_item_dict
     result = self.conn.execute(
   File "/u/zora/software/pyiron_base/pyiron_base/database/generic.py", line 431, in execute
     return retry(
   File "/u/zora/software/pyiron_base/pyiron_base/utils/error.py", line 145, in retry
     return func()
   File "/u/zora/software/pyiron_base/pyiron_base/database/generic.py", line 432, in <lambda>
     lambda: self.execute_once(*args, **kwargs),
   File "/u/zora/software/pyiron_base/pyiron_base/database/generic.py", line 425, in execute_once
     return self._conn.execute(*args, **kwargs)
   File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/future/engine.py", line 280, in execute
     return self._execute_20(
   File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_20
     return meth(self, args_10style, kwargs_10style, execution_options)
   File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
     return connection._execute_clauseelement(
   File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
     ret = self._execute_context(
   File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
     self._handle_dbapi_exception(
   File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
     util.raise_(
   File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
     raise exception
   File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
     self.dialect.do_execute(
   File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
     cursor.execute(statement, parameters)
 sqlalchemy.exc.DataError: (psycopg2.errors.StringDataRightTruncation) value too long for type character varying(50)

 [SQL: INSERT INTO jobs_cmmc (parentid, masterid, projectpath, project, job, subjob, chemicalformula, status, hamilton, hamversion, username, computer, timestart) VALUES (%(parentid)s, %(masterid)s, %(projectpath)s, %(project)s, %(job)s, %(subjob)s, %(chemicalformula)s, %(status)s, %(hamilton)s, %(hamversion)s, %(username)s, %(computer)s, %(timestart)s) RETURNING jobs_cmmc.id]
 [parameters: {'parentid': None, 'masterid': 19564337, 'projectpath': '/cmmc/u/', 'project': 'zora/pyiron/projects/2023/MTP_MgAl_V2/verification/volume_flow/mtp/EverythingLayerNNTruncated/MTP24_1_8_6_5_restart_murn_mp_998860_hdf5/', 'job': 'MTP24_1_8_6_5_restart_murn_mp_998860_0_30000000000000004', 'subjob': '/MTP24_1_8_6_5_restart_murn_mp_998860_0_30000000000000004', 'chemicalformula': 'Al', 'status': 'initialized', 'hamilton': 'LammpsMlip', 'hamversion': '2021.09.29_default', 'username': 'zora', 'computer': 'zora@cmti001#1', 'timestart': datetime.datetime(2023, 4, 13, 12, 0, 24, 557393)}]
 (Background on this error at: https://sqlalche.me/e/14/9h9h)

 During handling of the above exception, another exception occurred:

 Traceback (most recent call last):
   File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
     return _run_code(code, main_globals, None,
   File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
     exec(code, run_globals)
   File "/u/zora/software/pyiron_base/pyiron_base/cli/__main__.py", line 3, in <module>
     main()
   File "/u/zora/software/pyiron_base/pyiron_base/cli/control.py", line 59, in main
     args.cli(args)
   File "/u/zora/software/pyiron_base/pyiron_base/cli/wrapper.py", line 31, in main
     job_wrapper_function(
   File "/u/zora/software/pyiron_base/pyiron_base/jobs/job/wrapper.py", line 171, in job_wrapper_function
     job.run()
   File "/u/zora/software/pyiron_base/pyiron_base/jobs/job/wrapper.py", line 124, in run
     self.job.run_static()
   File "/u/zora/software/pyiron_base/pyiron_base/jobs/master/parallel.py", line 544, in run_static
     self._run_if_master_modal_child_non_modal(job=job)
   File "/u/zora/software/pyiron_base/pyiron_base/jobs/master/parallel.py", line 499, in _run_if_master_modal_child_non_modal
     job.save()
   File "/u/zora/software/pyiron_base/pyiron_base/jobs/job/generic.py", line 1084, in save
     job_id = self.project.db.add_item_dict(self.db_entry())
   File "/u/zora/software/pyiron_base/pyiron_base/database/generic.py", line 923, in add_item_dict
     raise ValueError("Error occurred: " + str(except_msg))
 ValueError: Error occurred: (psycopg2.errors.StringDataRightTruncation) value too long for type character varying(50)

 [SQL: INSERT INTO jobs_cmmc (parentid, masterid, projectpath, project, job, subjob, chemicalformula, status, hamilton, hamversion, username, computer, timestart) VALUES (%(parentid)s, %(masterid)s, %(projectpath)s, %(project)s, %(job)s, %(subjob)s, %(chemicalformula)s, %(status)s, %(hamilton)s, %(hamversion)s, %(username)s, %(computer)s, %(timestart)s) RETURNING jobs_cmmc.id]
 [parameters: {'parentid': None, 'masterid': 19564337, 'projectpath': '/cmmc/u/', 'project': 'zora/pyiron/projects/2023/MTP_MgAl_V2/verification/volume_flow/mtp/EverythingLayerNNTruncated/MTP24_1_8_6_5_restart_murn_mp_998860_hdf5/', 'job': 'MTP24_1_8_6_5_restart_murn_mp_998860_0_30000000000000004', 'subjob': '/MTP24_1_8_6_5_restart_murn_mp_998860_0_30000000000000004', 'chemicalformula': 'Al', 'status': 'initialized', 'hamilton': 'LammpsMlip', 'hamversion': '2021.09.29_default', 'username': 'zora', 'computer': 'zora@cmti001#1', 'timestart': datetime.datetime(2023, 4, 13, 12, 0, 24, 557393)}]
 (Background on this error at: https://sqlalche.me/e/14/9h9h)