quaquel / EMAworkbench

workbench for performing exploratory modeling and analysis
BSD 3-Clause "New" or "Revised" License
128 stars 90 forks source link

Fix for #274 Make all Variable subclasses name valid python identifiers #307

Open quaquel opened 1 year ago

quaquel commented 1 year ago

As expected, this brakes a lot of stuff. I will start fixing all errors this creates over the coming days.

EwoutH commented 1 year ago

Right, I just now understand the implications of this, after seeing all the examples with names with spaces in them which now aren't allowed.

So this is definitely a 3.0 change, and we should add deprecation warnings in the next release (2.5).

Since the scope in which backwards compatibility is broken is huge, I'm also not sure if we really should do this for all internal variable classes.

coveralls commented 1 year ago

Coverage Status

coverage: 79.638% (-0.7%) from 80.335% when pulling d8b9b082fa7198a8c7b4582ecd4d1c47049d78e3 on valid_identifiers into f3c96745f706417ababd6c1f315b7be6c71eaab1 on master.

quaquel commented 1 year ago

All tests now pass, so this is ready for review.

We have to decide also on how to handle 2.5 and 3. I agree that it would be good to keep this for a 3.0 release, but what to still do for 2.5?

EwoutH commented 1 year ago

For 2.5, I would say replace all Error with DeprecationWarning, including that we’re dropping it in 3.0.

I will try to review asap.

quaquel commented 1 year ago

Ok, so we need a 3.0 branch into which this can go.

I'll probably make a separate pull request for the deprecation warnings

EwoutH commented 1 year ago

I think we can leave this PR open, merge the warnings in a separate PR, release 2.5, and then directly after that merge this PR to master.

If we need to do 2.5.x bugfixes we can do those on a maintenance branch.

EwoutH commented 11 months ago

I was thinking, can we provide a minimal, best-effort attempt to convert invalid identifiers to valid ones? A very simple option would be to convert spaces to underscores. Or, we could just convert any invalid character to an underscore.

import re

def to_valid_identifier(s):
    # Replace invalid characters with an underscore
    s = re.sub(r'[^0-9a-zA-Z_]', '_', s)

    # Ensure the first character is not a digit
    if s[0].isdigit():
        s = "_" + s

    return s

I think in that case we still want to throw an warning, like:

UserWarning: f"Invalid characters found in {variable_name}. Converting all characters to underscores: {converted_variable_name} is now a valid Python identifier."
quaquel commented 11 months ago

I am hesitant about this: when in doubt, refuse the temptation to guess.

Also, by default parameter names map to something in the underlying model. So if you change the name, you also need to set the variable_name kwarg to the original name.

EwoutH commented 11 months ago

Yeah I understand. How would it be as an option that's off by default?

I'm just trying to think about ways for users with huge Vensim models to not require major modifications.

Otherwise we could suggest a valid name in the error message:

if not name.isidentifier():
    raise ValueError(f"'{name}' is not a valid Python identifier. Changing it to '{to_valid_identifier(name)'} would make it one.")
quaquel commented 11 months ago

It would be nice to have a convenient way of dealing with this, but I need to figure out how to go about it.

The idea of automagically changing parameter names violates a classic Python principle. So, for me, that is a no-go.

An alternative would be adding a helper function to 2.5 to convert models to the new behavior. This helper function would try to fix all parameters and spit out not just a fixed model but also the revised code so you can replace the existing uncertainty/constant/outcome definitions. The problem is that this function is easy for 95% of all use cases: take the parameter name, split it on whitespace, and join it with an underscore. Set the parameter name to this new name and set the variable name to the original name. You would print this out with __repr__ on the parameter.

There are, however, several issues. First, variable_name may be already used. This is easy: update the name only. The problem is with the remaining cases where, for example, alternative distributions are being used. Or when some of the other optional kwargs have been used to instantiate the parameter instance. These are currently not well covered by __repr__, so spitting out the new correct code is a problem.

Suggesting correct names in the error message won't help much. The fix will be trivially obvious, but is just painful to do. Copy-pasting between the console and the model definition is even more painful than just spending 10 minutes fixing all names.

EwoutH commented 11 months ago

Good points. I would just add the warnings to 2.5, then users know a change is coming, and then we have some more time to think about how to handle it exactly.

quaquel commented 11 months ago

Having a fix function would need to happen in 2.5 because, from 3 onward, you get a straight-up exception (at least atm).

EwoutH commented 11 months ago

Fair, but I wouldn't let it be a blocker for 2.5.0. We can always backport to a 2.5 release branch and do a 2.5.1 path release with it.

EwoutH commented 8 months ago

With 2.5 released we can merge this to the master branch (targeted for 3.0) right?

quaquel commented 6 months ago

I'll fix the conflicts hopefully later today and merge it