open-research / sumatra

http://neuralensemble.org/sumatra/
BSD 2-Clause "Simplified" License
127 stars 48 forks source link

Fixed issue where Py2.7 strings are not recognised as strings. #315

Closed dpad closed 3 months ago

dpad commented 8 years ago

Another pull request just in case it may be useful to the devs. I have not exhaustively tested it. There was an issue with Py2.7 strings not being allowed into SimpleParameterSets because the future builtins (from Python3+) redefine str as a new unicode type, so the isinstance(val, str) wasn't valid.

timtroendle commented 8 years ago

The idea of the Py2/Py3 compatibility layer sumatra uses is to not have Py2 string in use at all anymore; because they are fundamentally different from Py3 strings. Instead, all Py2 strings coming in from other libraries should be wrapped by the newstr type.

Apparently in your case that wrapping is not done leading to the bug you are trying to fix. My suggestion would be to undo your changes and wrap the string by the newstr type as early as possible. If you give me an error description or minimal working example, I'll be able to assist you in that.

dpad commented 8 years ago

Thanks Tim.

This was mainly a change to get things to work out of the box from my Py2-only workflow. I was using the "capture_args" decorator from this other pull request (https://github.com/open-research/sumatra/pull/313) to package the arguments of the decorated function into a SimpleParameterSet; if one if the arguments passed in was a simple (Py2) string, then it would break.

In other words, a minimum example would be as follows:

from sumatra.decorators import SimpleParameterSet
param_set = SimpleParameterSet({"parameter_name": "parameter value"})

If this fix doesn't fit with the philosophy for Sumatra, please feel free to close this :)

maxalbert commented 8 years ago

@timtroendle I haven't had time to look into this at all, but just to note that I did experience similar situations in the past to the one that @dpad encountered, where the compatibility that was supposed to be provided by the future package didn't quite work. I wonder whether we're missing some imports in some files in Sumatra which introduces these kinds of bugs? I'm just guessing, though, it could be a completely different issue. But we should look into this more carefully because it seems to pop up in different circumstances and it would be a shame if something that is supposed to make Sumatra more robust actually makes it more brittle.

timtroendle commented 8 years ago

@maxalbert, as I'm not using Python2 I am not aware of any known problems. I would suggest to open an issue for every known one. I do agree that Sumatra should be as robust as possible when it comes to string handling. So if there is was something systematically wrong as you are suspecting, that should be tackled.

From @dpad's minimal example I fear there is a systematical problem: his problem arises when using the Sumatra API not the CLI or web interface. Coming back to what I said earlier: my strategy was to wrap incoming strings into the newstr type / unicodeas early as possible. To me that meant wrapping them when they come in from the command line. Where ever we are in the codebase, we can then rely on the fact that we have a type that behaves like the Py3 str. The issue @dpad is putting light on is the system border on which these Py2 str are 'incoming'. So far, we only considered the CLI / web interface as the system border. When using the API though, every public class is its own system and every public method of that class can potentially receive Py2 str. That is currently not safe and in particular it is not tested. It also causes @dpad's problem.

So there are two options I see to tackle that problem in a systematic way:

  1. Continue with the strategy of wrapping everything into the newstr type / unicode as early as possible but considering the API as the system border. That would mean touching tests of each public class or function and wrapping arguments of public methods. Quite some work I suppose.
  2. Dropping the strategy of wrapping everything into the newstr type / unicode. Accepting the fact that there are Py2 strings floating through the library. Adapting isinstance checks as done by @dpad in this PR. I am not fully aware of the consequences but it would certainly be a quick fix to some problems. Still, to have a proper test coverage, tests would need to be changed (all public methods/functions should be tested with Py2 str and newstr / unicode).

Whether either one shall be done and whether and how @dpad's problem could be solved in a short-term is up to you I guess. If needed I am open to assist in improving the current situation.

timtroendle commented 8 years ago

One addition to make the problem a little bit more clear:

We have a test that covers @dpad's use case. It's called test__init__should_accept_dict and is in line 97 of test_parameters.py. As we have imported from __future__ import unicode_literals all strings in this test are of type unicode. To cover the bug at hand, we would need to have the exact test again, but with Py2 str type.