Closed rlange2 closed 4 years ago
pytest xsimlab --verbose
returns a KeyError for 'intent' and 'description' since I try to access these keys in metadata, see e.g.
intent = str(value.metadata['intent']).split('.')[1].lower()
I'm not 100% sure why this is happening since these should be valid keys of metadata as defined in variable.py
.
Wrapping these expressions in try
and except
seems to solve the problem and leaves me with one remaining failed test, which is:
=================================== FAILURES ===================================
____________________________ test_process_decorator ____________________________
def test_process_decorator():
with pytest.raises(NotImplementedError):
> @xs.process(autodoc=True)
class Dummy:
E Failed: DID NOT RAISE <class 'NotImplementedError'>
xsimlab/tests/test_process.py:210: Failed
I guess that makes sense and shouldn't be an issue, right?
However, using try and except probably isn't the best way to deal with the failed pytest since these are pre-defined keys. I really cannot say if this happens because I hardcoded the keys or it has something to do with the internal logic of pytest or something completely different.
I'd be thankful for ideas.
Okay, in the spirit of encapsulation and OOP, I should start to rely on setters and getters. That said, I changed the assignment for intent and description to:
intent = value.metadata.get('intent').value
description = value.metadata.get('description')
Running pytest
again for now solves the KeyError
. However, it returns:
xsimlab/process.py:458: in render_docstrings
intent = value.metadata.get('intent').value
E AttributeError: 'NoneType' object has no attribute 'value'
I'm curious now, how intent (and equally description) can turn out to be of type None
since it is not a valid VarIntent (and description being None
would raise other errors) and also, how other methods which make use of get
don't run into these problems. Additionally, retrieving the variable type via data_type = str(value.validator).split("'")[1]
does not raise any concerns.
I'm curious now, how intent (and equally description) can turn out to be of type None since it is not a valid VarIntent (and description being None would raise other errors)
value.metadata.get('intent')
returns either an Enum member or None
. Like the error says, None
in Python has no .value
attribute.
Dear @benbovy, many thanks for your input. Most of it seems pretty clear to me. On some others, I'm afraid, I'll need some clarification.
But let's go through that step by step:
I think we should also allow more control on where to include the generated parameter section in the docstrings if it already exists, e.g.,
@xsimlab.process(autodoc=True) class A: """Process A {{parameters}} Notes ------ Some notes about this process. """ ...
Yes, having a keyword that later on gets replaced is a good idea.
I also think we should include other information such as the dimensions, perhaps under the variable description as a bullet list (and maybe move the intent there), e.g.,
a_var: object Variable description - dimensions: scalar or 'x' or ('x', 'y') - intent: 'in' another_var: object ...
That looks better from an organisational perspective. I was wondering what other information are useful to add to the bullet list. The metadata sure offer a lot of them and it's probably a good idea to check beforehand, which items are actually set and others we don't include if they are None
. Although, there is less information available, for instance, if it's a foreign variable. Should we look that up then in the reference class? Then, it might be more sensical, to write little helper functions (in utils.py
?) that, for instance, look up what attributes are given in metadata or retrieve information when there is a foreign variable. Just so, that not everything is done by render_docstring
.
On a more general note, I try to more or less strictly follow the PEP8 conventions for formatting the code. There are tools like flake8 (linters) that can help you enforcing the code into this format. I will eventually use black so that we won't need to worry about code style.
That is a fair point. I just started using flake8 and incorporated their feedback. Will soon take a deeper look into black
Instead of
attr.field_dict
, you should usevariables_dict
defined inutils.py
, since we don't want to includeattrs
attributes that are not xarray-simlab variables here. This is why you get theKeyError: 'intent'
in the tests.
Indeed that was the case. Maybe at some point, we can talk about why that happens.
value.metadata.get('intent')
returns either an Enum member orNone
. Like the error says,None
in Python has no.value
attribute.
Okay, I understand. Then it was probably a logic error from my side. From my understanding value.metadata.get('intent')
will (in this framework) never return None
and therefore, will always have a .value
attribute. Moreover, this also got solved by variables_dict
from utils.py
and I'm not sure what's the reason.
Do not directly overwrite the
__doc__
attribute of the base class here. Instead do:self._p_cls_dict['__doc__'] = ... # concatenate self._base_cls.__doc__ + the formatted parameter section
Here, I run into problems and did so in the past. When I try:
self._p_cls_dict['__doc__'] = ...
The updated docstring doesn't show up in process.__dict__['__doc__']
and can only be seen right at the end when I call help(process)
.
When using instead:
self._p_cls_dict.__doc__ = ...
I get:
AttributeError: 'dict' object attribute '__doc__' is read-only
That's why I started overwriting the base class. I only overwrite, when self._base_cls_.__doc__
is NoneType, i.e. there is no docstring given (from my understanding, and that is only the case for BorderBoundary
in fastscape/processes/boundary.py
plus any user created process without a docstring). Otherwise, I concatenate the docstring to self._base_cls_.__doc__
.
This way, the information shows up in __doc__
as intended.
Thanks for the help.
I was wondering what other information are useful to add to the bullet list. [...] Although, there is less information available, for instance, if it's a foreign variable. Should we look that up then in the reference class? [...] Then, it might be more sensical, to write little helper functions (in utils.py?)
Good point. I think it's enough to show the class and variable to which refers a foreign variable. You could look at formatting.var_details(), which is used to format the docstring of the generated class properties for the variables. Actually, maybe you could just reuse it here?
value.metadata.get('intent') will (in this framework) never return None
It will likely return None
if value
is not an attr.Attribute
created with the framework wrappers (i.e., xs.variable
, xs.foreign
...). The framework should be as less intrusive as possible, i.e., users should be able to do the following:
@xsimlab.process
class A:
not_a_xsimlab_var = attr.ib()
xsimlab_var = xsimlab.variable()
Hence https://github.com/benbovy/xarray-simlab/pull/67#discussion_r353609467
The updated docstring doesn't show up in process.dict['doc'] and can only be seen right at the end when I call help(process).
Oh yes I see, actually the docstrings should be updated for both the base class (stand-alone) and the subclass (the class returned by build_class
).
I was wondering what other information are useful to add to the bullet list. [...] Although, there is less information available, for instance, if it's a foreign variable. Should we look that up then in the reference class? [...] Then, it might be more sensical, to write little helper functions (in utils.py?)
Good point. I think it's enough to show the class and variable to which refers a foreign variable. You could look at formatting.var_details(), which is used to format the docstring of the generated class properties for the variables. Actually, maybe you could just reuse it here?
Reusing formatting.var_details()
would keep everything consistent as well. However, the function returns the output in a very defined format. In order to account for additional indents and newline characters, I came up with two different approaches:
var_details()
itself but rather its output:Here, a nested loop is introduced to format the output on the fly:
def render_docstrings(self):
attributes_keyword = "{{attributes}}"
data_type = "object" # placeholder until issue #34 is solved
docstring = "\nAttributes\n----------\n"
indent = " "
for key, value in variables_dict(self._base_cls).items():
temp_string = ''
var_attributes = var_details(value).split("\n")
for line in range(len(var_attributes)):
if line == len(var_attributes) - 1:
temp_string += indent + var_attributes[line]
else:
temp_string += indent + var_attributes[line] + "\n"
var_attributes = temp_string
docstring += f"{key}: {data_type}\n{var_attributes}\n"
if self._base_cls.__doc__ is not None:
if attributes_keyword in self._base_cls.__doc__:
self._base_cls.__doc__ = self._base_cls.__doc__.replace(attributes_keyword,
docstring)
else:
self._base_cls.__doc__ += f"\n{docstring}"
else:
self._base_cls.__doc__ = docstring
False
) to var_details()
to return expected output:def var_details(var, docstring=False):
max_line_length = 70
var_metadata = var.metadata.copy()
description = textwrap.fill(var_metadata.pop('description').capitalize(),
width=max_line_length, subsequent_indent=' ')
detail_items = [('type', var_metadata.pop('var_type').value),
('intent', var_metadata.pop('intent').value)]
detail_items += list(var_metadata.items())
if docstring==True:
details = "\n".join([" - {} : {}".format(k, v) for k, v in detail_items])
else:
details = "\n".join(["- {} : {}".format(k, v) for k, v in detail_items])
return description + "\n\n" + details + '\n'
This change let's us get rid of the second loop in render_docstrings()
:
def render_docstrings(self):
attributes_keyword = "{{attributes}}"
data_type = "object" # placeholder until issue #34 is solved
docstring = "\nAttributes\n----------\n"
indent = " "
for key, value in variables_dict(self._base_cls).items():
var_attributes = var_details(value, docstring=True)
docstring += f"{key}: {data_type}\n{indent}{var_attributes}\n"
if self._base_cls.__doc__ is not None:
if attributes_keyword in self._base_cls.__doc__:
self._base_cls.__doc__ = self._base_cls.__doc__.replace(attributes_keyword,
docstring)
else:
self._base_cls.__doc__ += docstring
else:
self._base_cls.__doc__ = docstring
There's no need anymore to import textwrap
.
Both approaches generate the same result, for example, by using the process mentioned in #67 (comment):
Mimics the effect on erosion rates of a dyke dipping at
a given angle, that is buried beneath the landscape and that is
progressively exhumed by erosion.
Attributes
----------
x_position: object
Initial x-position of exposed dyke
- type : variable
- intent : in
- dims : ((),)
- group : None
- attrs : {}
width: object
Dyke fixed width
- type : variable
- intent : in
- dims : ((),)
- group : None
- attrs : {}
angle: object
Dyke dipping angle in degrees
- type : variable
- intent : in
- dims : ((),)
- group : None
- attrs : {}
grid_shape: object
Reference to variable 'shape' defined in class
'uniformrectilineargrid2d'
- type : foreign
- intent : in
- other_process_cls : <class 'fastscape.processes.grid.UniformRectilinearGrid2D'>
- var_name : shape
x: object
Reference to variable 'x' defined in class 'uniformrectilineargrid2d'
- type : foreign
- intent : in
- other_process_cls : <class 'fastscape.processes.grid.UniformRectilinearGrid2D'>
- var_name : x
etot: object
Reference to variable 'cumulative_height' defined in class
'totalerosion'
- type : foreign
- intent : in
- other_process_cls : <class 'fastscape.processes.erosion.TotalErosion'>
- var_name : cumulative_height
k_coef: object
Reference to variable 'k_coef' defined in class 'streampowerchannel'
- type : foreign
- intent : out
- other_process_cls : <class 'fastscape.processes.channel.StreamPowerChannel'>
- var_name : k_coef
diffusivity: object
Reference to variable 'diffusivity' defined in class 'lineardiffusion'
- type : foreign
- intent : out
- other_process_cls : <class 'fastscape.processes.hillslope.LinearDiffusion'>
- var_name : diffusivity
A few things I noticed:
Due to capitalize()
in var_details()
any character other than the first in the string becomes lowercase, e.g. 'UniformRectilinearGrid2D'
turns into 'uniformrectilineargrid2d'
. It's not the biggest issue right now but it lowers readability quite a bit in my opinion.
Additionally, some bullet points end up containing redundant information (e.g. other_process_cls
for foreign variables), others are even less helpful if not assigned (e.g. group
, dims
, attrs
). Of course, that doesn't mean, we should leave these out.
When no description is given, there will be an additional blank line instead.
Finally, when using a non xsimlab-variable via attr.ib()
it likely won't be considered by variables_dict()
since they are not xsimlab-specific.
I prefer the 1st approach. It's better customize things at a higher level and not propagate complexity at lower levels (e.g., var_details
in this case), IMO.
I think you could get rid of the nested loop by using textwrap.indent
, and actually get rid of all nested blocks, e.g., with something like:
def render_docstrings(self):
attributes_keyword = "{{attributes}}"
data_type = "object" # placeholder until issue #34 is solved
fmt_vars = []
for vname, var in variables_dict(self._base_cls).items():
var_header = f"{vname} : {data_type}"
var_content = textwrap.indent(var_details(var), " " * 4)
fmt_vars.append(f"{var_header}\n{var_content}")
fmt_section = textwrap.indent("Attributes\n"
"----------\n"
"\n".join(fmt_vars),
" " * 4)
current_doc = self._base_cls.__doc__ or ""
if attributes_keyword in current_doc:
new_doc = current_doc.replace(attributes_keyword,
fmt_section[4:])
else:
new_doc = f"\n\n{fmt_section}\n"
self._base_cls.__doc__ = new_doc
The code here above should also properly handle all line returns and indentation.
Due to capitalize() in var_details() any character other than the first in the string becomes lowercase, e.g. 'UniformRectilinearGrid2D' turns into 'uniformrectilineargrid2d'.
You can ignore this for now. Actually, I think that it will be better if the description of a foreign variable corresponds to the description of the variable it refers to (we can do this in another PR).
That is a really clean and thorough approach!
I slightly changed your code to the following:
def render_docstrings(self):
attributes_keyword = "{{attributes}}"
data_type = "object" # placeholder until issue #34 is solved
fmt_vars = []
for vname, var in variables_dict(self._base_cls).items():
var_header = f"{vname} : {data_type}"
var_content = textwrap.indent(var_details(var), " " * 4)
fmt_vars.append(f"{var_header}\n{var_content}")
fmt_section = textwrap.indent("Attributes\n"
"----------\n"
+ "\n".join(fmt_vars),
" " * 4)
current_doc = self._base_cls.__doc__ or ""
if attributes_keyword in current_doc:
new_doc = current_doc.replace(attributes_keyword,
fmt_section[4:])
else:
new_doc = f"{current_doc}\n{fmt_section}\n"
self._base_cls.__doc__ = new_doc
+
was added to your second textwrap.indent()
otherwise the attributes headline is wrapped around each variable instead of being placed once at the beginning of fmt_section
. Line break was introduced before the binary operator for readability.else
since a previously existing docstring (without using the attributes_keyword
) would have been overwritten.f
to your f-string.I would like to know what you think about setting autodoc=True
in process.process()
by default. Also, there is still a blank line in case no variable description is given. I will think about what's a good way to handle that.
You can ignore this for now. Actually, I think that it will be better if the description of a foreign variable corresponds to the description of the variable it refers to (we can do this in another PR).
True. I'm not sure how to link a variable in a process to its original process. Will think about it.
Edit: Maybe var.metadata.get('other_process_cls')
is a good start.
For better overall readability, I'm wondering if we shouldn't move the the formatting logic currently in render_docstrings
into a new function in formatting.py
, e.g., format_attribute_section(process)
and then just call it in render_docstrings
:
from .formatting import format_attribute_section
def render_docstrings(self):
new_doc = format_attribute_section(self._base_cls)
self._base_cls.__doc__ = new_doc
self._p_cls_dict['__doc__'] = new_doc
The last line is needed (I think) to be able to get the docstrings from process instances attached to a model (i.e., mymodel.myprocess
).
We should also take the indentation into account when wrapping the variable description as a text block in the section. This would need exposing the max line length as a parameter, i.e., var_details(var, max_line_length=70)
and then use 62 instead of 70 for var_content
in the code above.
Nit: add_attribute_section(process, placeholder="{{attributes}}")
is slightly better than format_attribute_section(process)
IMO.
add_attribute_section(process, placeholder="{{attributes}}")
has been added to formatting.py
and the code has been adjusted as you suggested.
I pushed the update since I felt that it's good to have a more recent version to talk about. Not sure why the checks have failed since pytest
didn't have any issues.
I will look into the self._p_cls_dict['__doc__'] = new_doc
assignment and max line length.
The errors on travis are unrelated. I'll remove python 3.5 from the test matrix and fix the issue with the latest xarray release in another PR.
That said, it would be good to add a test for this "autodoc" feature.
I would like to know what you think about setting autodoc=True in process.process() by default.
It's a good default IMO. More descriptive docstrings is good. autodoc=False
is for edge cases or in case something goes wrong.
Also, there is still a blank line in case no variable description is given. I will think about what's a good way to handle that.
I like adding (no description given)
as you proposed it in an earlier version. You could do that in var_details()
.
The errors on travis are unrelated. I'll remove python 3.5 from the test matrix and fix the issue with the latest xarray release in another PR.
This is now fixed in the master branch. You can merge it here.
self._p_cls_dict['__doc__'] = new_doc
The last line is needed (I think) to be able to get the docstrings from process instances attached to a model (i.e.,
mymodel.myprocess
).
I'm not sure about this, e.g.
from fastscape.processes import basic_model
basic_model.terrain
It returns the information with the previous formatting, i.e.
<TerrainDerivatives 'terrain' (xsimlab process)>
Variables:
shape [in] <--- grid.shape
spacing [in] <--- grid.spacing
elevation [in] <--- topography.elevation
slope [out] ('y', 'x') terrain local slope
curvature [out] ('y', 'x') terrain local curvature
Simulation stages:
*no stage implemented*
The same is the case for a custom process, see DippingDyke example from the first post:
from fastscape.models import basic_model
basic_model = basic_model.update_processes({'Dyke':DippingDyke})
basic_mode.Dyke
returns
<DippingDyke 'Dyke' (xsimlab process)>
Variables:
x_position [in] initial x-position of exposed dyke
width [in] dyke fixed width
angle [in] dyke dipping angle in degrees
grid_shape [in] <--- grid.shape
x [in] <--- grid.x
etot [in] <--- erosion.cumulative_height
k_coef [out] ---> spl.k_coef
diffusivity [out] ---> diffusion.diffusivity
Simulation stages:
run_step
That is the case, whether I include self._p_cls_dict['__doc__'] = new_doc
or not.
It makes sense. formatting._summarize_var()
is called in formatting.repr_process()
which sets up the layout. That in turn gets passed to the process._ProcessBuilder
class as well as to process.process_info()
.
Maybe I misunderstood your intention but this is what I found regarding mymodell.myprocess
.
We should also take the indentation into account when wrapping the variable description as a text block in the section. This would need exposing the max line length as a parameter, i.e.,
var_details(var, max_line_length=70)
and then use 62 instead of 70 forvar_content
in the code above.
For the current format, it seems like there are mostly two places where text length might be a concern. That would be the i) variable description and ii) the bullet point other_process_cls
(appears for foreign variables) will usually contain a long sequence, e.g. <class 'fastscape.processes.grid.UniformRectilinearGrid2D'>
.
With the way var_details()
is written now, a line break is only introduced in the description if max_line_length is exceeded, since this is where textwrap.fill
is applied. So this won't apply for the bulleted list.
There are a couple of possibilities I can think of:
textwrap
method around around var_content
or fmt_section
.var_details()
once again and include a textwrap
(or custom formatting) around details
.var_details()
but only used for this specific case of docstring-formatting.Feel free to share your thoughts and add to the list.
My thoughts: Regarding 1., textwrap
can mess with a text block quite substantially. However, I will try to make that work. I can see, how 2. might not be the best option. var_details()
is used in a few other places and we probably should keep this function as original as it is meant to be. 3. introduces another function to the project. I think, that is definitely an option. 4. sounds a bit dissatisfying at first but there are two things, we should keep in mind: a) We can leave line breaks for the description up to the user, and b) we have yet to think about a way how to deal with the variable information of foreign variables. I'm sure you want to include what process the variable is referencing (as a bullet point in this list), but an output like
- other_process_cls : <class 'fastscape.processes.grid.UniformRectilinearGrid2D'>
is much too cryptic in my opinion anyways. Of course, we can wrap the description with max_line_length=62
and replace other_process_cls
with a wrapped expression as well.
The errors on travis are unrelated. I'll remove python 3.5 from the test matrix and fix the issue with the latest xarray release in another PR.
That said, it would be good to add a test for this "autodoc" feature.
That sounds good. I'm not yet sure how to properly customise your own build but I hope, we can discuss it at one point.
It's a good default IMO. More descriptive docstrings is good.
autodoc=False
is for edge cases or in case something goes wrong.
Great!
Also, there is still a blank line in case no variable description is given. I will think about what's a good way to handle that.
I like adding
(no description given)
as you proposed it in an earlier version. You could do that invar_details()
.
The formatting.var_details()
now contains:
description = textwrap.fill(var_metadata.pop('description').capitalize(),
width=max_line_length) or (
"(no description given)")
Your suggestion inspired me to use or
in string assignment. I hope, it doesn't affect readability too much compared to a comprehensive if/else
(PEP 308 -- Conditional Expressions).
Seems to do the job for now.
I tried to introduce horizontal rulers to structure my posts a bit more. If I should make separate posts, please don't hesitate to mention it.
The errors on travis are unrelated. I'll remove python 3.5 from the test matrix and fix the issue with the latest xarray release in another PR.
This is now fixed in the master branch. You can merge it here.
I hope, I did the right thing :) The pytest
output is a bit different now but no complains so far.
It returns the information with the previous formatting
basic_model.terrain
gives what the class __repr__
returns (that's what xs.process_info(basic_model.terrain)
also returns), which is not the same than the docstrings. Try help(basic_model.terrain)
and you should see the formatted attributes section among many other things.
For the current format, it seems like there are mostly two places where text length might be a concern. That would be the i) variable description and ii) the bullet point other_process_cls (appears for foreign variables) will usually contain a long sequence.
Yeah I think it's enough for now to wrap only the description (with 70 - 8 char width in this case) and let the bullet list untouched.
it would be good to add a test for this "autodoc" feature [...] I'm not yet sure how to properly customise your own build
The test would be very similar to those already implemented in https://github.com/benbovy/xarray-simlab/blob/master/xsimlab/tests/test_formatting.py.
Your suggestion inspired me to use or in string assignment.
Actually, I prefer a if
in this case.
I tried to introduce horizontal rulers to structure my posts a bit more. If I should make separate posts, please don't hesitate to mention it.
Sometimes it's good to push your last changes/commits first and then add yourself inline comments, so that we can discuss specific details just next to the relevant lines of code.
I hope, I did the right thing :) The pytest output is a bit different now but no complains so far.
Same advise, don't hesitate to push your commits early. This way I can directly see the output in the CI logs.
@rlange2 I started using black in #84. As a consequence, there are some conflicts between this branch and the master branch. Let me know if you have some work that you haven't commit/pushed yet. Otherwise, I'll resolve the conflicts through this interface and you'll just need to pull the changes.
From now on, you can stop worrying about code formatting and use black: see here.
Latest changes include:
test_formatting.test.add_attribute_section()
and test_process.test_process_decorator()
black . && flake8
whats-new.rst
for all changes@rlange2 I started using black in #84. As a consequence, there are some conflicts between this branch and the master branch. Let me know if you have some work that you haven't commit/pushed yet. Otherwise, I'll resolve the conflicts through this interface and you'll just need to pull the changes.
I created a new branch and used it to overwrite the existing autodoc-branch. I hope that wasn't the worst idea. Should I highlight the changes I made?
Great! Thanks @rlange2
Will update the
doc
attribute of each process in the framework.autodoc=True
does not need to be passed to the xs.process decorator and instead is set globally.I tried to follow the numpy docstring guide as closely as possible. Basically, variable name, type (if supplied), intent and description will be taken into account and represented in the format:
For example, the user defined process:
Will return: