Open jonmaddock opened 1 year ago
added 2 commits
added 3 commits
added 1 commit
added 1 commit
added 2 commits
@timothy-nunn I'd be grateful for your review. Thanks!
added 2 commits
In GitLab by @timothy-nunn on Apr 20, 2023, 15:43
Commented on requirements.txt line 26
Probably want to add this to the `setup.py` requirements too?
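A hypothetical sketch of mirroring a `requirements.txt` dependency in `setup.py` (the package name and pin are illustrative, not taken from the actual repo):

```python
from setuptools import setup, find_packages

setup(
    name="process",
    packages=find_packages(),
    install_requires=[
        # Any runtime dependency added to requirements.txt should be
        # mirrored here so pip installs it with the package, e.g.:
        "pandas>=1.3",
    ],
)
```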
In GitLab by @timothy-nunn on Apr 20, 2023, 16:33
Commented on process/io/plot_solutions.py line 376
I think https://docs.python.org/3/library/dataclasses.html#dataclasses.asdict is the "accepted" way of doing this, although that shouldn't matter here. Generally, though, I am not a huge fan of raw-accessing dunder methods/attributes.
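For illustration, a minimal runnable example of the `dataclasses.asdict` suggestion (the dataclass here is invented; the real class in `plot_solutions.py` will differ):

```python
from dataclasses import dataclass, asdict


# Illustrative dataclass, not the one in plot_solutions.py
@dataclass
class RunMetadata:
    tag: str
    objf: float


meta = RunMetadata(tag="baseline", objf=1.23)

# asdict() is the supported API: it avoids touching __dict__ directly
# and also recurses into nested dataclasses, lists and dicts
print(asdict(meta))  # {'tag': 'baseline', 'objf': 1.23}
```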
In GitLab by @timothy-nunn on Apr 20, 2023, 16:35
Commented on process/data/scenarios line 1
I'm not sure I see the purpose of this file
In GitLab by @timothy-nunn on Apr 20, 2023, 17:17
Commented on process/io/plot_solutions.py line 453
This section can be simplified by adding the `var_names` to the regex string. This can be done as follows, although this looks a bit hacky so we might want to find another solution:

```python
# Filter in tag by default
# TODO Make tag in default df.filter
if var_names is None:
    var_names = [TAG]

# Filter for optimisation parameters (normalised to initial value
# e.g. xcm001) values and names, objective function value and name
filtered_results = results_df.filter(
    regex=f"{NORM_OPT_PARAM_VALUE_PATTERN}|{NORM_OPT_PARAM_NAME_PATTERN}|{NORM_OBJF_PATTERN}"
    + f'|{"|".join(var_names)}'.rstrip("|")
)
```
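A self-contained sketch of the single-`filter` approach on a toy frame (the pattern constant, column names and values here are invented; the real regex constants live in `plot_solutions.py`):

```python
from pandas import DataFrame

# Placeholder constants; the real patterns are defined in plot_solutions.py
TAG = "tag"
NORM_OPT_PARAM_VALUE_PATTERN = r"^xcm\d+$"

results_df = DataFrame(
    {"tag": ["run1"], "xcm001": [1.0], "xcm002": [0.98], "rmajor": [8.9]}
)

var_names = None
if var_names is None:
    var_names = [TAG]

# One filter call: pattern alternatives plus the requested variable names
filtered = results_df.filter(
    regex=f"{NORM_OPT_PARAM_VALUE_PATTERN}|{'|'.join(var_names)}"
)
print(list(filtered.columns))  # ['tag', 'xcm001', 'xcm002']
```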
In GitLab by @timothy-nunn on Apr 20, 2023, 18:06
Commented on process/io/plot_solutions.py line 425
A few things:

- This section can be simplified using `filter`s. See below.
- Normalisation can use the `pct_change` method (sadly it makes the first row NaNs, hence the `.fillna(0)`).

```python
from pandas import DataFrame, concat

data = [
    {"name": "entry1", "val1": 22.0, "val2": 1.0},
    {"name": "entry2", "val1": 44.0, "val2": 0.5},
]
columns_to_normalise = ["val1", "val2"]

df = DataFrame(data)
print(df)

df_to_normalise = df.filter(items=columns_to_normalise, axis="columns")
df_normalised = df_to_normalise.pct_change().fillna(0)
df_rest_of = df.filter(items=df.columns.difference(df_normalised.columns))
df_full_normalised = concat([df_normalised, df_rest_of], axis="columns")

print()
print(df_full_normalised)
```
In GitLab by @timothy-nunn on Apr 20, 2023, 18:07
Half way through, will continue tomorrow/next week!
changed this line in version 12 of the diff
added 4 commits
Agreed, done.
Quite right; I've corrected this.
It's a symbolic link to the `tests/regression/scenarios` dir. This allows the test data outside the package dir to be packaged as `package_data` with `process` (as `plot_solvers` can require it), whilst avoiding keeping two copies of the data (one inside the package, one in `tests`) and keeping them in sync. Check out the commit message for this to clarify.
Agreed, much better to use a single filter step rather than concatenating two. I can't think of a neater way of doing this: we need to filter a load of different variable names, so the regex is going to look chunky. We could pass a regex string to `_filter_vars_of_interest` instead of the `var_names` list, but I'm not sure that's much better. I've made the change to this approach.
Thanks for `pct_change()`, I didn't know about that.

By indexing, do you mean using `loc`? Why is indexing dataframes considered bad practice? It's used extensively in pandas' own tutorial. Maybe I've missed something.

`pct_change()` is convenient, but we won't want to plot the NaN/0 row that this creates, so won't we need to exclude that row by indexing anyway? Also, I can't quite see how `pct_change()` could provide the percentage change relative to the first row only: it seems to provide a rolling percentage change between the current value and a fixed interval in the past, as is helpful for time-series data. We always want to compare percentage changes to the same row. For example, when we have 3 mfiles, mfiles 2 and 3 want to be normalised relative to mfile 1. I'm not sure I see how this would do that.
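A minimal sketch of the behaviour being asked for, normalising every row relative to the first row rather than to the previous row as `pct_change()` does (data and column names invented):

```python
from pandas import DataFrame

# Illustrative data: row 0 plays the role of the reference mfile
df = DataFrame({"val1": [22.0, 44.0, 11.0], "val2": [1.0, 0.5, 2.0]})

# Fractional change of each row relative to the *first* row;
# df.iloc[0] broadcasts across all rows
df_rel_first = (df - df.iloc[0]) / df.iloc[0]
print(df_rel_first)
```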
In GitLab by @timothy-nunn on Apr 24, 2023, 11:10
Commented on process/io/plot_solutions.py line 425
`loc` and `iloc` are things I've always been told to avoid, because they're generally only used in iteration (which is really slow in pandas). Generally, there are more obvious ways of indexing, e.g. filtering in the case of `.loc[:, ...]`. Although my observations above have maybe changed my mind: I'm not sure indexing by a boolean array will be any more obvious with `filter`, it would just achieve it in fewer lines.

In the case where we have multiple mfiles (something I imagine I will realise as I continue this review), `pct_change` will probably, you are right, not work very well. It just strikes me as something that should be possible in a much easier way; surely normalising a dataframe to a row is pretty common, right?
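For illustration, the two column-selection styles mentioned here give the same result on a toy frame (names invented), which is why the choice is mostly about readability:

```python
from pandas import DataFrame

df = DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
cols = ["a", "c"]

via_loc = df.loc[:, cols]           # label-based indexing
via_filter = df.filter(items=cols)  # filter achieves the same selection

print(via_loc.equals(via_filter))  # True
```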
In GitLab by @timothy-nunn on Apr 26, 2023, 10:01
Commented on process/data/scenarios line 1
So when the package is built it will copy the data (because of this line: `"process.data": ["scenarios/*/*"]`) and the data will be available at `process/data/scenarios/<regression test>/IN.DAT`, correct?
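For reference, a minimal `setup.py` sketch of the `package_data` mechanism being described, built around the line quoted above (the rest of the call is illustrative):

```python
from setuptools import setup, find_packages

setup(
    name="process",
    packages=find_packages(),
    # Non-Python files matching these globs are copied into the built
    # package, so the scenario IN.DAT files ship alongside the code
    package_data={"process.data": ["scenarios/*/*"]},
)
```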
In GitLab by @timothy-nunn on Apr 26, 2023, 10:09
Commented on process/io/plot_solutions.py line 475
Whilst I agree I am also willing to concede that doing so over so few rows probably does not matter
In GitLab by @timothy-nunn on Apr 26, 2023, 10:27
Commented on process/io/plot_solutions.py line 411
Why do we not need an `_np` version of this row?
In GitLab by @timothy-nunn on Apr 26, 2023, 10:28
Commented on process/io/plot_solutions.py line 395
`iloc` might be better here because we genuinely want the first row.
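A small sketch of why `iloc` is the safer choice for "the first row" (data invented): `loc` looks up by label, which only coincides with position when the index happens to run 0..n.

```python
from pandas import DataFrame

# An index that does not start at 0, e.g. after filtering rows
df = DataFrame({"val": [10, 20, 30]}, index=[5, 6, 7])

first = df.iloc[0]   # positional: genuinely the first row
print(first["val"])  # 10

# df.loc[0] would raise a KeyError here, since no row has label 0
```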
In GitLab by @timothy-nunn on Apr 26, 2023, 11:10
Y-axis of the plots should probably be labeled with something more descriptive than "Value"
In GitLab by @timothy-nunn on Apr 26, 2023, 11:18
Hi Jon,
Other than the questions/comments above, I think this is ready to go in. The plots it produces are valuable and working. The problem is not a typical use of pandas, which might be the reason looping is required; I suggest we don't get hung up on this, as we are dealing with small amounts of data, so speed will not be an issue.
Tim
Merges 1823-add-solution-comparison-tool -> develop
Closes #1823