Ryton opened this issue 9 years ago
Ok! For the future, I propose the following workflow:
Makes sense, apart from the very long branch name, which seems a bit unwieldy to me. What if your branch tries to resolve multiple related issues?
PS: Should a branch contain only the relevant files or (in an SVN manner) all files in the repository?
When you create a branch, it always descends from a 'parent' commit. Your branch will track the changes compared to that parent, so all files that are in the parent are present in your branch.
In the proposed collaboration workflow, a branch is linked to a single issue. However, a commit can be linked to several issues, by adding #xx to the commit message for every issue number xx you want to refer to (for example, a commit message ending in "refs #12 #34" would link that commit to issues 12 and 34). And small bugfixes can of course happen directly in the develop branch.
Let's try it this way; we'll evaluate when needed.
Developed this script (and some additions to the plotting library) a bit further; see the next commit this weekend.
Bumped into a major issue though: combining two timeseries with `combine_first` works if there are no duplicates in the first timeseries, but fails otherwise (see the full log below). @saroele: is there a cleaner/better workaround for this, or an easy way to remove duplicates from a pandas timeseries?
```
series_with_crossing = TS1.combine_first(TS2)
```

On the other hand, first concatenating and then selecting them results in many NaN values:

```
df_combined = pd.concat([series, cross_ts], axis=1)
series_with_crossing = df_combined[0].dropna()
```
```
---> 70 ts = tsLc.combine_first(tsHc)
     71 print ts

C:\winpython\WinPython-32bit-2.7.6.2\python-2.7.6\lib\site-packages\pandas\core\series.pyc in combine_first(self, other)
   1622         """
   1623         new_index = self.index + other.index
-> 1624         this = self.reindex(new_index, copy=False)
   1625         other = other.reindex(new_index, copy=False)
   1626         name = _maybe_match_name(self, other)

C:\winpython\WinPython-32bit-2.7.6.2\python-2.7.6\lib\site-packages\pandas\core\series.pyc in reindex(self, index, **kwargs)
   2056     @Appender(generic._shared_docs['reindex'] % _shared_doc_kwargs)
   2057     def reindex(self, index=None, **kwargs):
-> 2058         return super(Series, self).reindex(index=index, **kwargs)
   2059
   2060     def reindex_axis(self, labels, axis=0, **kwargs):

C:\winpython\WinPython-32bit-2.7.6.2\python-2.7.6\lib\site-packages\pandas\core\generic.pyc in reindex(self, *args, **kwargs)
   1363         return self._reindex_axes(axes, level, limit,
   1364                                   method, fill_value, copy,
-> 1365                                   takeable=takeable).__finalize__(self)
   1366
   1367     def _reindex_axes(self, axes, level, limit, method, fill_value, copy,

C:\winpython\WinPython-32bit-2.7.6.2\python-2.7.6\lib\site-packages\pandas\core\generic.pyc in _reindex_axes(self, axes, level, limit, method, fill_value, copy, takeable)
   1400             obj = obj._reindex_with_indexers(
   1401                 {axis: [new_index, indexer]}, method=method,
-> 1402                 fill_value=fill_value, limit=limit, copy=copy)
   1403
   1404         return obj

C:\winpython\WinPython-32bit-2.7.6.2\python-2.7.6\lib\site-packages\pandas\core\generic.pyc in _reindex_with_indexers(self, reindexers, method, fill_value, limit, copy, allow_dups)
   1489             new_data = new_data.reindex_indexer(index, indexer, axis=baxis,
   1490                                                 fill_value=fill_value,
-> 1491                                                 allow_dups=allow_dups)
   1492
   1493         elif (baxis == 0 and index is not None and

C:\winpython\WinPython-32bit-2.7.6.2\python-2.7.6\lib\site-packages\pandas\core\internals.pyc in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups)
   3118         # trying to reindex on an axis with duplicates
   3119         if not allow_dups and not self.axes[axis].is_unique:
-> 3120             raise ValueError("cannot reindex from a duplicate axis")
   3121
   3122         if axis == 0:

ValueError: cannot reindex from a duplicate axis
```
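For reference, here is a minimal, self-contained sketch of the failure mode. The series names and timestamps are made up (not the actual opengrid data), and the exact error message may vary between pandas versions:

```python
import pandas as pd

# ts_dup contains a duplicated timestamp, ts_other does not.
ts_dup = pd.Series(
    [1.0, 1.0, 2.0],
    index=pd.to_datetime(['2014-01-01 00:00', '2014-01-01 00:00', '2014-01-01 01:00']))
ts_other = pd.Series(
    [10.0, 20.0],
    index=pd.to_datetime(['2014-01-01 00:30', '2014-01-01 01:30']))

try:
    # combine_first reindexes ts_dup onto the union of both indexes;
    # this fails because the index of ts_dup is not unique.
    combined = ts_dup.combine_first(ts_other)
except ValueError as err:
    print(err)  # "cannot reindex from a duplicate axis" (or similar, depending on version)
```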
There is also a join() operator which may be what we need. I can have a quick look next week.
Btw: we quickly looked through the piechart notebook this Tuesday to decide which graphs to put online. I think there are some good analyses there, so if we can finalize them, they could be online pretty soon :-)
Yippie, found a fix/solution!
The error was caused by a similar, duplicate row occurring in both of the timeseries...
It was solved by removing duplicates first with a neat little trick:
```python
def remove_duplicates(ts):
    '''
    Hack to remove duplicate date entries.
    Keeps the 'first' entry in the timeseries.
    Source: http://www.asktheguru.info/kb/viewanswer/29968238/
    '''
    # Converting to a dict and back drops the duplicated index entries,
    # because dict keys are unique.
    return pd.TimeSeries(ts.to_dict())
```
For some reason, combining a timeseries with itself gave no error, despite the duplicate rows.
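A quick usage sketch of how the helper slots in before the `combine_first` call; this is hypothetical and simply reuses the `tsLc` / `tsHc` names from the traceback above:

```python
# Deduplicate both timeseries before combining them,
# so that combine_first no longer hits the duplicate-axis error.
ts = remove_duplicates(tsLc).combine_first(remove_duplicates(tsHc))
print(ts)
```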
Cool, it's part of the new milestone!
When will the sprint take place?
Ok, the deadline is 22/03 it seems.
In the meantime, I'd love to give some more love to my historic coding attempts (e.g. add some tests ;-) )!
Actually, we still have to decide on the sprint, but this one seemed like a good candidate. I want to work on this with you. First we have to adapt the existing code to the new opengrid structure. Maybe that's something I'll give a shot at?
Let's use this issue to discuss the WATER analysis notebook.