kippnorcal / google_classroom

Google Classroom Data Pipeline
GNU General Public License v3.0

Fix OrgUnit ValueError bug #28

Closed dchess closed 4 years ago

dchess commented 4 years ago

https://github.com/kipp-bayarea/google_classroom/blob/e8e40c5ae5891adf1eb94ef1b717f9a6a0d91b9c/api.py#L169-L171

@zkagin I am encountering an issue running the latest branch (traceback below)

Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/tenacity/__init__.py", line 394, in call
    result = fn(*args, **kwargs)
  File "/code/timer.py", line 22, in wrapper
    results = func(*args, **kwargs)
  File "/code/api.py", line 104, in get_and_write_to_db
    results = self.request_data().execute()
  File "/code/api.py", line 169, in request_data
    if self.org_unit_id:
  File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/pandas/core/generic.py", line 1479, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main.py", line 159, in <module>
    main(Config)
  File "main.py", line 97, in main
    sql, overwrite=False, debug=config.DEBUG
  File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/tenacity/__init__.py", line 311, in wrapped_f
    return self.call(f, *args, **kw)
  File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/tenacity/__init__.py", line 391, in call
    do = self.iter(retry_state=retry_state)
  File "/root/.local/share/virtualenvs/code-_Py8Si6I/lib/python3.7/site-packages/tenacity/__init__.py", line 351, in iter
    six.raise_from(retry_exc, fut.exception())
  File "<string>", line 3, in raise_from
tenacity.RetryError: RetryError[<Future at 0x7f1a4e5b4910 state=finished raised ValueError>]

dchess commented 4 years ago

@zkagin It looks like the shape of the orgUnit dataframe is pivoted and the row index is now the column header. I was able to get this to work by changing it to:

ou_id = None if result.empty else result[1].iloc[0]

But that is pretty obtuse. I'm wondering if this is related to the df.reindex() function that is being used now.
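
For reference, a minimal sketch of the failure in the traceback and of a more explicit guard. The frame contents, the column name `orgUnitId`, and the helper `first_value_or_none` are hypothetical and not taken from the repository; only the pandas behavior (truth-testing a Series raises ValueError) is real.

```python
import pandas as pd


def first_value_or_none(result: pd.DataFrame, column):
    """Return the first value in `column`, or None if there is nothing to read.

    Hypothetical helper for illustration; not part of the repository.
    """
    if result.empty or column not in result.columns:
        return None
    return result[column].iloc[0]


# Hypothetical stand-in for the org unit lookup result.
result = pd.DataFrame({"orgUnitId": ["id:abc123"]})

# Truth-testing a Series raises the ValueError seen in the traceback,
# even when the Series holds a single value.
try:
    if result["orgUnitId"]:
        pass
    raised = False
except ValueError:
    raised = True

# Pulling out a scalar (or None) makes a later `if ou_id:` check unambiguous.
ou_id = first_value_or_none(result, "orgUnitId")
missing = first_value_or_none(pd.DataFrame(), "orgUnitId")
```

The point of the helper is that the value stored on the object ends up as a plain scalar or None, so a truthiness check on it cannot hit the ambiguous-Series error.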

zkagin commented 4 years ago

I think it's the filter_data line in OrgUnits causing the issue; I'm looking into it now.

I figured out how to create Org Units and was able to repro the bug, so hopefully I will have a fix shortly.

zkagin commented 4 years ago

Nope, wasn't that: the pivoting is happening at the line all_data = all_data.append(df).

Now looking into why that's the case and what we can do instead.
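
As a rough illustration only, here is one way a combine step can produce a "pivoted" shape in pandas: passing a Series where a one-row DataFrame was expected. This is an assumption about a possible cause, not a confirmed diagnosis of this bug, and the data and names below are hypothetical. The sketch uses `pd.concat` because `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0.

```python
import pandas as pd

# Hypothetical accumulated data and a new record; names are illustrative.
all_data = pd.DataFrame({"orgUnitId": ["id:1"], "name": ["A"]})
record = pd.Series({"orgUnitId": "id:2", "name": "B"})

# Passing the Series directly: its labels are treated as row labels,
# so the field names end up in the index instead of the columns.
stacked = pd.concat([all_data, record])

# Converting the Series to a one-row frame first keeps the column layout.
fixed = pd.concat([all_data, record.to_frame().T], ignore_index=True)
```

Checking `stacked.index` and `fixed.columns` side by side shows where the field names landed in each case.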

zkagin commented 4 years ago

https://github.com/kipp-bayarea/google_classroom/pull/30

This is a quick fix for now. I need to look a little bit more closely at pandas append behavior to fully understand why this is happening and whether there is a better solution long-term.