kippnorcal / google_classroom

Google Classroom Data Pipeline

StudentSubmissions Memory Error #105

Closed dchess closed 4 years ago

dchess commented 4 years ago

@zkagin Here's a traceback for that memory error I mentioned:

2020-09-26 12:09:40AM UTC | INFO: StudentSubmissions: Generating requests...
2020-09-26 12:09:46AM UTC | INFO: StudentSubmissions: 1923 requests remaining.
2020-09-26 12:09:57AM UTC | INFO: StudentSubmissions: 1630 requests remaining.
2020-09-26 12:10:25AM UTC | INFO: StudentSubmissions: 1382 requests remaining.
2020-09-26 12:10:36AM UTC | INFO: StudentSubmissions: Quota exceeded. Pausing for 20 seconds...
2020-09-26 12:10:56AM UTC | INFO: StudentSubmissions: 1287 requests remaining.
2020-09-26 12:11:30AM UTC | INFO: StudentSubmissions: 1101 requests remaining.
2020-09-26 12:12:01AM UTC | INFO: StudentSubmissions: 874 requests remaining.
2020-09-26 12:12:40AM UTC | INFO: StudentSubmissions: 671 requests remaining.
2020-09-26 12:14:42AM UTC | ERROR: RetryError[<Future at 0x7ff71ff4d250 state=finished raised MemoryError>]
Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/tenacity/__init__.py", line 412, in call
    result = fn(*args, **kwargs)
  File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/googleapiclient/http.py", line 1528, in execute
    self._execute(http, self._order, self._requests)
  File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/googleapiclient/http.py", line 1473, in _execute
    parser.feed(for_parser)
  File "/usr/local/lib/python3.7/email/feedparser.py", line 175, in feed
    self._input.push(data)
  File "/usr/local/lib/python3.7/email/feedparser.py", line 110, in push
    parts = self._partial.readlines()
MemoryError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main.py", line 167, in <module>
    main(Config)
  File "main.py", line 158, in main
    StudentSubmissions(classroom_service, sql, config).batch_pull_data(course_ids)
  File "/google_classroom/timer.py", line 22, in wrapper
    results = func(*args, **kwargs)
  File "/google_classroom/api.py", line 259, in batch_pull_data
    self._execute_batch_with_retry(batch)
  File "/google_classroom/api.py", line 161, in _execute_batch_with_retry
    retryer(batch.execute)
  File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/tenacity/__init__.py", line 409, in call
    do = self.iter(retry_state=retry_state)
  File "/root/.local/share/virtualenvs/-x-v5uFv0/lib/python3.7/site-packages/tenacity/__init__.py", line 369, in iter
    six.raise_from(retry_exc, fut.exception())
  File "<string>", line 3, in raise_from
tenacity.RetryError: RetryError[<Future at 0x7ff71ff4d250 state=finished raised MemoryError>]
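
For reference, the wrapper it's dying in (`_execute_batch_with_retry` in api.py) boils down to something like this; a minimal sketch, and the stop/wait settings here are guesses, not the actual config:

```python
from tenacity import Retrying, stop_after_attempt, wait_fixed

def execute_batch_with_retry(batch):
    # Retries batch.execute() on any exception (tenacity's default retry
    # condition). MemoryError subclasses Exception, so it gets retried too;
    # once the attempts run out, tenacity wraps the last failure in a
    # RetryError, which is the RetryError[... MemoryError ...] logged above.
    retryer = Retrying(stop=stop_after_attempt(5), wait=wait_fixed(20))
    retryer(batch.execute)
```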
dchess commented 4 years ago

@zkagin When I run this locally, I don't get the memory error; it just fails silently.

2020-09-26 12:38:44AM UTC | INFO: StudentSubmissions: Generating requests...
2020-09-26 12:38:54AM UTC | INFO: StudentSubmissions: 1923 requests remaining.
2020-09-26 12:39:10AM UTC | INFO: StudentSubmissions: 1630 requests remaining.
2020-09-26 12:39:59AM UTC | INFO: StudentSubmissions: 1382 requests remaining.

Running with debug logging enabled, it does much the same thing, with no further clues.

zkagin commented 4 years ago

Interesting. Not much gets retained between batches, so perhaps the data coming back from StudentSubmissions is overwhelming the RAM available on your server or VM? I'll try to repro this by limiting available memory as well. Does it happen only for StudentSubmissions?
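
For the repro, capping the process's address space should simulate a low-memory box; a sketch (Linux-only, and the 512 MB figure is an arbitrary pick, not anything from the repo):

```python
import resource

# Cap this process's virtual address space at ~512 MB so that allocations
# beyond the cap raise MemoryError, simulating a memory-constrained server.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (512 * 1024 * 1024, hard))
```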

In the meantime, one hypothesis for a fix would be to lower the batch size on StudentSubmissions; a sketch of what I mean is below. If you are still getting this silent crash after that, then maybe something else is going on.
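
Something along these lines, in case the batching isn't already chunked this way; an illustrative sketch only, since I haven't checked api.py's actual identifiers (BATCH_SIZE, handle_response, and execute_with_retry are made-up names):

```python
BATCH_SIZE = 50  # smaller batches mean less response data in memory at once

def execute_in_chunks(service, requests, handle_response, execute_with_retry):
    """Send requests in groups of BATCH_SIZE instead of one large batch."""
    for i in range(0, len(requests), BATCH_SIZE):
        # new_batch_http_request() and add() are standard googleapiclient
        # batching calls; each chunk is executed (with retry) independently.
        batch = service.new_batch_http_request(callback=handle_response)
        for request in requests[i : i + BATCH_SIZE]:
            batch.add(request)
        execute_with_retry(batch)
```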

dchess commented 4 years ago

@zkagin I tried reducing the batch size, and that worked for a little while, but we eventually hit the error again at the smaller batch size. I've since doubled the RAM on the server, and that has worked so far.