413 Client Error (Request Entity Too Large) for https://trace.wandb.ai/call/upsert_batch

ryanrudes commented 1 month ago

At the end of a fairly long run of Evaluation.evaluate (approx. 8000 examples), I get a peculiar error that does not appear for shorter runs.

Exception in thread Thread-1 (_process_batches):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/lib/python3.12/site-packages/weave/trace_server/async_batch_processor.py", line 57, in _process_batches
    self.processor_fn(current_batch)
  File "/lib/python3.12/site-packages/tenacity/__init__.py", line 336, in wrapped_f
    return copy(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.12/site-packages/tenacity/__init__.py", line 475, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.12/site-packages/tenacity/__init__.py", line 376, in iter
    result = action(retry_state)
             ^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.12/site-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/lib/python3.12/site-packages/tenacity/__init__.py", line 478, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.12/site-packages/weave/trace_server/remote_http_trace_server.py", line 166, in _flush_calls
    r.raise_for_status()
  File "/lib/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 413 Client Error: Request Entity Too Large for url: https://trace.wandb.ai/call/upsert_batch
Evaluation summary
<omitted for conciseness>
Exception in thread Thread-1 (_process_batches):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/lib/python3.12/site-packages/weave/trace_server/async_batch_processor.py", line 57, in _process_batches
    self.processor_fn(current_batch)
  File "/lib/python3.12/site-packages/tenacity/__init__.py", line 336, in wrapped_f
    return copy(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.12/site-packages/tenacity/__init__.py", line 475, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.12/site-packages/tenacity/__init__.py", line 376, in iter
    result = action(retry_state)
             ^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.12/site-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/lib/python3.12/site-packages/tenacity/__init__.py", line 478, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.12/site-packages/weave/trace_server/remote_http_trace_server.py", line 166, in _flush_calls
    r.raise_for_status()
  File "/lib/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 413 Client Error: Request Entity Too Large for url: https://trace.wandb.ai/call/upsert_batch

gtarpenning commented 1 month ago

Hi @ryanrudes, thanks for reporting this, and apologies that you ran into it. Could you give us a bit more information about your environment? Specifically, the weave version you are using and your wandb username. It is possible that this issue has already been resolved in master (here), but if you are logging large images or very large (30+ MB) traces, that fix does not account for them. Thanks!
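
For anyone else hitting this, the version information requested above can be collected with a few lines of standard-library Python. This is only a convenience sketch: `installed_versions` is a hypothetical helper name, not part of weave or wandb, and it assumes both packages were installed under the distribution names `weave` and `wandb`.

```python
from importlib import metadata


def installed_versions(packages=("weave", "wandb")):
    """Return a mapping of distribution name to installed version string."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

Running it in the same environment that produced the traceback matters, since a notebook kernel and a shell can resolve different site-packages directories.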

ryanrudes commented 1 month ago

Hi, my trace is most definitely larger than 30 MB. I was not aware that that would be considered large. I have also noticed that these runs still appear as ongoing in the portal despite having finished. Also, although all of the data I uploaded appears in the web portal, some of it is missing when I export and download the tables.

I am using weave 0.50.13 and wandb 0.17.6. My wandb username is ryanrudes-dtxplus.
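
To make the size discussion concrete, here is a rough sketch of how a client could keep each request body under a byte limit by splitting a batch on its JSON-encoded size. It is not weave's implementation. `encoded_size` and `split_by_size` are hypothetical names, the items are assumed to be JSON-serializable, and the limit is a caller-supplied number, since the thread does not state the server's actual cap.

```python
import json


def encoded_size(obj):
    """Size in bytes of obj when serialized as compact JSON."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def split_by_size(items, max_bytes):
    """Greedily group items into lists whose compact JSON encoding fits in max_bytes.

    An item that exceeds max_bytes on its own is still yielded, alone,
    so the caller can decide how to handle it.
    """
    batch = []
    batch_bytes = 2  # the enclosing "[" and "]"
    for item in items:
        item_bytes = encoded_size(item)
        extra = item_bytes + (1 if batch else 0)  # comma before non-first items
        if batch and batch_bytes + extra > max_bytes:
            yield batch
            batch, batch_bytes = [], 2
            extra = item_bytes
        batch.append(item)
        batch_bytes += extra
    if batch:
        yield batch


if __name__ == "__main__":
    calls = [{"id": i, "payload": "x" * 100} for i in range(10)]
    for batch in split_by_size(calls, max_bytes=300):
        print(len(batch), encoded_size(batch))
```

Splitting only helps when the batch is too large in aggregate. A single call whose own payload exceeds the limit (a large image, for example) would still be rejected, which matches the caveat raised earlier in the thread.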

gtarpenning commented 1 month ago

@ryanrudes Both of those issues, large traces erroring and the export not containing the full data, are ticketed and under active development. They should be resolved in the near future; this thread will be updated when they are merged and deployed. Thanks for raising these.

Also, a new version of the weave SDK was released yesterday, so updating the weave package will pick up the changes I mentioned, which might alleviate your upsert_batch issue.

wandb / weave

413 Client Error (Request Entity Too Large) for https://trace.wandb.ai/call/upsert_batch #2110