mars-project / mars

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
https://mars-project.readthedocs.io
Apache License 2.0
2.68k stars 325 forks

[BUG] Apply Function get HTTP Error #3338

Closed Suphx closed 1 year ago

Suphx commented 1 year ago

Describe the bug

[image]

I wrote a function that uses a BERT model to run embedding inference on one column (df.behavior_ext2) of a mars.dataframe, but I got an HTTP error when trying to execute the computation on the DataFrame.

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version -- python3.9
  2. The version of Mars you use -- pymars 0.8.7
  3. Versions of crucial packages, such as numpy, scipy and pandas
  4. Full stack of the error.
  5. Minimized code to reproduce the error.

HTTPClientError                           Traceback (most recent call last)
Cell In[34], line 1
----> 1 df_1.execute()

File E:\Miniconda3\envs\d2l\lib\site-packages\mars\core\entity\tileables.py:463, in HasShapeTileable.execute(self, session, **kw)
    462 def execute(self, session=None, **kw):
--> 463     result = self.data.execute(session=session, **kw)
    464     if isinstance(result, TILEABLE_TYPE):
    465         return self

File E:\Miniconda3\envs\d2l\lib\site-packages\mars\core\entity\executable.py:138, in _ExecutableMixin.execute(self, session, **kw)
    135 from ...deploy.oscar.session import execute
    137 session = _get_session(self, session)
--> 138 return execute(self, session=session, **kw)

File E:\Miniconda3\envs\d2l\lib\site-packages\mars\deploy\oscar\session.py:1803, in execute(tileable, session, wait, new_session_kwargs, show_progress, progress_update_interval, *tileables, **kwargs)
   1801     session = get_default_or_create(**(new_session_kwargs or dict()))
   1802 session = _ensure_sync(session)
-> 1803 return session.execute(
   1804     tileable,
   1805     *tileables,
   1806     wait=wait,
   1807     show_progress=show_progress,
   1808     progress_update_interval=progress_update_interval,
   1809     **kwargs,
   1810 )

File E:\Miniconda3\envs\d2l\lib\site-packages\mars\deploy\oscar\session.py:1598, in SyncSession.execute(self, tileable, show_progress, warn_duplicated_execution, *tileables, **kwargs)
   1596 fut = asyncio.run_coroutine_threadsafe(coro, self._loop)
   1597 try:
-> 1598     execution_info: ExecutionInfo = fut.result(
   1599         timeout=self._isolated_session.timeout
   1600     )
   1601 except KeyboardInterrupt:  # pragma: no cover
   1602     logger.warning("Cancelling running task")

File E:\Miniconda3\envs\d2l\lib\concurrent\futures\_base.py:446, in Future.result(self, timeout)
    444     raise CancelledError()
    445 elif self._state == FINISHED:
--> 446     return self.__get_result()
    447 else:
    448     raise TimeoutError()

File E:\Miniconda3\envs\d2l\lib\concurrent\futures\_base.py:391, in Future.__get_result(self)
    389 if self._exception:
    390     try:
--> 391         raise self._exception
    392     finally:
    393         # Break a reference cycle with the exception in self._exception
    394         self = None

File E:\Miniconda3\envs\d2l\lib\site-packages\mars\deploy\oscar\session.py:1755, in _execute(session, wait, show_progress, progress_update_interval, cancelled, *tileables, **kwargs)
   1746 async def _execute(
   1747     *tileables: Tuple[TileableType],
   1748     session: _IsolatedSession = None,
   (...)
   1753     **kwargs,
   1754 ):
-> 1755     execution_info = await session.execute(*tileables, **kwargs)
   1757     def _attach_session(future: asyncio.Future):
   1758         if future.exception() is None:

File E:\Miniconda3\envs\d2l\lib\site-packages\mars\deploy\oscar\session.py:943, in _IsolatedSession.execute(self, *tileables, **kwargs)
    938 tileable_graph, to_execute_tileables = gen_submit_tileable_graph(
    939     self, tileables, warn_duplicated_execution=warn_duplicated_execution
    940 )
    942 # submit task
--> 943 task_id = await self._task_api.submit_tileable_graph(
    944     tileable_graph,
    945     task_name=task_name,
    946     fuse_enabled=fuse_enabled,
    947     extra_config=extra_config,
    948 )
    950 progress = Progress()
    951 # create asyncio.Task

File E:\Miniconda3\envs\d2l\lib\site-packages\mars\services\task\api\web.py:211, in WebTaskAPI.submit_tileable_graph(self, graph, task_name, fuse_enabled, extra_config)
    200 extra_config_ser = (
    201     serialize_serializable(extra_config) if extra_config else None
    202 )
    203 body = serialize_serializable(
    204     {
    205         "task_name": task_name if task_name else "",
    (...)
    209     }
    210 )
--> 211 res = await self._request_url(
    212     path=path,
    213     method="POST",
    214     headers={"Content-Type": "application/octet-stream"},
    215     data=body,
    216 )
    217 return res.body.decode().strip()

File E:\Miniconda3\envs\d2l\lib\site-packages\mars\services\web\core.py:247, in MarsWebAPIClientMixin._request_url(self, method, path, **kwargs)
    244     pass
    246 if exc is None:
--> 247     raise res.error
    248 else:
    249     raise exc.with_traceback(tb)

HTTPClientError: HTTP 400: Bad Request
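The traceback only shows the status line, so the server's reason for rejecting the request is not visible. As a rough diagnostic sketch, the helper below pulls the status code and response body out of the raised exception when they are present. It assumes the exception carries `code` and `response` attributes (as tornado's `HTTPClientError` does); the name `describe_http_error` is hypothetical and not part of Mars.

```python
def describe_http_error(exc: BaseException) -> str:
    """Summarize an HTTP client error, including the response body if present.

    Assumption: the exception exposes `code` and `response.body`, as
    tornado's HTTPClientError does. Missing attributes are reported as None.
    """
    code = getattr(exc, "code", None)
    response = getattr(exc, "response", None)
    body = getattr(response, "body", None)
    if isinstance(body, bytes):
        # Response bodies arrive as bytes; decode them for readable output.
        body = body.decode("utf-8", errors="replace")
    return f"{type(exc).__name__} code={code} body={body!r}"


# Hypothetical usage in the notebook from this report:
# try:
#     df_1.execute()
# except Exception as exc:
#     print(describe_http_error(exc))
```

If the body is empty, the Mars supervisor or web service logs are the next place to look.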

Expected behavior

Additional context

Suphx commented 1 year ago

[image]

Loading a DataFrame from a pandas.DataFrame works fine, and the Mars cluster does not seem to crash or fail.

fyrestone commented 1 year ago

How did you initialize the Mars session? The error came from the Mars web session. Could you try initializing the Mars session with mars.new_session(backend='ray') or mars.new_ray_session(backend='ray')? The Ray execution backend does not use the Mars web session at all.
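A minimal sketch of that suggestion, for anyone trying it: the `mars.new_session(backend="ray")` call is the one quoted above, while the helper names `mars_installed` and `new_ray_backed_session` are hypothetical wrappers added here so the snippet fails with a clear message when pymars is missing. Whether the `backend` argument is accepted depends on the installed Mars version, and Ray must be installed separately.

```python
import importlib.util


def mars_installed() -> bool:
    """Return True if the `mars` package (pymars) can be found."""
    return importlib.util.find_spec("mars") is not None


def new_ray_backed_session():
    """Create a Mars session on the Ray backend, as suggested in this thread."""
    if not mars_installed():
        raise RuntimeError("pymars is not installed in this environment")
    import mars  # imported lazily so this module loads without pymars

    # Call quoted from the comment above; support for backend="ray"
    # depends on the Mars version and on Ray being available.
    return mars.new_session(backend="ray")
```

Calling `new_ray_backed_session()` before building the DataFrame would make later `execute()` calls use that session instead of the web session.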

fyrestone commented 1 year ago

If you want to use Mars on Ray in cluster mode, the following simple steps should work:

qinxuye commented 1 year ago

@fyrestone Does the user mention that he is using ray backend?

fyrestone commented 1 year ago

> @fyrestone Does the user mention that he is using ray backend?

No. But the Ray backend is easier to use and doesn't have this problem at all, so why not give it a try?

qinxuye commented 1 year ago

Is this issue addressed, @Suphx? I noticed that you have closed it.