Open sylvain471 opened 1 year ago
Download the dataset using this command on your local machine.
wget -e robots=off --recursive --no-clobber --page-requisites \
  --html-extension --convert-links --restrict-file-names=windows \
  --domains docs.ray.io --no-parent --accept=html \
  -P $EFS_DIR https://docs.ray.io/en/master/
Hi, I'm running into the exact same issue. When running the command
wget -e robots=off --recursive --no-clobber --page-requisites \
  --html-extension --convert-links --restrict-file-names=windows \
  --domains docs.ray.io --no-parent --accept=html \
  -P $EFS_DIR https://docs.ray.io/en/master/
I'm getting the same issue as https://github.com/ray-project/ray/issues/26320, so I set $EFS_DIR to ../data instead of /mnt/shared_storage/ray-assistant-data (see https://github.com/ray-project/llm-applications/issues/100).
Even with this workaround, I still get an error when running the same line in the notebook:
sections_ds.count()
{
"name": "RayTaskError(UserCodeException)",
"message": "ray::FlatMap(extract_sections)() (pid=41516, ip=127.0.0.1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/tmp/ray/session_2024-04-24_20-30-47_848902_41459/runtime_resources/working_dir_files/_ray_pkg_82dd1b31f4f4a613/rag/data.py\", line 26, in extract_sections
with open(record[\"path\"], \"r\", encoding=\"utf-8\") as html_file:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '../data/docs.ray.io/en/master/joblib.html'
The above exception was the direct cause of the following exception:
ray::FlatMap(extract_sections)() (pid=41516, ip=127.0.0.1)
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_operator.py\", line 419, in _map_task
for b_out in map_transformer.apply_transform(iter(blocks), ctx):
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py\", line 392, in __call__
for data in iter:
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py\", line 134, in _udf_timed_iter
output = next(input)
^^^^^^^^^^^
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py\", line 216, in __call__
yield from self._row_fn(input, ctx)
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py\", line 264, in transform_fn
for out_row in fn(row):
^^^^^^^
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py\", line 127, in fn
_handle_debugger_exception(e)
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py\", line 143, in _handle_debugger_exception
raise UserCodeException() from e
ray.exceptions.UserCodeException",
"stack": "---------------------------------------------------------------------------
RayTaskError(UserCodeException) Traceback (most recent call last)
Cell In[25], line 3
1 # Extract sections
2 sections_ds = ds.flat_map(extract_sections)
----> 3 sections_ds.count()
File ~/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/dataset.py:2488, in Dataset.count(self)
2482 return meta_count
2484 get_num_rows = cached_remote_fn(_get_num_rows)
2486 return sum(
2487 ray.get(
-> 2488 [get_num_rows.remote(block) for block in self.get_internal_block_refs()]
2489 )
2490 )
File ~/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/dataset.py:4631, in Dataset.get_internal_block_refs(self)
4612 @ConsumptionAPI(pattern=\"Time complexity:\")
4613 @DeveloperAPI
4614 def get_internal_block_refs(self) -> List[ObjectRef[Block]]:
4615 \"\"\"Get a list of references to the underlying blocks of this dataset.
4616
4617 This function can be used for zero-copy access to the data. It blocks
(...)
4629 A list of references to this dataset's blocks.
4630 \"\"\"
-> 4631 blocks = self._plan.execute().get_blocks()
4632 self._synchronize_progress_bar()
4633 return blocks
File ~/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/exceptions.py:84, in omit_traceback_stdout.<locals>.handle_trace(*args, **kwargs)
80 logger.exception(
81 \"Full stack trace:\", exc_info=True, extra={\"hide\": not log_to_stdout}
82 )
83 if is_user_code_exception:
---> 84 raise e.with_traceback(None)
85 else:
86 raise e.with_traceback(None) from SystemException()
RayTaskError(UserCodeException): ray::FlatMap(extract_sections)() (pid=41516, ip=127.0.0.1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File \"/tmp/ray/session_2024-04-24_20-30-47_848902_41459/runtime_resources/working_dir_files/_ray_pkg_82dd1b31f4f4a613/rag/data.py\", line 26, in extract_sections
with open(record[\"path\"], \"r\", encoding=\"utf-8\") as html_file:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '../data/docs.ray.io/en/master/joblib.html'
The above exception was the direct cause of the following exception:
ray::FlatMap(extract_sections)() (pid=41516, ip=127.0.0.1)
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_operator.py\", line 419, in _map_task
for b_out in map_transformer.apply_transform(iter(blocks), ctx):
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py\", line 392, in __call__
for data in iter:
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py\", line 134, in _udf_timed_iter
output = next(input)
^^^^^^^^^^^
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py\", line 216, in __call__
yield from self._row_fn(input, ctx)
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py\", line 264, in transform_fn
for out_row in fn(row):
^^^^^^^
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py\", line 127, in fn
_handle_debugger_exception(e)
File \"/Users/rossdancraig/.pyenv/versions/3.11.6/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py\", line 143, in _handle_debugger_exception
raise UserCodeException() from e
ray.exceptions.UserCodeException"
}
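A possible explanation, going only by the traceback above: the failing open() runs inside a Ray worker whose code lives under /tmp/ray/session_.../runtime_resources/working_dir_files/, so a relative path such as ../data/... is probably being resolved against the worker's working directory rather than the notebook's. A minimal sketch of one workaround is to resolve the paths to absolute ones on the driver before building the dataset. The helper name html_records is hypothetical, and the way the notebook builds its dataset is an assumption here, not taken from this thread.

```python
from pathlib import Path


def html_records(efs_dir: str) -> list[dict]:
    """Build one {"path": ...} record per downloaded HTML file.

    Hypothetical helper: paths are resolved to absolute form in the
    driver process, so they no longer depend on the current working
    directory of whichever process opens them later.
    """
    docs_dir = Path(efs_dir).expanduser().resolve()
    return [
        {"path": str(p)}
        for p in sorted(docs_dir.rglob("*.html"))
        if p.is_file()
    ]


# Assumed usage in the notebook (not confirmed by this thread):
#   ds = ray.data.from_items(html_records("../data"))
#   sections_ds = ds.flat_map(extract_sections)
```

This only helps when the workers run on the same machine as the files (as with ip=127.0.0.1 here). On a multi-node cluster the files would still need to live on storage that every node can read. If the list comes back empty, the wget download likely did not finish or went to a different directory.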
Hello, I'm very interested in this work and am trying to run it locally.
However, I am stuck at the cell
sections_ds.count()
which throws the following error. Any idea what might solve this issue?