nv-morpheus / Morpheus


[BUG]: Error with Azure DFP examples when using `multiprocess` download method #1145

Closed: efajardo-nv closed this issue 1 year ago

efajardo-nv commented 1 year ago

Version

23.11

Which installation method(s) does this occur on?

Docker

Describe the bug.

The Azure DFP example training and inference pipelines fail in DFPFileToDataFrameStage when the `multiprocess` download method is used. The error does not occur with the Duo examples.

Minimum reproducible example

  1. Follow instructions here to build DFP container.

  2. Create bash shell in container:

    docker compose run morpheus_pipeline bash

  3. Download Azure example data:

    python /workspace/examples/digital_fingerprinting/fetch_example_data.py azure

  4. Set to multiprocess download method:

    export MORPHEUS_FILE_DOWNLOAD_TYPE=multiprocess

  5. Ensure there is no cached download data:

    cd /workspace/examples/digital_fingerprinting/production/morpheus
    rm -rf .cache

  6. Run Azure DFP training example pipeline:

    python dfp_azure_pipeline.py --train_users generic --start_time "2022-08-01" --input_file="../../../data/dfp/azure-training-data/AZUREAD_2022*.json"

Relevant log output

Traceback (most recent call last):
  File "/workspace/examples/digital_fingerprinting/production/morpheus/dfp_azure_pipeline.py", line 354, in <module>
    run_pipeline(obj={}, auto_envvar_prefix='DFP', show_default=True, prog_name="dfp")
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/workspace/examples/digital_fingerprinting/production/morpheus/dfp_azure_pipeline.py", line 349, in run_pipeline
    pipeline.run()
  File "/workspace/morpheus/pipeline/pipeline.py", line 598, in run
    asyncio.run(self.run_async())
  File "/opt/conda/envs/morpheus/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/envs/morpheus/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/workspace/morpheus/pipeline/pipeline.py", line 576, in run_async
    await self.join()
  File "/workspace/morpheus/pipeline/pipeline.py", line 327, in join
    await self._mrc_executor.join_async()
  File "/workspace/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_file_to_df.py", line 204, in convert_to_dataframe
    output_df, cache_hit = self._get_or_create_dataframe_from_batch(fsspec_batch)
  File "/workspace/examples/digital_fingerprinting/production/morpheus/dfp/stages/dfp_file_to_df.py", line 166, in _get_or_create_dataframe_from_batch
    dfs = self._downloader.download(download_buckets, download_method)
  File "/workspace/morpheus/utils/downloader.py", line 165, in download
    dfs = pool.map(download_fn, download_buckets)
  File "/opt/conda/envs/morpheus/lib/python3.10/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/conda/envs/morpheus/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/opt/conda/envs/morpheus/lib/python3.10/multiprocessing/pool.py", line 540, in _handle_tasks
    put(task)
  File "/opt/conda/envs/morpheus/lib/python3.10/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/opt/conda/envs/morpheus/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object '<lambda>.<locals>.<lambda>'
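
For context on the last frames: `pool.map` has to pickle `download_fn` to send it to the worker processes, and a lambda created inside another function (here, inside another lambda) cannot be pickled by the standard `pickle` module. The snippet below is a minimal sketch of that behaviour and of one common workaround. It is not taken from the Morpheus code: `make_parser_with_lambda`, `parse_file`, and `is_picklable` are hypothetical names used only for illustration, and the actual fix in Morpheus may look different.

```python
import pickle
from functools import partial


def make_parser_with_lambda(lines: bool):
    # Hypothetical helper: the returned lambda is a "local object",
    # so pickle cannot look it up by its qualified name.
    return lambda path: (path, lines)


def parse_file(path: str, lines: bool):
    # Hypothetical module-level function: pickle can reference it by name.
    return (path, lines)


def is_picklable(obj) -> bool:
    # Serialize the object the same way multiprocessing would have to.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


local_fn = make_parser_with_lambda(lines=True)
partial_fn = partial(parse_file, lines=True)

print(is_picklable(local_fn))    # False: same failure mode as in the traceback
print(is_picklable(partial_fn))  # True: partial over a module-level function
```

Binding the extra arguments with `functools.partial` over a module-level function keeps the callable picklable, so it could be handed to `multiprocessing.Pool.map` without this error.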

Full env printout

No response

Other/Misc.

No response


mdemoret-nv commented 1 year ago

I believe that we want to deprecate the multiprocess download method in the future.

mdemoret-nv commented 1 year ago

To fix this issue, we should: