jina-ai / jina

☁️ Build multimodal AI applications with cloud-native stack
https://docs.jina.ai
Apache License 2.0

Notebook: crash first time: FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.7/dist-packages/requests-2.23.0.dist-info/METADATA' #5159

Closed: alexcg1 closed this issue 2 years ago.

alexcg1 commented 2 years ago

Describe the bug: Using this notebook, when I go to Runtime -> Run all (on the first run of the notebook), it crashes. This is a common problem when I run Jina in notebooks. It may (or may not) be related to AnnLiteIndexer, from what I remember of previous debugging attempts.

When I come to the cell with:

with flow:
  docs = flow.index(docs, show_progress=True)

It crashes with the following error:

AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in _dep_map(self)
   3015         try:
-> 3016             return self.__dep_map
   3017         except AttributeError:

/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in __getattr__(self, attr)
   2812         if attr.startswith('_'):
-> 2813             raise AttributeError(attr)
   2814         return getattr(self._provider, attr)

AttributeError: _DistInfoDistribution__dep_map

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in _parsed_pkg_info(self)
   3006         try:
-> 3007             return self._pkg_info
   3008         except AttributeError:

/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in __getattr__(self, attr)
   2812         if attr.startswith('_'):
-> 2813             raise AttributeError(attr)
   2814         return getattr(self._provider, attr)

AttributeError: _pkg_info

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-15-8e84330be270> in <module>
----> 1 with flow:
      2   docs = flow.index(docs, show_progress=True)

/usr/local/lib/python3.7/dist-packages/jina/orchestrate/flow/base.py in __enter__(self)
   1414     def __enter__(self):
   1415         with CatchAllCleanupContextManager(self):
-> 1416             return self.start()
   1417 
   1418     def __exit__(self, exc_type, exc_val, exc_tb):

/usr/local/lib/python3.7/dist-packages/jina/orchestrate/flow/builder.py in arg_wrapper(self, *args, **kwargs)
     31             if hasattr(self, '_build_level'):
     32                 if self._build_level in levels:
---> 33                     return func(self, *args, **kwargs)
     34                 else:
     35                     raise FlowBuildLevelError(

/usr/local/lib/python3.7/dist-packages/jina/orchestrate/flow/base.py in start(self)
   1468         for k, v in self:
   1469             if not v.external:
-> 1470                 self.enter_context(v)
   1471 
   1472         self._wait_until_all_ready()

/usr/lib/python3.7/contextlib.py in enter_context(self, cm)
    425         _cm_type = type(cm)
    426         _exit = _cm_type.__exit__
--> 427         result = _cm_type.__enter__(cm)
    428         self._push_cm_exit(cm, _exit)
    429         return result

/usr/local/lib/python3.7/dist-packages/jina/orchestrate/deployments/__init__.py in __enter__(self)
    111     def __enter__(self) -> 'BaseDeployment':
    112         with CatchAllCleanupContextManager(self):
--> 113             return self.start()
    114 
    115     @staticmethod

/usr/local/lib/python3.7/dist-packages/jina/orchestrate/deployments/__init__.py in start(self)
    554                 self.head_pod,
    555             )
--> 556             self.enter_context(self.shards[shard_id])
    557 
    558         return self

/usr/lib/python3.7/contextlib.py in enter_context(self, cm)
    425         _cm_type = type(cm)
    426         _exit = _cm_type.__exit__
--> 427         result = _cm_type.__enter__(cm)
    428         self._push_cm_exit(cm, _exit)
    429         return result

/usr/local/lib/python3.7/dist-packages/jina/orchestrate/deployments/__init__.py in __enter__(self)
    222                 if getattr(self.deployment_args, 'noblock_on_start', False):
    223                     _args.noblock_on_start = True
--> 224                 self._pods.append(PodFactory.build_pod(_args).start())
    225             return self
    226 

/usr/local/lib/python3.7/dist-packages/jina/orchestrate/pods/factory.py in build_pod(args)
     42             return ContainerPod(cargs)
     43         else:
---> 44             return Pod(args)

/usr/local/lib/python3.7/dist-packages/jina/orchestrate/pods/__init__.py in __init__(self, args)
    303     def __init__(self, args: 'argparse.Namespace'):
    304         super().__init__(args)
--> 305         self.runtime_cls = self._get_runtime_cls()
    306         self.worker = multiprocessing.Process(
    307             target=run,

/usr/local/lib/python3.7/dist-packages/jina/orchestrate/pods/__init__.py in _get_runtime_cls(self)
    355         from jina.serve.runtimes import get_runtime
    356 
--> 357         update_runtime_cls(self.args)
    358         return get_runtime(self.args.runtime_cls)

/usr/local/lib/python3.7/dist-packages/jina/orchestrate/pods/helper.py in update_runtime_cls(args, copy)
     83         _hub_args.uri = _args.uses
     84         _hub_args.no_usage = True
---> 85         _args.uses = HubIO(_hub_args).pull()
     86 
     87     if hasattr(_args, 'protocol'):

/usr/local/lib/python3.7/dist-packages/jina/hubble/hubio.py in pull(self)
    975                                 cached_zip_file,
    976                                 executor,
--> 977                                 install_deps=self.args.install_requirements,
    978                             )
    979 

/usr/local/lib/python3.7/dist-packages/jina/hubble/hubapi.py in install_local(zip_package, executor, install_deps)
    169     pkg_dist_path.mkdir(parents=False, exist_ok=True)
    170 
--> 171     install_package_dependencies(install_deps, pkg_dist_path, pkg_path)
    172 
    173     manifest_path = pkg_path / 'manifest.yml'

/usr/local/lib/python3.7/dist-packages/jina/hubble/hubapi.py in install_package_dependencies(install_deps, pkg_dist_path, pkg_path)
    198 
    199         if install_deps:
--> 200             install_requirements(requirements_file)
    201         elif not is_requirements_installed(requirements_file, show_warning=True):
    202             raise ModuleNotFoundError(

/usr/local/lib/python3.7/dist-packages/jina/hubble/helper.py in install_requirements(requirements_file, timeout)
    493     """
    494 
--> 495     if is_requirements_installed(requirements_file):
    496         return
    497 

/usr/local/lib/python3.7/dist-packages/jina/hubble/helper.py in is_requirements_installed(requirements_file, show_warning)
    414 
    415     try:
--> 416         pkg_resources.require('\n'.join(install_reqs))
    417     except (DistributionNotFound, VersionConflict, RequirementParseError) as ex:
    418         if show_warning:

/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in require(self, *requirements)
    884         included, even if they were already activated in this working set.
    885         """
--> 886         needed = self.resolve(parse_requirements(requirements))
    887 
    888         for dist in needed:

/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
    778 
    779             # push the new requirements onto the stack
--> 780             new_requirements = dist.requires(req.extras)[::-1]
    781             requirements.extend(new_requirements)
    782 

/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in requires(self, extras)
   2732     def requires(self, extras=()):
   2733         """List of Requirements needed for this distro if `extras` are used"""
-> 2734         dm = self._dep_map
   2735         deps = []
   2736         deps.extend(dm.get(None, ()))

/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in _dep_map(self)
   3016             return self.__dep_map
   3017         except AttributeError:
-> 3018             self.__dep_map = self._compute_dependencies()
   3019             return self.__dep_map
   3020 

/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in _compute_dependencies(self)
   3025         reqs = []
   3026         # Including any condition expressions
-> 3027         for req in self._parsed_pkg_info.get_all('Requires-Dist') or []:
   3028             reqs.extend(parse_requirements(req))
   3029 

/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in _parsed_pkg_info(self)
   3007             return self._pkg_info
   3008         except AttributeError:
-> 3009             metadata = self.get_metadata(self.PKG_INFO)
   3010             self._pkg_info = email.parser.Parser().parsestr(metadata)
   3011             return self._pkg_info

/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in get_metadata(self, name)
   1405             return ""
   1406         path = self._get_metadata_path(name)
-> 1407         value = self._get(path)
   1408         try:
   1409             return value.decode('utf-8')

/usr/local/lib/python3.7/dist-packages/pkg_resources/__init__.py in _get(self, path)
   1609 
   1610     def _get(self, path):
-> 1611         with open(path, 'rb') as stream:
   1612             return stream.read()
   1613 

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.7/dist-packages/requests-2.23.0.dist-info/METADATA'

Describe how you solve it: Runtime -> Restart and run all (the crash only happens on the first run).


Environment

- jina 3.9.1
- docarray 0.16.3
- jcloud 0.0.35
- jina-hubble-sdk 0.16.1
- jina-proto 0.1.13
- protobuf 3.17.3
- proto-backend cpp
- grpcio 1.47.0
- pyyaml 6.0
- python 3.7.13
- platform Linux
- platform-release 5.10.133+
- platform-version #1 SMP Fri Aug 26 08:44:51 UTC 2022
- architecture x86_64
- processor x86_64
- uid 2485378613250
- session-id 9c3497dc-3028-11ed-a630-0242ac1c0002
- uptime 2022-09-09T10:17:28.944030
- ci-vendor (unset)
- internal False
* JINA_DEFAULT_HOST (unset)
* JINA_DEFAULT_TIMEOUT_CTRL (unset)
* JINA_DEPLOYMENT_NAME (unset)
* JINA_DISABLE_UVLOOP (unset)
* JINA_EARLY_STOP (unset)
* JINA_FULL_CLI (unset)
* JINA_GATEWAY_IMAGE (unset)
* JINA_GRPC_RECV_BYTES (unset)
* JINA_GRPC_SEND_BYTES (unset)
* JINA_HUB_NO_IMAGE_REBUILD (unset)
* JINA_LOG_CONFIG (unset)
* JINA_LOG_LEVEL (unset)
* JINA_LOG_NO_COLOR (unset)
* JINA_MP_START_METHOD (unset)
* JINA_OPTOUT_TELEMETRY 1
* JINA_RANDOM_PORT_MAX (unset)
* JINA_RANDOM_PORT_MIN (unset)


JoanFM commented 2 years ago

What is the Flow? It definitely seems to be a problem with AnnLiteIndexer. Can you set install_requirements to True?

alexcg1 commented 2 years ago

Flow YAML

It already has install_requirements = True.

JoanFM commented 2 years ago

Can you find out which Executor fails? Try loading the Flow with a single Executor at a time to see which one fails.

alexcg1 commented 2 years ago

Seems to be AnnLiteIndexer, at least some of the time

- Run empty Flow: OK
- Run empty Flow again: OK
- Run Flow with just AnnLiteIndexer: FAIL
- Run Flow with just AnnLiteIndexer again: OK
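The isolation steps above (start the Flow with one Executor at a time, and start each configuration more than once, since here the failure only shows on the first attempt) can be scripted. A minimal sketch, assuming a hypothetical start_flow callable that you would write yourself to build and enter a Flow containing only the given Executors; the fake runner below merely mimics the first-run failure reported here and is not Jina code.

```python
from typing import Callable, Dict, List, Sequence

Runner = Callable[[Sequence[str]], None]


def isolate(
    configs: Dict[str, Sequence[str]], start_flow: Runner, attempts: int = 2
) -> Dict[str, List[str]]:
    """Start each Executor configuration `attempts` times and record OK/FAIL per run."""
    results: Dict[str, List[str]] = {}
    for label, executors in configs.items():
        outcomes: List[str] = []
        for _ in range(attempts):
            try:
                start_flow(executors)
                outcomes.append("OK")
            except Exception as exc:  # record the startup error and keep going
                outcomes.append(f"FAIL ({type(exc).__name__})")
        results[label] = outcomes
    return results


def make_fake_runner(flaky: str) -> Runner:
    """Fake runner: the first start that includes `flaky` raises, later starts succeed.

    A real runner would build a Flow containing only `executors` and enter it.
    """
    already_failed = set()

    def start_flow(executors: Sequence[str]) -> None:
        if flaky in executors and flaky not in already_failed:
            already_failed.add(flaky)
            raise FileNotFoundError("stale dist-info METADATA")

    return start_flow


configs = {"empty Flow": [], "Flow with just AnnLiteIndexer": ["AnnLiteIndexer"]}
for label, outcomes in isolate(configs, make_fake_runner("AnnLiteIndexer")).items():
    print(f"{label}: {', '.join(outcomes)}")
# empty Flow: OK, OK
# Flow with just AnnLiteIndexer: FAIL (FileNotFoundError), OK
```

Running every configuration at least twice is the design choice that matters: a single pass would report AnnLiteIndexer as simply broken and hide the first-run-only pattern.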

I won't touch that notebook from now on; I'll leave it for you as a record of this bug.

alexcg1 commented 2 years ago

I didn't go on to test CLIPEncoder as well, because I want to keep this as a minimal failing example.

alexcg1 commented 2 years ago

It definitely seems to be an Executor issue (either Executors in general or AnnLite specifically), not a Flow issue.

numb3r3 commented 2 years ago

@alexcg1 @JoanFM This issue seems to be caused by a failed pip install. These are the related issues:

In addition, to support the new columns syntax, it's better to pin docarray>=0.16.4 and annlite>=0.3.11.
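To confirm that the kernel actually sees the suggested minimums (docarray>=0.16.4, annlite>=0.3.11), a small standard-library check can help. This is a sketch with hypothetical helper names (check_pins, at_least, installed_version). It compares only the dotted numeric part of a version, whereas a complete check would use a PEP 440 parser such as packaging.version. importlib.metadata is in the standard library from Python 3.8, so on the Python 3.7.13 runtime in this report the importlib_metadata backport would be needed instead.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Minimum versions suggested in the comment above.
PINS = {"docarray": "0.16.4", "annlite": "0.3.11"}


def numeric_version(text: str) -> Tuple[int, ...]:
    """Leading dotted-numeric part only, e.g. '0.16.4' -> (0, 16, 4); suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(installed: str, minimum: str) -> bool:
    a, b = numeric_version(installed), numeric_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.16' compares equal to '0.16.0'.
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def installed_version(name: str) -> Optional[str]:
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_pins(
    pins: Dict[str, str],
    get_version: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, str]:
    report: Dict[str, str] = {}
    for name, minimum in pins.items():
        found = get_version(name)
        if found is None:
            report[name] = "not installed"
        elif at_least(found, minimum):
            report[name] = f"ok ({found})"
        else:
            report[name] = f"too old ({found} < {minimum})"
    return report


print(check_pins(PINS))  # reads the versions visible to the current interpreter
```

Passing a dict lookup in place of the real one shows the comparison: with the docarray 0.16.3 listed in the environment above, check_pins(PINS, {"docarray": "0.16.3", "annlite": "0.3.11"}.get) returns {'docarray': 'too old (0.16.3 < 0.16.4)', 'annlite': 'ok (0.3.11)'}.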

JoanFM commented 2 years ago

@alexcg1, then try updating pip. I agree this does not seem to be a Jina issue.

alexcg1 commented 2 years ago

I've just upgraded pip and I'm getting a similar error. This time it's moaning about protobuf, not requests:

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.7/dist-packages/protobuf-3.17.3.dist-info/METADATA'

Notebook link

Should I file a new bug or is it a continuation of this one?

alexcg1 commented 2 years ago

Running pip install jina (after pip install -U pip) gives the following error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.8.2+zzzcolab20220719082949 requires protobuf<3.20,>=3.9.2, but you have protobuf 4.21.6 which is incompatible.
tensorflow-metadata 1.10.0 requires protobuf<4,>=3.13, but you have protobuf 4.21.6 which is incompatible.
google-cloud-bigquery-storage 1.1.2 requires protobuf<4.0.0dev, but you have protobuf 4.21.6 which is incompatible.
google-api-core 1.31.6 requires protobuf<4.0.0dev,>=3.12.0; python_version > "3", but you have protobuf 4.21.6 which is incompatible.
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
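Each conflict line in that resolver output is a range check that protobuf 4.21.6 fails. The sketch below re-evaluates the constraints exactly as quoted against 4.21.6, and against protobuf 3.17.3, the version listed in the environment above. It is a simplified illustration, not pip's resolver: it supports only the comparison operators that appear here, drops environment markers such as python_version > "3", and reads a suffix like "dev" as the plain release, so <4.0.0dev is approximated as <4.0.0.

```python
import operator
import re
from typing import Dict, List, Tuple

# Constraints exactly as quoted in the pip output above (requirer -> protobuf spec).
PROTOBUF_SPECS: Dict[str, str] = {
    "tensorflow 2.8.2": "<3.20,>=3.9.2",
    "tensorflow-metadata 1.10.0": "<4,>=3.13",
    "google-cloud-bigquery-storage 1.1.2": "<4.0.0dev",
    "google-api-core 1.31.6": '<4.0.0dev,>=3.12.0; python_version > "3"',
}

# Two-character operators first so '<=' is not read as '<'.
OPERATORS = [("<=", operator.le), (">=", operator.ge), ("==", operator.eq),
             ("!=", operator.ne), ("<", operator.lt), (">", operator.gt)]


def release(text: str) -> Tuple[int, ...]:
    """Dotted-numeric prefix only: '4.0.0dev' -> (4, 0, 0). An approximation of PEP 440."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"cannot parse version {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(installed: str, spec: str) -> bool:
    """True if `installed` meets every comma-separated clause; markers after ';' are ignored."""
    have = release(installed)
    for clause in spec.split(";")[0].split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS:
            if clause.startswith(symbol):
                bound = release(clause[len(symbol):])
                break
        else:
            raise ValueError(f"unsupported clause {clause!r}")
        width = max(len(have), len(bound))
        if not compare(have + (0,) * (width - len(have)), bound + (0,) * (width - len(bound))):
            return False
    return True


def conflicts(installed: str) -> List[str]:
    """Requirers whose protobuf constraint rejects `installed`."""
    return [who for who, spec in PROTOBUF_SPECS.items() if not satisfies(installed, spec)]


print(conflicts("4.21.6"))  # all four requirers
print(conflicts("3.17.3"))  # []
```

Under these assumptions every requirer rejects 4.21.6 while 3.17.3 satisfies all four, which is consistent with the pip install pulling protobuf 4.x into a runtime whose preinstalled packages expect 3.x.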
alexcg1 commented 2 years ago

Huh, now the error has flipped back to mentioning requests instead of protobuf

alexcg1 commented 2 years ago

Check this notebook

It fails for AnnLiteIndexer specifically. Should I re-open and move this issue to the appropriate repo?

(Except that the package named in the error has changed again, this time to Pillow.)

alexcg1 commented 2 years ago

It fails for AnnLiteIndexer specifically. Should I re-open and move this issue to the appropriate repo?

@numb3r3