jpzhangvincent opened this issue 2 years ago
> Another side question: if I want to use Docker, how do I run the indexing step before "serving", which seems to be triggered automatically when we run `docker compose`?
When I wrote this I intended the user to index first on bare metal, then run the serving via docker-compose. Otherwise every time you `docker-compose up` it would go through all the indexing again. By doing it on bare metal it becomes a one-time thing.
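For concreteness, a minimal sketch of that two-step workflow (the `-t index` / `-t serve` flags come up later in this thread; the working directory is an assumption):

```shell
# one-time, on bare metal: build the index
python app.py -t index

# afterwards: bring up the serving stack, which reuses the existing index
docker-compose up
```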
Is there any way you can tell which Executor is causing the error? Perhaps commenting them out of the Flow one by one until the error disappears/appears?
There might be some stuff going on with the Hubble backend which is outside the scope of this issue (or the code might need to be adapted accordingly)
Would it help if I re-built the fashion search example in a notebook? We're planning on pushing out more Jina notebooks anyway, so I'd like to do one that helps out
> Is there any way you can tell which Executor is causing the error? Perhaps commenting them out of the Flow one by one until the error disappears/appears?
>
> There might be some stuff going on with the Hubble backend which is outside the scope of this issue (or the code might need to be adapted accordingly)
Oh yeah, I found I don't have "annlite" installed, but it seems to be a pain to install the package on a Mac M1 notebook... - https://github.com/jina-ai/annlite/issues/116

If I change `jinahub://PQLiteIndexer` to `jinahub+docker://PQLiteIndexer`, would it work?
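For reference, the proposed change would be a one-line edit to the Executor's `uses` entry in `flow.yml`; a sketch (the executor name and surrounding keys are assumptions):

```yaml
executors:
  - name: indexer
    # was: uses: jinahub://PQLiteIndexer  (runs in the local Python process)
    uses: jinahub+docker://PQLiteIndexer  # runs the Executor in its own container instead
```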
> Another side question: if I want to use Docker, how do I run the indexing step before "serving", which seems to be triggered automatically when we run `docker compose`?
>
> When I wrote this I intended the user to index first on bare metal, then run the serving via docker-compose. Otherwise every time you `docker-compose up` it would go through all the indexing again. By doing it on bare metal it becomes a one-time thing.
Can you elaborate on why it only needs to be indexed once on bare metal first if the serving runs via docker-compose later?
> Would it help if I re-built the fashion search example in a notebook? We're planning on pushing out more Jina notebooks anyway, so I'd like to do one that helps out
I think there's already a notebook example and a YouTube video about this example. But I'm more curious to learn about the server/client design pattern and best practices, to understand the deployment and operations side of things. I would appreciate it if you could validate this example again to see whether you can reproduce it with the latest version of jina.
> Can you elaborate on why it only needs to be indexed once on bare metal first if the serving runs via docker-compose later?

This is because a Dockerfile can only have one command as an entrypoint, and the Dockerfile in `backend/` has to call either `python app.py -t index` OR `python app.py -t serve`. I had to choose one. `docker-compose up` attaches the pre-existing index directory (indexed on bare metal) as a volume for the backend container. Admittedly it's not the most elegant solution, so I'll see what I can do.
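A hedged sketch of what that arrangement might look like in `docker-compose.yml` (the service name and paths are assumptions; only the volume-mount idea comes from the comment above):

```yaml
services:
  backend:
    build: ./backend            # this image's single CMD runs `python app.py -t serve`
    volumes:
      - ./workspace:/workspace  # index built on bare metal, reused by the serving container
```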
I just pushed a fix for the failed requirements installation. I'm testing it on my end now
I'm seeing other errors now. Time for a game of bug whack-a-mole. I'll keep you posted on progress
I was able to use the "SimpleIndexer" to get the indexing step running for my project, but I sometimes got errors during the indexing step. Is there a way to get how many docs have been indexed and persisted in the db, so that I don't need to re-run the "indexing" for all the docs again?

Is `client.update()` supposed to support this case, if we don't want to re-index everything? What's the logic behind it?
> I'm seeing other errors now. Time for a game of bug whack-a-mole. I'll keep you posted on progress
Those other problems seem to have been fixed now. Running `python app.py -t serve` doesn't crash anymore.
> I was able to use the "SimpleIndexer" to get the indexing step running for my project, but I sometimes got errors during the indexing step. Is there a way to get how many docs have been indexed and persisted in the db, so that I don't need to re-run the "indexing" for all the docs again?
What errors did you get?
Is the "client.update()" supposed to support this case if we don't want re-indexing everything? What's the logic behind it? This is useful for if you're hosting your Flow on JCloud and sending index/update requests via the gateway. The code for indexing locally doesn't use this, it just calls the Flow directly.
I realize I really need to give this an overhaul and maybe a refactor. Let me know about any other issues you're having and I'll take them into account.
@alexcg1 Just curious, did you get a chance to validate whether the docker-compose approach works on your local machine?

It still didn't work on my end with Docker. Also, a small fix in `frontend/Dockerfile` is needed - I added `RUN apt-get update && apt-get install gcc -y` so that it can install and build the streamlit library in Docker.
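For clarity, a sketch of where that fix might sit in `frontend/Dockerfile` (the base image and surrounding lines are assumptions; only the `RUN apt-get ...` line comes from the comment above):

```dockerfile
FROM python:3.9-slim

# gcc is needed to compile the native wheels pulled in by streamlit's dependencies
RUN apt-get update && apt-get install gcc -y

COPY requirements.txt .
RUN pip install -r requirements.txt
```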
> I was able to use the "SimpleIndexer" to get the indexing step running for my project, but I sometimes got errors during the indexing step. Is there a way to get how many docs have been indexed and persisted in the db, so that I don't need to re-run the "indexing" for all the docs again?
>
> Is `client.update()` supposed to support this case, if we don't want to re-index everything? What's the logic behind it?
If you're using SimpleIndexer, it'll just store the index in a SQLite database that you can browse with this tool.
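As a rough way to count indexed Documents without a separate browser tool, one could open that storage directly with DocArray; a sketch, with the database file and table name guessed (both depend on the indexer's workspace config):

```python
from docarray import DocumentArray

# hypothetical paths: check the indexer's workspace for the actual .db file
da = DocumentArray(
    storage='sqlite',
    config={'connection': 'workspace/index.db', 'table_name': 'simple_indexer'},
)
print(f'{len(da)} Documents indexed so far')
```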
I've now got indexing working under docker-compose, and I'll push the changes later today (my network is having issues connecting to github right now)
> Is there a way to get how many docs have been indexed and persisted in the db, so that I don't need to re-run the "indexing" for all the docs again?
I wrote a simple tool called `jfc` that might help with that. You should be able to use it to query your Flow's status endpoint, which will report back the number of indexed Documents.
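Independently of `jfc`, a sketch of querying such an endpoint with the plain Jina Client, assuming the Flow defines a custom `/status` endpoint that reports the count in a Document's tags (both of those are assumptions):

```python
from jina import Client

client = Client(host='grpc://localhost:54321')  # hypothetical gateway address
docs = client.post('/status')                   # assumed custom endpoint on the Flow
if docs:
    print(docs[0].tags)  # e.g. {'count': 12345}, depending on how the endpoint fills it
```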
I just pushed the changes. I'm facing another bug now: getting the frontend to display the images properly. But querying via cURL works fine.

To recap: you can now both index and search via docker-compose.
Just to let you know, I still got an error during the CLIP encoding step - "ERROR: No matching distribution found for torch==1.10.2+cpu". I think it's related to this issue - https://github.com/jina-ai/executor-clip-encoder/issues/8

To improve reproducibility, would it be better to specify in the `flow.yml` to use Docker, like `jinahub+docker://CLIPEncoder`? But when I specified `jinahub+docker://CLIPEncoder`, I got the error "Please run Docker daemon and try again". Any ideas?
Can you try running `docker run hello-world`? I've sometimes faced similar issues.

And I'm not sure how feasible it would be to run `jinahub+docker://...` stuff from docker-compose. I haven't tried it.

You're still running via docker-compose, right? Not spinning up Docker images separately, or running on bare metal?
Not that it fixes the problem we're discussing, but I pushed a change that gets the frontend working consistently in docker-compose
> You're still running via docker-compose, right? Not spinning up Docker images separately, or running on bare metal?
Yes, still using "docker compose" - I realized that you can't run Docker inside Docker, so "jinahub+docker" won't work. I did a little hack: I copied the "requirements.txt" and the "CLIPEncoder" class from https://github.com/jina-ai/executor-clip-encoder to work around the package installation issue (see the sketch after the traceback below). I think basically the "torch" version needs to be updated to "1.12", as in the PR fix for the GPU version.

But again I got the frontend error; I will try again with your fix. Thanks a lot for the help.
File "/usr/local/lib/python3.9/site-packages/streamlit/scriptrunner/script_runner.py", line 475, in _run_script
exec(code, module.__dict__)
File "/workspace/frontend.py", line 92, in <module>
matches = get_matches(
File "/workspace/helper.py", line 7, in get_matches
client = Client(host=server)
File "/usr/local/lib/python3.9/site-packages/jina/clients/__init__.py", line 74, in Client
args = parse_client(kwargs)
File "/usr/local/lib/python3.9/site-packages/jina/helper.py", line 1554, in parse_client
return ArgNamespace.kwargs2namespace(
File "/usr/local/lib/python3.9/site-packages/jina/helper.py", line 839, in kwargs2namespace
p_args, unknown_args = parser.parse_known_args(args)
File "/usr/local/lib/python3.9/argparse.py", line 1861, in parse_known_args
self.error(str(err))
File "/usr/local/lib/python3.9/argparse.py", line 2582, in error
self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
File "/usr/local/lib/python3.9/argparse.py", line 2569, in exit
_sys.exit(status)
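(Back to the CLIPEncoder workaround described above the traceback: a sketch of pointing the Flow at the locally copied Executor class instead of pulling from Hub; the file layout and names are assumptions.)

```yaml
executors:
  - name: encoder
    uses:
      jtype: CLIPEncoder              # class copied from jina-ai/executor-clip-encoder
      py_modules:
        - executors/clip_encoder.py   # local copy; install its requirements (torch==1.12) first
```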
I'm new to the jina ecosystem and would really like to reproduce this example. I tried both running locally and using Docker, and both approaches gave me the same error:

```
  File "...lib/python3.9/site-packages/jina/orchestrate/deployments/__init__.py", line 556, in start
    self.enter_context(self.shards[shard_id])
  File "...lib/python3.9/contextlib.py", line 448, in enter_context
    result = _cm_type.__enter__(cm)
  File ".../lib/python3.9/site-packages/jina/orchestrate/deployments/__init__.py", line 224, in __enter__
    self._pods.append(PodFactory.build_pod(_args).start())
  File ".../lib/python3.9/site-packages/jina/orchestrate/pods/factory.py", line 35, in build_pod
    cargs.uses = HubIO(_hub_args).pull()
  File ".../lib/python3.9/site-packages/jina/hubble/hubio.py", line 838, in pull
    install_package_dependencies(
  File ".../lib/python3.9/site-packages/jina/hubble/hubapi.py", line 182, in install_package_dependencies
    raise ModuleNotFoundError(
ModuleNotFoundError: Dependencies listed in requirements.txt are not all installed locally, this Executor may not run as expect. To install dependencies, add `--install-requirements` or set `install_requirements = True`
```

How do I go about debugging?
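Following the error message's suggestion, the flag can be set where the Hub Executor is added; a sketch assuming the Flow is built in Python (the Executor name is illustrative):

```python
from jina import Flow

# install the Hub Executor's requirements.txt automatically when it is pulled
f = Flow().add(uses='jinahub://CLIPEncoder', install_requirements=True)
```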
Another side question: if I want to use Docker, how do I run the indexing step before "serving", which seems to be triggered automatically when we run `docker compose`?