src-d / vecino

Vecino is a command line application to discover Git repositories which are similar to the one that the user provides.

vecino docker run seems to fail before finishing #8

Open billmetangmo opened 6 years ago

billmetangmo commented 6 years ago

Expected behavior

I expect the output of running the vecino Docker image to find repositories similar to Levis0045/MetaLex to be:

docker run -it --rm srcd/vecino https://github.com/Levis0045/MetaLex
                                    github-repo1    x.XX
                                    github-repo2    x.XX
                                    github-repo3    x.XX

Actual behavior

It seems to work fine at the beginning:

INFO:bblfsh:Detected bblfsh server: 172.17.0.1:9432
INFO:enry:Fetching https://api.github.com/repos/src-d/enry/releases/latest
INFO:enry:Latest release resolved to enry_v1.6.3_linux_amd64.tar.gz
INFO:enry:Fetching https://github.com/src-d/enry/releases/download/v1.6.3/enry_v1.6.3_linux_amd64.tar.gz
INFO:enry:Extracting the binary
INFO:enry:Downloaded /enry
INFO:gcs-backend:Fetching https://storage.googleapis.com/models.cdn.sourced.tech/index.json?ignoreCache=1...
INFO:gcs-backend:Fetching https://storage.googleapis.com/models.cdn.sourced.tech/models%2Fid2vec%2F92609e70-f79c-46b5-8419-55726e873cfc.asdf...
[################################] 17044/17044 - 00:12:06
INFO:id2vec:Reading /root/.source{d}/id2vec/default.asdf...
INFO:id2vec:Building the token index...
INFO:similar_repos:Loaded id2vec model: {'created_at': datetime.datetime(2017, 6, 18, 17, 37, 6, 255615),
 'dependencies': [],
 'model': 'id2vec',
 'uuid': '92609e70-f79c-46b5-8419-55726e873cfc',
 'version': [1, 0, 0]}
Shape: (999424, 300)
First 10 words: ['get', 'name', 'type', 'string', 'class', 'set', 'data', 'value', 'self', 'test']
INFO:gcs-backend:Fetching https://storage.googleapis.com/models.cdn.sourced.tech/index.json?ignoreCache=1...
INFO:gcs-backend:Fetching https://storage.googleapis.com/models.cdn.sourced.tech/models%2Fdocfreq%2Ff64bacd4-67fb-4c64-8382-399a8e7db52a.asdf...
[################################] 372/372 - 00:00:17
INFO:docfreq:Reading /root/.source{d}/docfreq/default.asdf...
INFO:docfreq:Building the docfreq dictionary...
INFO:docfreq:Pruning to min 20 occurrences
INFO:similar_repos:Loaded document frequencies: {'created_at': datetime.datetime(2017, 6, 19, 9, 59, 14, 766638),
 'dependencies': [],
 'model': 'docfreq',
 'uuid': 'f64bacd4-67fb-4c64-8382-399a8e7db52a',
 'version': [1, 0, 0]}
Number of words: 416370
First 10 words: ['aaa', 'aaaa', 'aaaaa', 'aaaaaa', 'aaaaaaa', 'aaaaaaaa', 'aaaaaaaaa', 'aaaaaaaaaa', 'aaaaaaaaaaa', 'aaaaaaaaaaaa']
Number of documents: 112273
INFO:gcs-backend:Fetching https://storage.googleapis.com/models.cdn.sourced.tech/index.json?ignoreCache=1...
INFO:gcs-backend:Fetching https://storage.googleapis.com/models.cdn.sourced.tech/models%2Fnbow%2F1e3da42a-28b6-4b33-94a2-a5671f4102f4.asdf...
[################################] 5672/5672 - 00:05:20
INFO:nbow:Reading /root/.source{d}/nbow/default.asdf...
INFO:nbow:Building the repository names mapping...
INFO:similar_repos:Loaded nBOW model: {'created_at': datetime.datetime(2017, 6, 19, 9, 16, 8, 942880),
 'dependencies': [{'created_at': datetime.datetime(2017, 6, 18, 17, 37, 6, 255615),
                   'dependencies': [],
                   'model': 'id2vec',
                   'uuid': '92609e70-f79c-46b5-8419-55726e873cfc',
                   'version': [1, 0, 0]},
                  {'created_at': datetime.datetime(2017, 6, 19, 9, 59, 14, 766638),
                   'dependencies': [],
                   'model': 'docfreq',
                   'uuid': 'f64bacd4-67fb-4c64-8382-399a8e7db52a',
                   'version': [1, 0, 0]}],
 'model': 'nbow',
 'uuid': '1e3da42a-28b6-4b33-94a2-a5671f4102f4',
 'version': [1, 0, 0]}
Shape: (112273, 999424)
First 10 repos: ['ikizir/HohhaDynamicXOR', 'ditesh/node-poplib', 'Code52/MarkPadRT', 'wp-shortcake/shortcake', 'capaj/Moonridge', 'HugoGiraudel/hugogiraudel.github.com', 'crosswalk-project/crosswalk-website', 'apache/parquet-mr', 'dciccale/kimbo.js', 'processone/oneteam']
INFO:bblfsh:Detected bblfsh server: 172.17.0.1:9432
INFO:similar_repos:Creating the WMD engine...
INFO:repo_cloner:Cloning from https://github.com/Levis0045/MetaLex...
INFO:repo_cloner:Finished cloning https://github.com/Levis0045/MetaLex
INFO:repo_cloner:Classifying the files...
INFO:repo_cloner:Result: {'HTML': 1, 'CSS': 1, 'Shell': 1, 'Python': 20, 'Text': 5}
INFO:repo2nbow:Fetching and processing UASTs...

Then it starts to fail:

ERROR:repo2nbow:bblfsh: RpcError on /tmp/repo2-vb2b74e1/Levis0045&MetaLex_github.com/metalex/api.py: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, Connect Failed)>
WARNING:repo2nbow:/tmp/repo2-vb2b74e1/Levis0045&MetaLex_github.com/metalex/api.py was skipped
ERROR:repo2nbow:bblfsh: RpcError on /tmp/repo2-vb2b74e1/Levis0045&MetaLex_github.com/metalex/logs/__init__.py: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, Connect Failed)>
WARNING:repo2nbow:/tmp/repo2-vb2b74e1/Levis0045&MetaLex_github.com/metalex/logs/__init__.py was skipped
INFO:repo2nbow:https://github.com/Levis0045/MetaLex pending tasks: 19
.........
INFO:repo2nbow:https://github.com/Levis0045/MetaLex pending tasks: 0
Traceback (most recent call last):
  File "/usr/local/bin/vecino", line 11, in <module>
    load_entry_point('vecino==0.1.6a0', 'console_scripts', 'vecino')()
  File "/usr/local/lib/python3.5/dist-packages/vecino/__main__.py", line 76, in main
    max_time=args.max_time, skipped_stop=args.skipped_stop)
  File "/usr/local/lib/python3.5/dist-packages/vecino/similar_repositories.py", line 80, in query
    neighbours = self._query_foreign(url_or_path_or_name, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/vecino/similar_repositories.py", line 108, in _query_foreign
    return self._wmd.nearest_neighbors((words, weights), **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/wmd/__init__.py", line 507, in nearest_neighbors
    "Too little vocabulary for %s: %d" % (index, len(words)))
ValueError: Too little vocabulary for None: 0

Steps to reproduce the behavior

docker build -t srcd/vecino .
docker run -d --privileged -p 9432:9432 --name bblfshd bblfsh/bblfshd
docker exec -it bblfshd bblfshctl driver install --all
docker run -it --rm srcd/vecino https://github.com/Levis0045/MetaLex

Any advice?

vmarkovtsev commented 6 years ago

Did every "pending tasks" message take a few seconds to appear? It looks like it cannot connect to Babelfish. The exception at the end means nothing was extracted. Does this command work?

docker run -it --rm --entrypoint bash srcd/vecino
python3 -m bblfsh -f /usr/local/lib/python3.5/dist-packages/vecino/__main__.py
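
To rule out a plain network problem first, here is a minimal sketch that checks whether the address from the "Detected bblfsh server" log line accepts TCP connections from inside the container. It is not part of vecino or the bblfsh client; the helper name can_connect and the 3 second timeout are assumptions made for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    # Address copied from the "Detected bblfsh server" log line in this issue.
    host, port = "172.17.0.1", 9432
    status = "reachable" if can_connect(host, port) else "NOT reachable"
    print("bblfsh endpoint %s:%d is %s" % (host, port, status))
```

If this reports the endpoint as not reachable, the "Connect Failed" RpcError lines would be explained by the vecino container being unable to reach the bblfshd container on that address, rather than by a parsing problem.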

billmetangmo commented 6 years ago

No, the command doesn't work. I got the same error, with the addition of this line:

Failed to connect to the Docker daemon and ensure that the Babelfish server is running. Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

Is the connection to the Babelfish server made over the network or through a Unix socket? I ask because I thought a "Connection aborted" error would come from the NIC.