Preprocessing problem

mellahysf commented 4 years ago

Hi,

When I run the preprocessing script (python run.py preprocess experiments/spider-glove-run.jsonnet), it fails with the following error:

WARNING <class 'ratsql.models.enc_dec.EncDecModel.Preproc'>: superfluous {'name': 'EncDec'}
DB connections: 100%|██████████| 166/166 [00:25<00:00, 6.48it/s]
train section: 100%|██████████| 8659/8659 [1:17:47<00:00, 1.86it/s]
DB connections: 100%|██████████| 166/166 [09:33<00:00, 3.45s/it]
val section: 20%|██        | 211/1034 [3:10:35<16:09:18, 70.67s/it]
val section: 63%|██████▎   | 653/1034 [7:03:05<5:04:30, 47.95s/it]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    httplib_response = conn.getresponse()
  File "/opt/conda/lib/python3.7/http/client.py", line 1344, in getresponse
    response.begin()
  File "/opt/conda/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/opt/conda/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/opt/conda/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/opt/conda/lib/python3.7/site-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 423, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 331, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=9000): Read timed out. (read timeout=30.0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 109, in <module>
    main()
  File "run.py", line 73, in main
    preprocess.main(preprocess_config)
  File "/app/ratsql/commands/preprocess.py", line 53, in main
    preprocessor.preprocess()
  File "/app/ratsql/commands/preprocess.py", line 34, in preprocess
    self.model_preproc.add_item(item, section, validation_info)
  File "/app/ratsql/models/enc_dec.py", line 43, in add_item
    self.enc_preproc.add_item(item, section, enc_info)
  File "/app/ratsql/models/spider/spider_enc.py", line 168, in add_item
    preprocessed = self.preprocess_item(item, validation_info)
  File "/app/ratsql/models/spider/spider_enc.py", line 193, in preprocess_item
    question, question_for_copying = self._tokenize_for_copying(item.text, item.orig['question'])
  File "/app/ratsql/models/spider/spider_enc.py", line 239, in _tokenize_for_copying
    return self.word_emb.tokenize_for_copying(unsplit)
  File "/app/ratsql/resources/pretrained_embeddings.py", line 67, in tokenize_for_copying
    ann = corenlp.annotate(text, self.corenlp_annotators)
  File "/app/ratsql/resources/corenlp.py", line 46, in annotate
    return _singleton.annotate(text, annotators, output_format, properties)
  File "/app/ratsql/resources/corenlp.py", line 28, in annotate
    result = self.client.annotate(text, annotators, output_format, properties)
  File "/root/.local/lib/python3.7/site-packages/corenlp/client.py", line 225, in annotate
    r = self._request(text.encode('utf-8'), properties)
  File "/root/.local/lib/python3.7/site-packages/corenlp/client.py", line 192, in _request
    timeout=(self.timeout*2)/1000)
  File "/opt/conda/lib/python3.7/site-packages/requests/api.py", line 116, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/conda/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=9000): Read timed out. (read timeout=30.0)
val section: 63%|██████▎   | 653/1034 [7:10:21<4:11:05, 39.54s/it]
Exception ignored in: <function CoreNLP.__del__ at 0x7f2f490c8320>
Traceback (most recent call last):
  File "/app/ratsql/resources/corenlp.py", line 24, in __del__
  File "/root/.local/lib/python3.7/site-packages/corenlp/client.py", line 83, in stop
  File "/opt/conda/lib/python3.7/subprocess.py", line 1790, in kill
AttributeError: 'NoneType' object has no attribute 'SIGKILL'

mellahysf commented 4 years ago

It was a memory error. I increased the memory available to Docker (6GB) and it works now.
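
For context on the 30-second limit in the traceback above: the frame in corenlp/client.py passes `timeout=(self.timeout*2)/1000` to requests, and the error reports `read timeout=30.0`, which implies the client's timeout was 15000 ms. The sketch below only reproduces that arithmetic; the function names are hypothetical and not part of the corenlp client or rat-sql.

```python
def read_timeout_seconds(client_timeout_ms):
    """Mirror the expression from the traceback: (timeout * 2) / 1000."""
    return (client_timeout_ms * 2) / 1000


def client_timeout_ms_for(read_timeout_s):
    """Invert the expression to recover the client timeout in milliseconds."""
    return read_timeout_s * 1000 / 2


if __name__ == "__main__":
    # A 15000 ms client timeout yields the 30.0 s read timeout seen in the error.
    print(read_timeout_seconds(15000))
    print(client_timeout_ms_for(30.0))
```

So any single annotation request that takes the CoreNLP server longer than 30 seconds to answer aborts the whole run, which is consistent with a memory-starved server, although the thread does not confirm the exact mechanism.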

Akshaysharma29 commented 4 years ago

Hi @mellahysf, I have increased the memory to 8GB but it still doesn't work. Does it require CUDA as well? Any suggestions?

mellahysf commented 3 years ago

Hi @Akshaysharma29

No, it does not require CUDA. I run it on CPU only, with 8GB of memory, and it works for me.

vishwa702 commented 1 year ago

I ran into the same problem, even with about 8GB of memory.
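
Since the run dies on a single timed-out annotation request hours into preprocessing, one possible mitigation (not something this thread confirms) is to retry the call a few times before giving up. A minimal sketch follows; `retry_on_timeout` is a hypothetical helper, not part of rat-sql, and the built-in `TimeoutError` stands in for `requests.exceptions.ReadTimeout`, which you would pass via `exceptions` in a real setup.

```python
import time


def retry_on_timeout(func, attempts=3, delay_s=1.0,
                     exceptions=(TimeoutError,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and try again.

    Hypothetical helper for illustration. The wait grows linearly
    (delay_s, 2 * delay_s, ...) and the last error is re-raised once
    all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as err:
            last_error = err
            if attempt < attempts:
                sleep(delay_s * attempt)
    raise last_error


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Simulated annotate call: times out twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TimeoutError("simulated read timeout")
        return "annotated"

    print(retry_on_timeout(flaky, delay_s=0.01))
```

In the code path shown in the traceback, the wrapped call would be something like `lambda: corenlp.annotate(text, annotators)`. Retrying only helps if the server recovers between attempts; if it is out of memory, raising the container's memory limit is still the fix reported above.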

microsoft / rat-sql

Preprocessing problem #21