Closed saravananpsg closed 5 years ago
I encountered some problems running nlp-embeddings-document-doc2vec.ipynb
- SSLCertVerificationError
  Traceback (most recent call last)
  /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
     1316 h.request(req.get_method(), req.selector, req.data, headers,
  -> 1317 encode_chunked=req.has_header('Transfer-encoding'))
     1318 except OSError as err: # timeout error
- NameError
  Traceback (most recent call last) in
        1 doc2vec_embs = Doc2VecEmbeddings()
  ----> 2 x_train_tokens = doc2vec_embs.build_vocab(documents=x_train)
        3 doc2vec_embs.train(x_train_tokens)
  NameError: name 'x_train' is not defined
In my notebook, x_train is defined as:
x_train, x_val, y_train, y_val = train_test_split(np.array(train_raw_df.data), train_raw_df.target, test_size=0.1)
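The traceback path above (/Library/Frameworks/Python.framework/...) suggests a python.org build of Python 3.7 on macOS. Those builds ship their own OpenSSL and usually need their CA certificates installed once before urllib can verify HTTPS connections. The following is a small diagnostic sketch, using only the standard library, that reports where the running interpreter looks for CA certificates. The helper name `describe_ca_paths` is hypothetical and not part of the notebook.

```python
import os
import ssl


def describe_ca_paths() -> dict:
    """Report where this interpreter's OpenSSL looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        # cafile/capath are None when the resolved location does not exist
        "cafile": paths.cafile,
        "cafile_exists": paths.cafile is not None and os.path.isfile(paths.cafile),
        "capath": paths.capath,
        "capath_exists": paths.capath is not None and os.path.isdir(paths.capath),
        # the compiled-in default location, whether or not it exists
        "openssl_cafile": paths.openssl_cafile,
    }


info = describe_ca_paths()
for key, value in info.items():
    print(f"{key}: {value}")
```

If both `cafile_exists` and `capath_exists` come back False, that is consistent with the certificate bundle never having been installed. On python.org macOS builds the usual remedy is to run the `Install Certificates.command` script in the Python 3.7 application folder, though other causes (such as a proxy) are possible.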
@makcedward I am trying to retrieve similar documents from the given document. Here is the code snippet:
x_train_t = doc2vec_embs.encode(documents=x_train)
x_test_t = doc2vec_embs.encode(documents=x_test)

def similiar_docs(doc2vec_embs, test_sample):
    sims = doc2vec_embs.model.docvecs.most_similar([test_sample], topn=1)
    for s in sims:
        print(x_train[s[0]])

test_sample = x_test_t[0]
print(x_test[0])
similiar_docs(doc2vec_embs, test_sample)
However, the retrieved docs aren't similar. Am I missing something here?
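One way to sanity-check the retrieval independently of the model's own index is to compute cosine similarity directly between the encoded test vector and the encoded training vectors, so that every returned index is guaranteed to be a position in x_train. A minimal sketch, assuming the encoded documents are plain sequences of floats. The function names and the toy vectors are hypothetical and only illustrate the idea.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_indices(
    query: Sequence[float],
    doc_vectors: Sequence[Sequence[float]],
    topn: int = 1,
) -> List[Tuple[int, float]]:
    """Return (position, similarity) pairs for the topn closest document vectors."""
    scored = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(doc_vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy stand-ins for x_train_t and x_test_t[0] (hypothetical values).
train_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_vector = [0.9, 0.1]

top = most_similar_indices(query_vector, train_vectors, topn=2)
for position, score in top:
    print(position, round(score, 3))
```

With the names from the snippet above, this would be called as `most_similar_indices(x_test_t[0], x_train_t, topn=1)`, and the returned position used to index x_train. If this gives sensible neighbours while `docvecs.most_similar` does not, one possible cause is that the tags returned by the model do not correspond to positions in x_train.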
The score depends on the training data and the features. Many people say that deep learning requires no feature engineering. That is true to some extent, but you still need to tell the neural network how to extract features. For example, you may add part-of-speech tags, character-level features, etc.