lanwuwei / SPM_toolkit

Neural network toolkit for sentence pair modeling.
302 stars, 70 forks

some questions on PWIM #6

Closed BruceLee66 closed 5 years ago

BruceLee66 commented 6 years ago

Running main.py throws "cannot import name 'load_word_vectors'". Does this function not exist?

lanwuwei commented 6 years ago

Hi BruceLee, you need to install torchtext 0.1.1.

BruceLee66 commented 6 years ago

Are you Chinese? Thanks a lot. I am now working on a paraphrase identification problem, and I learned a lot from reading your paper.

lanwuwei commented 6 years ago

Yes :)

caoxu915683474 commented 6 years ago

@BruceLee66 Hi BruceLee! You can also write this function in util.py to load your pre-trained word embeddings.

import array

import torch
from tqdm import tqdm


def load_word_vecs(path):
    """Load GloVe-style text vectors: one "word v1 v2 ... vN" entry per line."""
    itos, vectors, dim = [], array.array('d'), None
    with open(path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    for line in tqdm(lines, total=len(lines)):
        # Explicitly splitting on " " is important, so we don't
        # get rid of Unicode non-breaking spaces in the vectors.
        entries = line.rstrip().split(" ")
        word, entries = entries[0], entries[1:]
        if len(entries) <= 1:
            # Skip blank lines and a word2vec-style "count dim" header line.
            continue
        if dim is None:
            dim = len(entries)
        elif dim != len(entries):
            raise RuntimeError(
                "Vector for token {} has {} dimensions, but previously "
                "read vectors have {} dimensions. All vectors must have "
                "the same number of dimensions.".format(word, len(entries), dim))
        vectors.extend(float(x) for x in entries)
        itos.append(word)
    # Map each word to its row index in the vector matrix.
    stoi = {word: i for i, word in enumerate(itos)}
    vectors = torch.Tensor(vectors).view(-1, dim)
    return stoi, vectors, dim

I wrote it because I use Python 3.6, and the torchtext 0.1.1 code is written in a Python 2.7 style that differs from Python 3.6. If you are on 3.6, you can use this function.

BruceLee66 commented 6 years ago

@caoxu915683474 Can this method load local word vectors?

caoxu915683474 commented 6 years ago

@BruceLee66 Yes, I wrote it following the method in torchtext.

BruceLee66 commented 6 years ago

@caoxu915683474 Did you run into this problem when processing two Chinese sentences?

model.py:317: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  pos1=torch.div(indix,simCube.size(2)).data[0]

caoxu915683474 commented 6 years ago

@BruceLee66 I get this warning too.
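
For reference, the warning itself names the remedy: call `.item()` on the 0-dim tensor instead of indexing it with `.data[0]`. Below is a minimal sketch of that change, assuming the flat index comes from an argmax over a flattened similarity map. The tensor `sim_cube`, its shape, and the way `index` is obtained are made up for illustration and are not taken from model.py.

```python
import torch

# Hypothetical similarity map with shape (batch, len1, len2); not the real simCube.
sim_cube = torch.rand(1, 4, 5)

# argmax over the flattened tensor returns a 0-dim tensor holding a flat index.
index = torch.argmax(sim_cube.view(-1))

# Convert the 0-dim tensor to a Python int with .item() rather than .data[0],
# then recover the row and column using plain integer arithmetic.
n_cols = sim_cube.size(2)
flat = index.item()
pos1 = flat // n_cols  # row in the (len1, len2) map
pos2 = flat % n_cols   # column in the (len1, len2) map
```

Doing the division on Python integers also avoids depending on how `torch.div` rounds integer tensors, which has differed between PyTorch versions.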

BruceLee66 commented 6 years ago

@caoxu915683474 Did you find the cause of the error? Add me on WeChat: 857243838

NiceMartin commented 5 years ago

Can load_word_vectors load local word vectors?

lanwuwei commented 5 years ago

Yes. Just make sure your word vectors are in the same format as GloVe: each line is word + vector.
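
As a rough illustration of that layout, the sketch below writes a tiny made-up file with one word followed by its space-separated vector on each line, then checks that every line has the same number of dimensions. The helper name `check_glove_format` and the sample words and values are hypothetical; they are not part of SPM_toolkit or torchtext.

```python
import os
import tempfile


def check_glove_format(path):
    """Return (num_words, dim) if every line is "word v1 ... vN" with the same N.

    Hypothetical helper for illustration; not part of SPM_toolkit or torchtext.
    """
    count, dim = 0, None
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if not values:
                raise ValueError("line {} has no vector".format(line_no))
            for x in values:
                float(x)  # raises ValueError on a non-numeric entry
            if dim is None:
                dim = len(values)
            elif len(values) != dim:
                raise ValueError(
                    "line {} ({!r}) has {} dimensions, expected {}".format(
                        line_no, word, len(values), dim))
            count += 1
    return count, dim


# A tiny made-up file in the expected layout: one word, then its vector.
sample = "你好 0.1 0.2 0.3\n世界 0.4 0.5 0.6\n"
with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)

result = check_glove_format(tmp.name)  # (number of words, vector dimension)
os.remove(tmp.name)
```

Running a check like this on a local embedding file before training can surface a stray header line or a truncated row early.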

NiceMartin commented 5 years ago

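Hi lanwuwei, thanks for your answer. When I use load_word_vectors, the call load_word_vectors(embedding_path, 'glove.840B', EMBEDDING_DIM) automatically downloads data from nlp.stanford.edu/data/. How should I set it up so that it loads local word vectors instead?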

Many thanks!
