some questions on PWIM - GitHub Issues

BruceLee66 commented 6 years ago

Running main.py throws "cannot import name 'load_word_vectors'". Does this function not exist?

lanwuwei commented 6 years ago

Hi BruceLee, you need to install torchtext 0.1.1.

BruceLee66 commented 6 years ago

Are you Chinese? Thanks a lot. I am now working on a paraphrase identification problem, and I learned a lot from reading your paper.

lanwuwei commented 6 years ago

Yes :)

caoxu915683474 commented 6 years ago

@BruceLee66 Hi BruceLee! You can also write this function in util.py to load your pre-trained word embeddings.

import array

import torch
from tqdm import tqdm


def load_word_vecs(path):
    """Load GloVe-format word vectors (one "word v1 v2 ..." per line) from a local file."""
    itos, vectors, dim = [], array.array('d'), None
    # Assumes the file is UTF-8, so non-ASCII tokens (e.g. Chinese words) decode correctly.
    with open(path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    for line in tqdm(lines, total=len(lines)):
        # Explicitly splitting on " " is important, so we don't
        # get rid of Unicode non-breaking spaces in the vectors.
        entries = line.rstrip().split(" ")
        word, entries = entries[0], entries[1:]
        if dim is None and len(entries) > 1:
            # The first real vector fixes the expected dimension.
            dim = len(entries)
        elif len(entries) == 1:
            # A single value after the token is most likely a header line; skip it.
            continue
        elif dim != len(entries):
            raise RuntimeError(
                "Vector for token {} has {} dimensions, but previously "
                "read vectors have {} dimensions. All vectors must have "
                "the same number of dimensions.".format(word, len(entries), dim))
        vectors.extend(float(x) for x in entries)
        itos.append(word)
    # Map each token to its row index in the vectors tensor.
    stoi = {word: i for i, word in enumerate(itos)}
    vectors = torch.Tensor(vectors).view(-1, dim)
    return stoi, vectors, dim

I wrote this because I use Python 3.6, and the torchtext 0.1.1 code under Python 3.6 differs from its Python 2.7 style. If you are on 3.6, you can use this function.
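
For anyone who wants to see what the function above returns without installing torch or tqdm, the parsing logic can be sketched in plain Python. This is only an illustration under assumptions, not the torchtext implementation: `parse_glove_lines` is a hypothetical name, vectors come back as a list of lists rather than a tensor, and unlike the function above it also skips blank lines instead of raising.

```python
def parse_glove_lines(lines):
    """Parse GloVe-style text lines into (stoi, vectors, dim).

    Hypothetical helper for illustration only; `vectors` is a list of
    lists of floats instead of a torch tensor.
    """
    stoi, vectors, dim = {}, [], None
    for line in lines:
        # Split on a literal space so Unicode non-breaking spaces
        # inside a token are preserved.
        entries = line.rstrip().split(" ")
        word, values = entries[0], entries[1:]
        if len(values) <= 1:
            # Likely a header such as "<count> <dim>", or a blank line.
            continue
        if dim is None:
            dim = len(values)
        elif dim != len(values):
            raise ValueError(
                "Token {!r} has {} dimensions, expected {}".format(
                    word, len(values), dim))
        stoi[word] = len(vectors)  # row index of this token
        vectors.append([float(x) for x in values])
    return stoi, vectors, dim


# The first line mimics a word2vec-style header and is skipped.
sample = ["2 3", "hello 0.1 0.2 0.3", "world 0.4 0.5 0.6"]
stoi, vectors, dim = parse_glove_lines(sample)
print(stoi, dim)
```

Looking up a word is then `vectors[stoi["world"]]`; with the torch version above the same index selects a row of the returned tensor.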

BruceLee66 commented 6 years ago

@caoxu915683474 Can this method load local word vectors?

caoxu915683474 commented 6 years ago

@BruceLee66 Yes, I wrote it by following the method in torchtext.

BruceLee66 commented 6 years ago

@caoxu915683474 Did you run into this problem when processing two Chinese sentences?

model.py:317: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
pos1=torch.div(indix,simCube.size(2)).data[0]

caoxu915683474 commented 6 years ago

@BruceLee66 I get this warning too.
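
The warning itself suggests the remedy: read the scalar with `.item()` instead of `.data[0]`. A minimal sketch of that idea follows, assuming `indix` holds a single flat index and `simCube.size(2)` is the row width. It has not been checked against the rest of model.py, `flat_to_row_col` is a hypothetical helper, and `FakeScalar` is only a stand-in so the snippet runs without torch.

```python
def flat_to_row_col(flat_index, n_cols):
    """Convert a flat index into (row, col) for a grid with n_cols columns.

    Accepts a plain int or any object exposing .item(), such as a
    0-dim torch tensor in PyTorch 0.4 and later.
    """
    if hasattr(flat_index, "item"):
        # .item() returns a Python number and avoids indexing a 0-dim tensor.
        flat_index = flat_index.item()
    return divmod(int(flat_index), n_cols)


class FakeScalar:
    """Stand-in for a 0-dim tensor so the sketch runs without torch."""

    def __init__(self, value):
        self.value = value

    def item(self):
        return self.value


row, col = flat_to_row_col(FakeScalar(7), 3)
print(row, col)
```

Under the same assumptions, the flagged line would become something like `pos1 = indix.item() // simCube.size(2)`, which does the integer division in Python rather than on a tensor.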

BruceLee66 commented 6 years ago

@caoxu915683474 Have you found the cause? Add me on WeChat: 857243838

NiceMartin commented 5 years ago

Can load_word_vectors load local word vectors?

lanwuwei commented 5 years ago

Yes. Just make sure your word vectors are in the same format as GloVe: each line is a word followed by its vector.
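
To make the format concrete, here is a minimal sketch that writes locally held vectors as one "word v1 v2 ..." line per word. The helper name `write_glove_format`, the sample words, and the output file name are all hypothetical, and UTF-8 output is an assumption.

```python
import os
import tempfile


def write_glove_format(path, word_vectors):
    """Write {word: [floats]} as one 'word v1 v2 ...' line per word."""
    dims = {len(vec) for vec in word_vectors.values()}
    if len(dims) > 1:
        raise ValueError(
            "All vectors must have the same dimension, got {}".format(sorted(dims)))
    with open(path, "w", encoding="utf-8") as f:
        for word, vec in word_vectors.items():
            if " " in word:
                # A space inside the token would be read as a field separator.
                raise ValueError("Token {!r} contains a space".format(word))
            f.write(word + " " + " ".join(repr(float(x)) for x in vec) + "\n")


# Hypothetical local vectors, written to a temporary directory.
local_vectors = {"你好": [0.1, 0.2, 0.3], "世界": [0.4, 0.5, 0.6]}
out_path = os.path.join(tempfile.mkdtemp(), "local_vectors.txt")
write_glove_format(out_path, local_vectors)

with open(out_path, encoding="utf-8") as f:
    written_lines = f.read().splitlines()
print(written_lines)
```

A file written this way has the word in the first space-separated field and the vector components after it, which is the layout the reply above describes.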

NiceMartin commented 5 years ago

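Hi lanwuwei, thanks for your answer. When I use load_word_vectors, the call load_word_vectors(embedding_path, 'glove.840B', EMBEDDING_DIM) automatically downloads the data from nlp.stanford.edu/data/. How should I set it up so that it loads local word vectors instead?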

Many thanks!

lanwuwei / SPM_toolkit

some questions on PWIM #6