suhaibani / JointReps

Learning word representation jointly using a corpus and a knowledge base (KB)
MIT License
Issue about generating co-occurrence matrix #2

Closed jackyuanjie1990 closed 5 years ago

jackyuanjie1990 commented 5 years ago

Hi,

I ran into a problem when trying to use your code on a new corpus. I used the GloVe code to generate a cooccurrence.bin file and wanted to feed it to your code directly. However, I found that your code can't read cooccurrence.bin as input. (I'm not familiar with C++, so I may have made a mistake.) If possible, could you share a tool that generates a co-occurrence matrix in the input format your code expects?

Thanks, Jack

svjan5 commented 5 years ago

You can generate the binary co-occurrence matrix with the GloVe code and then use the C program below to convert it into the required text format: one tab-separated line per record (word1 ID, word2 ID, co-occurrence value).

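/* Must be defined before any system header to enable 64-bit file offsets. */
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <stdlib.h>

typedef double real;

/* One co-occurrence record; assumed to match the layout written by GloVe's cooccur tool. */
typedef struct cooccur_rec {
    int word1;
    int word2;
    real val;
} CREC;

int main(void) {
    CREC cr;
    FILE *fin, *fout;

    fin = fopen("cooccurrence.bin", "rb");
    if (fin == NULL) {
        perror("cooccurrence.bin");
        return EXIT_FAILURE;
    }
    fout = fopen("cooccurrence.txt", "w");
    if (fout == NULL) {
        perror("cooccurrence.txt");
        fclose(fin);
        return EXIT_FAILURE;
    }

    /* Read one record at a time; stop at EOF or on a short (truncated) read. */
    while (fread(&cr, sizeof(CREC), 1, fin) == 1) {
        /* Skip records whose word IDs are below 1. */
        if (cr.word1 < 1 || cr.word2 < 1) continue;
        /* One line per record: word1 <TAB> word2 <TAB> value. */
        fprintf(fout, "%d\t%d\t%f\n", cr.word1, cr.word2, cr.val);
    }

    fclose(fin);
    fclose(fout);
    return EXIT_SUCCESS;
}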
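
If you want to sanity-check the conversion before running it on a full corpus, a small round trip may help: write a few records in the same binary layout, convert them, and inspect the text output. The sketch below is only an illustration. The helper names write_records and convert_records and the sample file names are hypothetical, and the record layout (two ints and a double) is assumed from the struct in the program above.

```c
#include <assert.h>
#include <stdio.h>

/* Record layout assumed from the converter above: two ints and a double. */
typedef struct cooccur_rec {
    int word1;
    int word2;
    double val;
} CREC;

/* Hypothetical helper: write n records to path in binary form.
   Returns 0 on success, -1 on any I/O error. */
int write_records(const char *path, const CREC *recs, size_t n) {
    assert(path != NULL && recs != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    size_t written = fwrite(recs, sizeof(CREC), n, f);
    if (fclose(f) != 0 || written != n) return -1;
    return 0;
}

/* Hypothetical helper: convert binary records to tab-separated text lines,
   skipping records whose word IDs are below 1 (same filter as above).
   Returns the number of lines written, or -1 on any I/O error. */
long convert_records(const char *in_path, const char *out_path) {
    assert(in_path != NULL && out_path != NULL);
    FILE *fin = fopen(in_path, "rb");
    if (fin == NULL) return -1;
    FILE *fout = fopen(out_path, "w");
    if (fout == NULL) {
        fclose(fin);
        return -1;
    }

    CREC cr;
    long kept = 0;
    while (fread(&cr, sizeof(CREC), 1, fin) == 1) {
        if (cr.word1 < 1 || cr.word2 < 1) continue;
        fprintf(fout, "%d\t%d\t%f\n", cr.word1, cr.word2, cr.val);
        kept++;
    }

    /* Distinguish a clean end of file from a read error. */
    int failed = ferror(fin);
    fclose(fin);
    if (fclose(fout) != 0) failed = 1;
    return failed ? -1 : kept;
}
```

For example, writing the three records {1, 2, 0.5}, {0, 3, 1.0} and {4, 1, 2.0} with write_records and then calling convert_records on that file should keep two lines, because the record with word ID 0 is filtered out. Returning the count of kept records makes it easy to compare against what you expect from your own data.
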
jackyuanjie1990 commented 5 years ago

Cool, thank you very much!