jackyuanjie1990 closed this issue 5 years ago.
You can generate the binary co-occurrence matrix (cooccurrence.bin) with GloVe's cooccur tool, then use the C program below to convert it into the required text format.
#define _FILE_OFFSET_BITS 64  /* must precede the includes to enable large-file support */
#include <stdio.h>
#include <stdlib.h>

typedef double real;

/* Record layout written by GloVe's cooccur: two 1-based word ids and a weight. */
typedef struct cooccur_rec {
    int word1;
    int word2;
    real val;
} CREC;

int main(void) {
    CREC cr;
    FILE *fin = fopen("cooccurrence.bin", "rb");
    if (fin == NULL) {
        perror("cooccurrence.bin");
        return EXIT_FAILURE;
    }
    FILE *fout = fopen("cooccurrence.txt", "w");
    if (fout == NULL) {
        perror("cooccurrence.txt");
        fclose(fin);
        return EXIT_FAILURE;
    }
    /* Stop as soon as fread can no longer return one complete record. */
    while (fread(&cr, sizeof(CREC), 1, fin) == 1) {
        /* Skip invalid ids; GloVe vocabulary ids start at 1. */
        if (cr.word1 < 1 || cr.word2 < 1) continue;
        fprintf(fout, "%d\t%d\t%f\n", cr.word1, cr.word2, cr.val);
    }
    fclose(fin);
    fclose(fout);
    return EXIT_SUCCESS;
}
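The converter assumes cooccurrence.bin is nothing but a sequence of whole CREC records (two ints plus a double, as defined above). If its output looks wrong, one quick check is to count the records and confirm the file does not end in a partial one. This is a minimal sketch. It assumes the same CREC layout and a binary written on the same platform, so int and double sizes and struct padding match. The helper name count_crec_records is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

typedef double real;

/* Same layout as the CREC struct in the converter above (assumed to match
   the binary GloVe wrote on this platform). */
typedef struct cooccur_rec {
    int word1;
    int word2;
    real val;
} CREC;

/* Hypothetical helper: returns the number of complete CREC records in the
   file, or -1 if the file cannot be opened, a read error occurs, or the
   file ends in a partial record. */
long count_crec_records(const char *path) {
    assert(path != NULL);
    FILE *fin = fopen(path, "rb");
    if (fin == NULL) return -1;

    CREC cr;
    long count = 0;
    size_t got;
    /* Read byte-wise (size 1, count sizeof cr) so a short tail is visible. */
    while ((got = fread(&cr, 1, sizeof cr, fin)) == sizeof cr) {
        count++;
    }
    int err = ferror(fin);
    fclose(fin);
    /* got == 0 without an error means a clean end of file. */
    if (err || got != 0) return -1;
    return count;
}
```

For example, if count_crec_records("cooccurrence.bin") returns -1, the file may be truncated or may use a different record layout (for instance, a GloVe build where real is not double). In that case, change the CREC definition in the converter to match. A matching count does not prove the layout is right, but a mismatch is a cheap early warning.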
Cool, thank you very much!
Hi,
I have a problem when I try to use your code on a new corpus. I used the GloVe code to generate a cooccurrence.bin file and wanted to run your code on it directly. However, I found that your code can't read cooccurrence.bin as input. (I'm not familiar with C++, so I may have made some mistakes.) If possible, could you share a tool that generates a co-occurrence matrix in the input format your code expects?
Thanks, Jack
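In the first reply, cooccurrence.bin is only the intermediate file. The converter turns it into cooccurrence.txt, with one word1, word2, val line (tab-separated) per pair, and that reply calls this the required format. To check that a converted file is well-formed before passing it on, the sketch below reads that text format back into memory. It assumes exactly the layout the converter writes. The names load_cooccur_txt and cooc_entry are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical in-memory entry for one line of cooccurrence.txt. */
typedef struct {
    int word1;
    int word2;
    double val;
} cooc_entry;

/* Hypothetical loader for "word1<TAB>word2<TAB>val" lines as written by the
   converter. Returns a malloc'd array (caller frees) and stores the entry
   count in *n_out, or returns NULL on open, parse, or allocation failure. */
cooc_entry *load_cooccur_txt(const char *path, size_t *n_out) {
    assert(path != NULL && n_out != NULL);
    FILE *fin = fopen(path, "r");
    if (fin == NULL) return NULL;

    size_t n = 0, cap = 1024;
    cooc_entry *arr = malloc(cap * sizeof *arr);
    if (arr == NULL) { fclose(fin); return NULL; }

    cooc_entry e;
    int fields;
    /* The spaces in the format skip tabs and newlines as whitespace. */
    while ((fields = fscanf(fin, "%d %d %lf", &e.word1, &e.word2, &e.val)) == 3) {
        if (n == cap) {
            cap *= 2;
            cooc_entry *tmp = realloc(arr, cap * sizeof *arr);
            if (tmp == NULL) { free(arr); fclose(fin); return NULL; }
            arr = tmp;
        }
        arr[n++] = e;
    }
    /* A clean stop is EOF with no read error; 0-2 fields means a bad line. */
    int ok = (fields == EOF) && !ferror(fin);
    fclose(fin);
    if (!ok) { free(arr); return NULL; }
    *n_out = n;
    return arr;
}
```

Usage: size_t n; cooc_entry *pairs = load_cooccur_txt("cooccurrence.txt", &n); and then free(pairs) when done. fscanf needs %lf for a double, while fprintf in the converter already takes a double with %f.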