stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0
6.86k stars, 1.51k forks

Multithreading issue #153

Closed: akjindal53244 closed this issue 5 years ago

akjindal53244 commented 5 years ago

Hi, I can successfully train GloVe vectors on a RHEL desktop, but when I run the same training on a RHEL-based Hadoop cluster, it fails in glove.c, in this loop (with debug prints added):

    for (b = 0; b < num_iter; b++) {
        fprintf(stderr,"Entry Point\n");
        total_cost = 0;
        for (a = 0; a < num_threads - 1; a++) lines_per_thread[a] = num_lines / num_threads;
        fprintf(stderr,"First\n");
        lines_per_thread[a] = num_lines / num_threads + num_lines % num_threads;
        fprintf(stderr,"Second\n");
        for (a = 0; a < num_threads; a++) pthread_create(&pt[a], NULL, glove_thread, (void *)a);
        fprintf(stderr,"Third\n"); // Nothing executes after this
        for (a = 0; a < num_threads; a++) pthread_join(pt[a], NULL);
        fprintf(stderr,"Fourth\n");
        for (a = 0; a < num_threads; a++) total_cost += cost[a];
        fprintf(stderr,"Fifth\n");

        fprintf(stderr,"iter: %03d, cost: %lf\n", b+1, total_cost/num_lines);
    }

Training runs until "Third" is printed (just before the pthread_join calls), and nothing is printed after that.

I have also tried setting num_threads = 1, but that doesn't help. The exact same code works on the desktop. Can you help?

I am calling the various commands from demo.sh from Python using the os.system() method.

akjindal53244 commented 5 years ago

There was a configuration issue; I was able to solve it.