tticoin / LSTM-ER

Implementation of End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures in ACL2016.
Apache License 2.0

Problems about compilation #1

Closed magician-david closed 7 years ago

magician-david commented 7 years ago

Hello! I ran into a problem when I ran the command:

cmake .. -DEIGEN3_INCLUDE_DIR=eigen -DCMAKE_CXX_COMPILER=/usr/bin/clang++

The message was:

~/LSTM-ER/build$ cmake .. -DEIGEN3_INCLUDE_DIR=../eigen -DCMAKE_CXX_COMPILER=/usr/bin/clang++
-- Boost version: 1.57.0
-- Found the following Boost libraries:
--   program_options
--   serialization
CMake Error at CMakeLists.txt:23 (find_package):
  By not providing "FindEigen3.cmake" in CMAKE_MODULE_PATH this project has asked CMake to find a package configuration file provided by "Eigen3", but CMake did not find one.

Could not find a package configuration file provided by "Eigen3" with any of the following names:

Eigen3Config.cmake
eigen3-config.cmake

Add the installation prefix of "Eigen3" to CMAKE_PREFIX_PATH or set "Eigen3_DIR" to a directory containing one of the above files. If "Eigen3" provides a separate development package or SDK, be sure it has been installed.

-- Configuring incomplete, errors occurred!
See also "~/LSTM-ER/build/CMakeFiles/CMakeOutput.log".
See also "~/LSTM-ER/build/CMakeFiles/CMakeError.log".

I guess the instructions are missing a step:

tar xzf cnn.tar.gz
tar xzf eigen.tar.gz
cd eigen
mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=eigen -DCMAKE_CXX_COMPILER=/usr/bin/clang++
make
cd ..

Diego999 commented 7 years ago

I got the same problem as you, but compiling within the eigen directory doesn't create everything that is needed, such as build/relation/RelationExtraction. Are you sure about it?

magician-david commented 7 years ago

Sorry, that doesn't fix the whole problem. But we do need to build and install Eigen first; after that we can follow the instructions. Maybe these commands will be helpful:

tar xzf cnn.tar.gz
tar xzf eigen.tar.gz
cd eigen
mkdir build
cd build
cmake .. -DCMAKE_CXX_COMPILER=/usr/bin/clang++
make install
cd ../..
mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=eigen -DCMAKE_CXX_COMPILER=/usr/bin/clang++
make
cd ..

I have fixed the compilation problem, but I don't remember all the details. By the way, I ran into another problem when testing the model. I think we can discuss it.

Diego999 commented 7 years ago

Thanks for your answer. I still had to fix some CMakeLists files, but I was able to compile. However, I have a problem similar to yours: for both training and testing, I get exactly "Segmentation fault". Do you get the same?

$ build/relation/RelationExtraction --train -y yaml/parameter-semeval-2010.yaml
306560 words, 200 dimensions
start loading documents in data/semeval-2010/corpus/train/
Segmentation fault

$ build/relation/RelationExtraction --test -y yaml/parameter-semeval-2010.yaml
306560 words, 200 dimensions
Segmentation fault

I have investigated and found that the function is_directory(p) always returns false, even when the path really is a directory, so the training set is not loaded.
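One way to narrow down a report like this is to ask the filesystem directly what it thinks the path is. The sketch below is only an illustration: it uses C++17 std::filesystem, which may not be the library the project itself calls (for example, it could be using Boost.Filesystem), and describe_path is a hypothetical helper name, not something from the repository.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: report what the filesystem says about `path`
// without throwing. Returns "directory", "regular file", "missing",
// "other", or "error: <message>".
std::string describe_path(const std::string& path) {
    std::error_code ec;
    const fs::file_status st = fs::status(path, ec);
    // A missing path is reported through the status type, so check it first.
    if (st.type() == fs::file_type::not_found) return "missing";
    if (ec) return "error: " + ec.message();
    if (fs::is_directory(st)) return "directory";
    if (fs::is_regular_file(st)) return "regular file";
    return "other";
}
```

Calling describe_path("data/semeval-2010/corpus/train/") from the same working directory as the failing command would show whether the path is missing, unreadable, or simply not seen as a directory. A relative path resolved from a different working directory is one possible cause worth ruling out.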

magician-david commented 7 years ago

Have you preprocessed the data according to data/README.md? I think that will fix the training problem on the SemEval dataset.

Diego999 commented 7 years ago

Yes, I did, and I have exactly the files .txt, .txt.conll, .ann, .stanford.so. By the way, it works on my Fedora virtual machine.

Diego999 commented 7 years ago

What was your problem during testing? Something like

unknown label:B-Term, treated as negative
unknown label:L-Term, treated as negative
unknown label:U-Term, treated as negative

for all samples in the test set?

EDIT: After re-training the model, it works on the test set. However, the direction is always e1 -> e2 (except for the negative classes).

magician-david commented 7 years ago

Yes! I got the same problem! In some cases the .pred.ann files are empty.

It seems that we can't test during the training process?

(Screenshot: semeval-test-problem)

How do I get the macro F1 score when testing the model?

Diego999 commented 7 years ago

I guess that an empty .pred.ann file means that the sample belongs to the class Other (the negative class).

I ran the whole training this morning and got similar results: testing always gives 0/0/0. Afterwards, I used the trained model on the test set (because it didn't work with the already retrained model) and obtained the relations without direction. An example of the output is Product-Producer Arg1:T0 Arg2:T0 for sample 8023, where the relation direction is not specified (T0->T0 rather than T1->T2 or T2->T1).

In terms of relation classification without direction, I fed the predictions directly into the official scorer of SemEval-2010 Task 8 and got a macro-averaged F1 (excluding Other) of around 0.86-0.88 (I don't remember exactly).
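For anyone unsure what a macro-averaged F1 that excludes the negative class means, here is a minimal sketch. It is not the official scorer and makes no attempt to reproduce its handling of direction; the Counts struct and the function names are made up for illustration. It only shows the arithmetic: compute F1 per class from true positives, false positives and false negatives, then take the unweighted mean over all classes except the excluded one.

```cpp
#include <map>
#include <string>

// Hypothetical per-class counts: true positives, false positives,
// false negatives.
struct Counts {
    int tp;
    int fp;
    int fn;
};

// F1 for one class, F1 = 2*TP / (2*TP + FP + FN);
// defined here as 0 when the denominator is 0.
double f1_score(const Counts& c) {
    const double denom = 2.0 * c.tp + c.fp + c.fn;
    return denom == 0.0 ? 0.0 : 2.0 * c.tp / denom;
}

// Unweighted mean of the per-class F1 scores, skipping the class
// named `excluded` (the negative class).
double macro_f1(const std::map<std::string, Counts>& per_class,
                const std::string& excluded = "Other") {
    double sum = 0.0;
    int n = 0;
    for (const auto& kv : per_class) {
        if (kv.first == excluded) continue;
        sum += f1_score(kv.second);
        ++n;
    }
    return n == 0 ? 0.0 : sum / n;
}
```

Because every class has the same weight, a rare relation type with poor predictions pulls the score down as much as a frequent one, which is why this number can differ noticeably from plain accuracy.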

I don't know what we should do to also get the direction (and thus a nonzero P/R/F on the test set, both during training and afterwards).

magician-david commented 7 years ago

I tried "build/relation/RelationExtraction --test -y yaml/parameter-semeval-2010.yaml" several days ago and got the same result. But I haven't used the official scorer yet. Thanks.

Diego999 commented 7 years ago

Don't forget that the official scorer gives you two things:

I'll check why it always outputs T0->T0 for each relation rather than using T1 and T2. If you find out why, I would be interested!

Diego999 commented 7 years ago

I solved the problem. If you want to get the direction in the correct format to feed directly into the official scorer, you should change the code at RelLstmModel.cpp:568 from

if(dict_.is_reverse_relation(besti)){
  int rbesti = dict_.reverse_relation(besti); // normalize
  string rel_type = dict_.get_rel_string(rbesti);
  vector<string> rels;
  split(rels, rel_type, bind2nd(equal_to<char>(), ':'));
  ofs << rels[1] << " ";
  ofs << "Arg1:T" << ent_map[cell->row()] << " ";
  ofs << "Arg2:T" << ent_map[cell->col()] << endl;
}else{
  string rel_type = dict_.get_rel_string(besti);
  vector<string> rels;
  split(rels, rel_type, bind2nd(equal_to<char>(), ':'));
  ofs << rels[1] << " ";
  ofs << "Arg1:T" << ent_map[cell->col()] << " ";
  ofs << "Arg2:T" << ent_map[cell->row()] << endl;
}

to

if(dict_.is_reverse_relation(besti)){
  int rbesti = dict_.reverse_relation(besti); // normalize
  string rel_type = dict_.get_rel_string(rbesti);
  vector<string> rels;
  split(rels, rel_type, bind2nd(equal_to<char>(), ':'));
  ofs << rels[1] << "(e2,e1)";
}else{
  string rel_type = dict_.get_rel_string(besti);
  vector<string> rels;
  split(rels, rel_type, bind2nd(equal_to<char>(), ':'));
  ofs << rels[1] << "(e1,e2)";
}

With this, I obtain a score of 84.87, which is similar to the results of the paper without the use of external resources.
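Stripped of the surrounding model code, the change boils down to appending a direction suffix to the relation name instead of writing the Arg1/Arg2 fields. A minimal standalone sketch of that idea follows; directed_label is a hypothetical helper for illustration and does not exist in the repository.

```cpp
#include <string>

// Hypothetical helper: build a label such as "Product-Producer(e1,e2)".
// `relation` is the bare relation name; `reversed` selects the (e2,e1)
// direction, mirroring the is_reverse_relation branch in the modified code.
std::string directed_label(const std::string& relation, bool reversed) {
    return relation + (reversed ? "(e2,e1)" : "(e1,e2)");
}
```

For example, directed_label("Product-Producer", true) yields "Product-Producer(e2,e1)".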

magician-david commented 7 years ago

Perfect! And my score is 83.73. :)

mmiwa commented 7 years ago

I fixed the reported compilation problem by updating cnn.tar.gz, so I am closing this issue.

hi-wangyan commented 4 years ago

@magician-david Hello, I ran into the following error. Could you give me some help? Thank you!

cmake .. -DEIGEN3_INCLUDE_DIR=eigen -DCMAKE_CXX_COMPILER=/usr/bin/clang++
-- The CXX compiler identification is unknown
CMake Error at CMakeLists.txt:1 (project):
  The CMAKE_CXX_COMPILER:

/usr/bin/clang++

is not a full path to an existing compiler tool.

Tell CMake where to find the compiler by setting either the environment variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path to the compiler, or to the compiler name if it is in the PATH.

-- Configuring incomplete, errors occurred!
See also "/home/wangyan/PycharmProjects/LSTM-ER-master/build/CMakeFiles/CMakeOutput.log".
See also "/home/wangyan/PycharmProjects/LSTM-ER-master/build/CMakeFiles/CMakeError.log".