marian-nmt / marian-regression-tests

Regression tests for the marian-dev repository
Other
4 stars 9 forks source link

regression tests for factors code #77

Closed pedrodiascoelho closed 3 years ago

pedrodiascoelho commented 3 years ago

Regression tests done in the scope of the marian-cef EU project to test the factors code. Two tests were done that test the usage of factors for both training and decoding by using factors both on source and target to encode subword splits. Two factors are used, |s1 if the word had the '@@' suffix and |s0 otherwise. Example: 'I was study@@ ing' becomes 'I|s0 was|s0 study|s1 ing|s0'

Also, as asked in the scope of the EU marian-cef project, these tests only test the code prior to the changes that we've done to the factors code (marian-nmt/marian-dev#748), which means that they test the current factors code in the original marian-nmt/marian-dev repo. Later to the feedback of this regression-tests, and after solving the issues pointed out in the mentioned PR, we'll deliver regression tests for the new features as well (combining embeddings with concatenation).

Finally, this was tested on marian 1.9.37. When tested in marian 1.10 I noticed that the format of the cost output that marian gives during training is different from the one in 1.9.37. Do I need to set a specific option to make it equal to the 1.9.37 version or should I change the .expected file to be in accordance to marian 1.10? (this only affects the training test)

The link to the pre-trained model for the decoding test can be found here: https://drive.google.com/file/d/1o0cUa6BWJYmjDQGqxkcRa-7GAfTMY_i0/view?usp=sharing

cc @snukky @kpu

snukky commented 3 years ago

Finally, this was tested on marian 1.9.37. When tested in marian 1.10 I noticed that the format of the cost output that marian gives during training is different from the one in 1.9.37. Do I need to set a specific option to make it equal to the 1.9.37 version or should I change the .expected file to be in accordance to marian 1.10?

Yes, please update the output to the most recent version of marian-dev - there was a change in the default logging output format in 1.10, which affect some tests.

snukky commented 3 years ago

I managed to test it, and the decoding test

tests/decoder/factors/test_factors.sh

fails for me with the following 2 different lines:

diff factors.out factors.expected
22c22
< in|s0 dem|s0 Rahmen|s0 ,|s0 in|s0 dem|s0 Sie|s0 das|s0 Wort|s0 ergreifen|s0 ,|s0 haben|s0 Sie|s0 das|s0 Recht|s0 ,|s0 daß|s0 sie|s0 die|s0 Rechtsvorschriften|s0 zum|s0 Schutz|s0 der|s0 Union|s0 als|s0 Zivil|s1 recht|s0 werden|s0 .|s0
---
> in|s0 dem|s0 Rahmen|s0 ,|s0 in|s0 dem|s0 Sie|s0 das|s0 gleiche|s0 tun|s0 ,|s0 haben|s0 Sie|s0 das|s0 Recht|s0 und|s0 John|s0 ,|s0 Rechtsvorschriften|s0 zu|s0 erlassen|s0 ,|s0 um|s0 die|s0 Union|s0 als|s0 ein|s0 Zivil|s1 recht|s0 zu|s0 schützen|s0 .|s0
31c31
< wenn|s0 die|s0 Vereinigten|s0 Staaten|s0 Japan|s0 nach|s0 dem|s0 Zweiten|s0 Weltkrieg|s0 besetzten|s0 Japan|s0 besetzten|s0 ,|s0 haben|s0 General|s0 @-@|s0 Mac|s1 ons|s0 und|s0 sein|s0 ai|s1 res|s0 Land|s0 das|s0 Land|s0 ermutigt|s0 ,|s0 eine|s0 Verfassung|s0 zu|s0 verabschieden|s0 ,|s0 mit|s0 der|s0 sichergestellt|s0 werden|s0 soll|s0 ,|s0 daß|s0 der|s0 milit|s1 ante|s0 P|s1 ater|s0 To|s1 jo|s0 durch|s0 die|s0 Demokratie|s0 ersetzt|s0 wird|s0 .|s0
---
> wenn|s0 die|s0 Vereinigten|s0 Staaten|s0 Japan|s0 nach|s0 dem|s0 Zweiten|s0 Weltkrieg|s0 besetzten|s0 Japan|s0 besetzten|s0 ,|s0 haben|s0 General|s0 @-@|s0 Mac|s1 ons|s0 und|s0 sein|s0 ai|s1 res|s0 Land|s0 das|s0 Land|s0 ermutigt|s0 ,|s0 eine|s0 Verfassung|s0 zu|s0 verabschieden|s0 ,|s0 mit|s0 der|s0 sichergestellt|s0 werden|s0 soll|s0 ,|s0 daß|s0 der|s0 milit|s1 ante|s0 Führer|s1 s|s0 To|s0 To|s1 jo|s0 durch|s0 die|s0 Demokratie|s0 ersetzt|s0 wird|s0 .|s0

I will test on another machine, but in the meantime could you please confirm that the current test should work?

I have two more comments:

  1. I would prefer to have the trained vocab files vocab.{en,de}.fsv included in the tarball with the model (I will do that before the merge). They are an integral part with the model itself and are quite large comparing to other files we keep in the repo. In tests, you can use paths to models/factors/vocab.{en,de}.fsv or simply copy the files into the working directory.
  2. I know that the current directory structure suggests otherwise, but I think it will be easier to maintain the tests if they sit in a single directory, e.g. /tests/factors/. Could you move them? There will be only single setup.sh needed.
pedrodiascoelho commented 3 years ago

I tested it with the current marian-dev master branch, and compiled it with the following options:

cmake -DUSE_SENTENCEPIECE=ON -DUSE_FBGEMM=ON -DCOMPILE_CPU=ON -DCOMPILE_TESTS=ON -DCOMPILE_EXAMPLES=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.1 ..

and it passed both tests

Edit - I noticed that you have on the README.md that you compile it with gcc 8.4 and I used gcc 7.5. I'll recompile it with the correct gcc and see if it changes the output.

Edit2 - gcc 8.4 works fine as well. I have noticed minor differences in decoding when using different cuda versions. Did you also used 10.1?

pedrodiascoelho commented 3 years ago

Followed your suggestions and joined everything in a single folder. Also, here is the link to the updated .tar with the model and the vocabularies: https://drive.google.com/file/d/1Fnt7QCdX--vfSTbWlxXszD3cBxP83lyr/view?usp=sharing

snukky commented 3 years ago

Yes, I compile with GCC 8.4 and CUDA 10.1 using the following command that we use in Jenkins CI:

CC=/usr/bin/gcc-8 CXX=/usr/bin/g++-8 CUDAHOSTCXX=/usr/bin/g++-8 cmake -DCOMPILE_TESTS=ON -DCOMPILE_EXAMPLES=ON -DCUDA_TOOLKIT_ROOT_DIR=/var/lib/jenkins/cuda-10.1 -DCOMPILE_CPU=on -DUSE_FBGEMM=on -DUSE_SENTENCEPIECE=ON -DCOMPILE_SERVER=ON ..

I'm getting the same exact differences (two lines only, line 22 and 31, 1-based) on our machine with Jenkins that has an older GPU that I tested previously. Simply deleting those two lines from .in and .expected solves the issue, so will do it when merging.

snukky commented 3 years ago

Merged. Let's see if Jenkins complain about the new tests.

Thanks!