Integration test for GL+GT pipeline based on CDS

akolonin commented 5 years ago

We need a fixed corpus of input parses (using only fully parsed sentences), run with LG 5.5.1, with GT testing using the same MWC that GL uses for learning.

MSL = no limit, MWC = 1, clustering = ALE 400
MSL = no limit, MWC = 1, clustering = ILE 400
MSL = 3, MWC = 1, clustering = ALE 400
MSL = 3, MWC = 1, clustering = ILE 400
MSL = no limit, MWC = 3, clustering = ALE 400
MSL = no limit, MWC = 3, clustering = ILE 400

Verified against the expected numbers for PA/F1, given in https://docs.google.com/spreadsheets/d/1TPbtGrqZ7saUHhOIi5yYmQ9c-cvVlAGqY14ATMPVCq4/edit#gid=2053587208

@OlegBaskov - please provide configuration data for @alexei-gl

When done, need to regenerate all baselines for GC with MWC = 1, 2, 6, 11, 21, 31
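
The six configurations listed above could be enumerated in a test harness roughly as follows. This is only a sketch: the values for MSL, MWC, and the ALE/ILE clustering labels come from the list, while the dictionary keys and the `build_test_matrix` name are hypothetical and are not the pipeline's actual configuration schema.

```python
from itertools import product

# (MSL, MWC) pairs taken from the list above; None stands for "no limit".
MSL_MWC_PAIRS = [(None, 1), (3, 1), (None, 3)]
CLUSTERING = ["ALE", "ILE"]


def build_test_matrix(n_clusters):
    """Return one dict per test configuration (hypothetical key names)."""
    return [
        {"msl": msl, "mwc": mwc, "clustering": algo, "n_clusters": n_clusters}
        for (msl, mwc), algo in product(MSL_MWC_PAIRS, CLUSTERING)
    ]


if __name__ == "__main__":
    for cfg in build_test_matrix(400):
        print(cfg)
```

Keeping the cluster count as a parameter means the same matrix can be rebuilt if a different number of clusters is chosen.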

alexei-gl commented 5 years ago

It seems that 400 clusters is an invalid setting for GL with the CDS corpus. The test is ready to be uploaded to the repo, but we still need to make a decision on the number of clusters to use. The test works fine with the 50-cluster setting. I am ready to make a PR if that is OK.

akolonin commented 5 years ago

We need to

A) Change the test to

MSL = no limit, MWC = 1, clustering = ALE 50
MSL = no limit, MWC = 1, clustering = ILE 50
MSL = 3, MWC = 1, clustering = ALE 50
MSL = 3, MWC = 1, clustering = ILE 50
MSL = no limit, MWC = 3, clustering = ALE 50
MSL = no limit, MWC = 3, clustering = ILE 50

B) Add comparison of dict files in addition to comparison of parses

C) Make sure that both dicts and parses are identical when produced under the same environment in different locations

D) Make sure that the tests are run by CircleCI to prevent PRs from breaking the tests
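
For points B and C, one possible way to compare dict files and parses between an expected baseline and a fresh run is to hash every output file and report the names that differ. This is a minimal sketch, not the test that was actually implemented: the `*.dict` and `*.ull` patterns and all function names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file's raw bytes, so the comparison is byte-exact."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def snapshot(root, patterns=("*.dict", "*.ull")):
    """Map each matching file (path relative to root) to its digest.

    The default patterns are hypothetical; pass whatever extensions
    the dict and parse files really use.
    """
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for pattern in patterns
        for p in sorted(root.rglob(pattern))
    }


def diff_snapshots(expected, actual):
    """Names that are missing on either side or whose contents differ."""
    return sorted(
        name
        for name in expected.keys() | actual.keys()
        if expected.get(name) != actual.get(name)
    )
```

An empty list from `diff_snapshots(snapshot(baseline_dir), snapshot(output_dir))` would mean both dicts and parses match the baseline; running the same check in two places gives a simple reproducibility test.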

alexei-gl commented 5 years ago

Done. PR #221.

singnet / language-learning

Integration test for GL+GT pipeline based on CDS #193