andyyuan78 closed this issue 6 years ago.
Did you read the README.md for the whole project framework?
So SICK doesn't have a test mode yet?
Yes, it automatically runs both the dev and the test evaluation.
getting evaluation result for test
on func:generate_batch_sample_iter at file:dataset.py #100
len of logits_array is 6804
loss_value is -0.0995626
len of target_scores is 6804
len of predicted_scores is 6804
pearson_value is 0.031753002086
spearman_value is 0.00509598326937
mse_value is 0.61479430613
on func:get_evaluation at file:evaluator.py #200
~~> for test, loss: -0.0996, pearson: 0.0318, spearman: 0.0051, mse: 0.6148
on func:update_top_list at file:perform_recorder.py #20
max step: 49500
step: 14500, dev_loss: 0.0080, dev_pearson: 0.1478, dev_spm: 0.0896, dev_mse: 0.6126, test_loss: 0.0100, test_pearson: 0.1459, test_spm: 0.0950, test_mse: 0.6208,
step: 15000, dev_loss: 0.0025, dev_pearson: 0.1477, dev_spm: 0.0894, dev_mse: 0.6123, test_loss: 0.0045, test_pearson: 0.1460, test_spm: 0.0944, test_mse: 0.6205,
step: 15500, dev_loss: -0.0028, dev_pearson: 0.1476, dev_spm: 0.0890, dev_mse: 0.6120, test_loss: -0.0008, test_pearson: 0.1461, test_spm: 0.0938, test_mse: 0.6202,
step: 14000, dev_loss: 0.0137, dev_pearson: 0.1474, dev_spm: 0.0898, dev_mse: 0.6128, test_loss: 0.0157, test_pearson: 0.1454, test_spm: 0.0954, test_mse: 0.6210,
step: 16000, dev_loss: -0.0078, dev_pearson: 0.1474, dev_spm: 0.0886, dev_mse: 0.6116, test_loss: -0.0058, test_pearson: 0.1461, test_spm: 0.0932, test_mse: 0.6199,
step: 13500, dev_loss: 0.0198, dev_pearson: 0.1472, dev_spm: 0.0899, dev_mse: 0.6132, test_loss: 0.0218, test_pearson: 0.1451, test_spm: 0.0957, test_mse: 0.6213,
step: 16500, dev_loss: -0.0125, dev_pearson: 0.1471, dev_spm: 0.0881, dev_mse: 0.6114, test_loss: -0.0105, test_pearson: 0.1460, test_spm: 0.0924, test_mse: 0.6196,
step: 13000, dev_loss: 0.0262, dev_pearson: 0.1468, dev_spm: 0.0899, dev_mse: 0.6134, test_loss: 0.0282, test_pearson: 0.1445, test_spm: 0.0959, test_mse: 0.6216,
step: 17000, dev_loss: -0.0170, dev_pearson: 0.1468, dev_spm: 0.0876, dev_mse: 0.6111, test_loss: -0.0150, test_pearson: 0.1458, test_spm: 0.0915, test_mse: 0.6194,
step: 12500, dev_loss: 0.0330, dev_pearson: 0.1464, dev_spm: 0.0898, dev_mse: 0.6139, test_loss: 0.0349, test_pearson: 0.1440, test_spm: 0.0959, test_mse: 0.6220,
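The top list printed by update_top_list appears to be sorted by dev_pearson in descending order. If you want to pull these records out of a saved log and pick a checkpoint by its dev score, here is a minimal sketch, assuming the log keeps the "step: N, key: value, ..." layout shown above. parse_top_list and best_by are hypothetical helpers, not part of the project.

```python
import re

# One "step: N, key: value, ..." record; values always contain a decimal point,
# so the repetition stops before the next "step: N," record begins.
RECORD_RE = re.compile(r"step: (\d+),((?:\s*\w+: -?\d+\.\d+,?)+)")
FIELD_RE = re.compile(r"(\w+): (-?\d+\.\d+)")


def parse_top_list(log_text):
    """Return one dict per 'step: ...' record found in log_text (hypothetical helper)."""
    records = []
    for match in RECORD_RE.finditer(log_text):
        record = {"step": int(match.group(1))}
        for key, value in FIELD_RE.findall(match.group(2)):
            record[key] = float(value)
        records.append(record)
    return records


def best_by(records, key="dev_pearson"):
    """Select the record with the highest value for `key` (select on dev, not test)."""
    return max(records, key=lambda r: r[key])


# Two records copied from the log above.
sample = (
    "max step: 49500\n"
    "step: 14500, dev_loss: 0.0080, dev_pearson: 0.1478, dev_spm: 0.0896, dev_mse: 0.6126, "
    "test_loss: 0.0100, test_pearson: 0.1459, test_spm: 0.0950, test_mse: 0.6208,\n"
    "step: 15000, dev_loss: 0.0025, dev_pearson: 0.1477, dev_spm: 0.0894, dev_mse: 0.6123, "
    "test_loss: 0.0045, test_pearson: 0.1460, test_spm: 0.0944, test_mse: 0.6205,\n"
)
records = parse_top_list(sample)
best = best_by(records)
print(best["step"], best["test_pearson"])  # 14500 0.1459
```

The "max step: 49500" line is skipped because it has no trailing comma after the number.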
The evaluation metrics for SICK are Pearson correlation, Spearman correlation, and MSE.
Also, something seems to be wrong with your results.
Please refer to the TreeLSTM paper cited in my paper for more details about the SICK task.
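To sanity-check predicted_scores against target_scores outside the framework, the three metrics can be computed directly. In the final test evaluation of the log above, Pearson is 0.0318 and Spearman is 0.0051, i.e. almost no correlation with the gold scores. Below is a minimal pure-Python sketch; it is not the project's evaluator.py, which may compute these differently (for example with SciPy), and the sample values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; assumes neither list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def mse(xs, ys):
    """Mean squared error between predictions and targets."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up scores on SICK's 1-5 relatedness scale.
predicted = [1.2, 3.8, 4.5, 2.9]
target = [1.0, 4.0, 4.6, 3.5]
print(pearson(predicted, target), spearman(predicted, target), mse(predicted, target))
```

Spearman only looks at the ordering, so a model can score well on it while still having a large MSE if its outputs are shifted or scaled.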
I changed the code for debugging and to suit my personal data.
If there is a paraphrase identification task whose dataset is a set of (sentence1, sentence2, label) examples, where label is 0 or 1,
which subproject is suitable for it?
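For reference, a dataset in the format described above could be read as follows, assuming one example per line with tab-separated fields (the delimiter is an assumption; the question does not specify it). load_pairs is a hypothetical helper, not part of BiBloSA; the resulting tuples would still need to be converted into whatever input format the chosen subproject expects.

```python
import io


def load_pairs(lines):
    """Parse 'sentence1<TAB>sentence2<TAB>label' lines into (s1, s2, int label) tuples."""
    pairs = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 tab-separated fields, got {len(parts)}")
        s1, s2, label = parts
        if label not in ("0", "1"):
            raise ValueError(f"line {lineno}: label must be 0 or 1, got {label!r}")
        pairs.append((s1, s2, int(label)))
    return pairs


# Made-up examples; in practice pass an open file object instead of StringIO.
sample = io.StringIO(
    "How do I learn Python?\tWhat is the best way to learn Python?\t1\n"
    "How do I learn Python?\tWhat is the capital of France?\t0\n"
)
pairs = load_pairs(sample)
print(pairs)
```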
If you want to use the proposed model, please refer to https://github.com/taoshen58/BiBloSA/tree/master/context_fusion
And if you want to use both the framework and the model, I recommend https://github.com/taoshen58/BiBloSA/tree/master/exp_SNLI, which is more suitable for general classification problems than for regression problems (e.g., SICK).
I use SICK since it returns a float (regression task), so we can let the user set a 'gate' (threshold) to determine the class (label).
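The 'gate' idea amounts to thresholding the regression output. A minimal sketch, assuming the model produces one float per sentence pair where a larger value means "more likely a paraphrase"; to_label and pick_gate are hypothetical names, and the gate should be chosen on dev data rather than on the test set.

```python
def to_label(score, gate):
    """Map a regression score to a binary class using the gate (threshold)."""
    return 1 if score >= gate else 0


def pick_gate(scores, labels, candidates):
    """Return (gate, accuracy) for the candidate gate with the best dev accuracy."""
    best_gate, best_acc = None, -1.0
    for gate in candidates:
        correct = sum(to_label(s, gate) == y for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_acc:  # strict '>' keeps the first gate on ties
            best_gate, best_acc = gate, acc
    return best_gate, best_acc


# Toy dev scores and gold labels, for illustration only.
dev_scores = [0.1, 0.2, 0.6, 0.8, 0.9]
dev_labels = [0, 0, 1, 1, 1]
gate, acc = pick_gate(dev_scores, dev_labels, candidates=[0.05, 0.5, 0.95])
print(gate, acc)  # 0.5 1.0
```

If the model is trained on SICK-style targets, the gate lives on that score scale (1 to 5), so the candidate values would need to be chosen accordingly.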
python3 sick_main.py --network_type exp_context_fusion --context_fusion_method block --model_dir_suffix training --gpu 0
For this training command, it looks like 'model_dir_suffix' is only used to create a directory.
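To illustrate the observation about model_dir_suffix: as a plain string option, it can only influence naming unless the code uses it somewhere else. The sketch below parses the exact command above with argparse and builds a directory name from it. It declares only the four flags that appear in the command; the real sick_main.py defines more options, and the "model_root" folder and the underscore-joined naming scheme are hypothetical.

```python
import argparse
import os
import shlex

COMMAND = ("python3 sick_main.py --network_type exp_context_fusion "
           "--context_fusion_method block --model_dir_suffix training --gpu 0")


def build_parser():
    # Only the flags from the command above; defaults here are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--network_type", type=str)
    parser.add_argument("--context_fusion_method", type=str)
    parser.add_argument("--model_dir_suffix", type=str, default="")
    parser.add_argument("--gpu", type=int)
    return parser


# Skip "python3 sick_main.py" and parse the remaining arguments.
args = build_parser().parse_args(shlex.split(COMMAND)[2:])

# Hypothetical naming scheme: the suffix just distinguishes runs by directory name.
run_name = "_".join([args.network_type, args.context_fusion_method, args.model_dir_suffix])
model_dir = os.path.join("model_root", run_name)
print(model_dir)
```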
BTW, could you give me a complete command for testing after I have run the above command? Thanks.