a little confused on the training command

andyyuan78 commented 6 years ago

python3 sick_main.py --network_type exp_context_fusion --context_fusion_method block --model_dir_suffix training --gpu 0

For this training command, it looks like 'model_dir_suffix' is only used to create a directory.

BTW, could you give me the complete command for testing after I have run the command above? Thanks.

taoshen58 commented 6 years ago

Did you read the README.md for the whole project framework?

andyyuan78 commented 6 years ago

So SICK doesn't have a test mode yet?

taoshen58 commented 6 years ago

Yes, it automatically runs the evaluation on both the dev and test sets.

andyyuan78 commented 6 years ago

So, is there a quick way to get the test accuracy for a binary classification task? Part of my log is here:

saving summary... Done
on func:get_evaluation at file:evaluator.py #24

getting evaluation result for dev
on func:generate_batch_sample_iter at file:dataset.py #100
len of logits_array is 13518
loss_value is -0.101704
len of target_scores is 13518
len of predicted_scores is 13518
pearson_value is 0.038738620546
spearman_value is 0.0165872558521
mse_value is 0.606445758923
on func:get_evaluation at file:evaluator.py #200
==> for dev, loss: -0.1017, pearson: 0.0387, spearman: 0.0166, mse: 0.6064
on func:get_evaluation at file:evaluator.py #24

getting evaluation result for test
on func:generate_batch_sample_iter at file:dataset.py #100
len of logits_array is 6804
loss_value is -0.0995626
len of target_scores is 6804
len of predicted_scores is 6804
pearson_value is 0.031753002086
spearman_value is 0.00509598326937
mse_value is 0.61479430613
on func:get_evaluation at file:evaluator.py #200
~~> for test, loss: -0.0996, pearson: 0.0318, spearman: 0.0051, mse: 0.6148
on func:update_top_list at file:perform_recorder.py #20
max step: 49500
step: 14500, dev_loss: 0.0080, dev_pearson: 0.1478, dev_spm: 0.0896, dev_mse: 0.6126, test_loss: 0.0100, test_pearson: 0.1459, test_spm: 0.0950, test_mse: 0.6208,
step: 15000, dev_loss: 0.0025, dev_pearson: 0.1477, dev_spm: 0.0894, dev_mse: 0.6123, test_loss: 0.0045, test_pearson: 0.1460, test_spm: 0.0944, test_mse: 0.6205,
step: 15500, dev_loss: -0.0028, dev_pearson: 0.1476, dev_spm: 0.0890, dev_mse: 0.6120, test_loss: -0.0008, test_pearson: 0.1461, test_spm: 0.0938, test_mse: 0.6202,
step: 14000, dev_loss: 0.0137, dev_pearson: 0.1474, dev_spm: 0.0898, dev_mse: 0.6128, test_loss: 0.0157, test_pearson: 0.1454, test_spm: 0.0954, test_mse: 0.6210,
step: 16000, dev_loss: -0.0078, dev_pearson: 0.1474, dev_spm: 0.0886, dev_mse: 0.6116, test_loss: -0.0058, test_pearson: 0.1461, test_spm: 0.0932, test_mse: 0.6199,
step: 13500, dev_loss: 0.0198, dev_pearson: 0.1472, dev_spm: 0.0899, dev_mse: 0.6132, test_loss: 0.0218, test_pearson: 0.1451, test_spm: 0.0957, test_mse: 0.6213,
step: 16500, dev_loss: -0.0125, dev_pearson: 0.1471, dev_spm: 0.0881, dev_mse: 0.6114, test_loss: -0.0105, test_pearson: 0.1460, test_spm: 0.0924, test_mse: 0.6196,
step: 13000, dev_loss: 0.0262, dev_pearson: 0.1468, dev_spm: 0.0899, dev_mse: 0.6134, test_loss: 0.0282, test_pearson: 0.1445, test_spm: 0.0959, test_mse: 0.6216,
step: 17000, dev_loss: -0.0170, dev_pearson: 0.1468, dev_spm: 0.0876, dev_mse: 0.6111, test_loss: -0.0150, test_pearson: 0.1458, test_spm: 0.0915, test_mse: 0.6194,
step: 12500, dev_loss: 0.0330, dev_pearson: 0.1464, dev_spm: 0.0898, dev_mse: 0.6139, test_loss: 0.0349, test_pearson: 0.1440, test_spm: 0.0959, test_mse: 0.6220,

taoshen58 commented 6 years ago

The metrics of SICK are pearson, spearman and MSE.

And, it seems that there is something wrong with your results.
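For reference, here is a minimal pure-Python sketch of the three metrics named above. It is not the code from evaluator.py; the helper names (`pearson`, `average_ranks`, `spearman`, `mse`) and the sample scores are made up for illustration, and it assumes both inputs are equal-length lists of floats with non-zero variance.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def mse(xs, ys):
    """Mean squared error between two score lists."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical scores, only to show the call pattern.
target = [1.0, 2.5, 3.0, 4.2, 5.0]
predicted = [1.2, 2.0, 3.5, 4.0, 4.8]
print(pearson(target, predicted), spearman(target, predicted), mse(target, predicted))
```

A Pearson value near zero, as in the log above, means the predicted scores carry almost no linear relationship to the targets.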

taoshen58 commented 6 years ago

Please refer to the TreeLSTM paper mentioned in my paper for more information about the SICK task.

andyyuan78 commented 6 years ago

I changed the code for debugging and to suit my own data.

andyyuan78 commented 6 years ago

Suppose there is a paraphrase identification task whose dataset is a set of rows like: sentence1 sentence2 label (0 or 1).

Which subproject suits it?
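To make the data layout concrete, here is a small sketch of reading such rows. It assumes one example per line with tab-separated fields, which is my guess at the file format and not something the repo prescribes; `PairExample` and `read_pairs` are hypothetical names.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class PairExample(NamedTuple):
    sentence1: str
    sentence2: str
    label: int  # 1 = paraphrase, 0 = not a paraphrase


def read_pairs(handle: TextIO) -> List[PairExample]:
    """Read tab-separated rows of the form: sentence1, sentence2, label."""
    examples = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        sentence1, sentence2, raw_label = row
        label = int(raw_label)
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {raw_label!r}")
        examples.append(PairExample(sentence1, sentence2, label))
    return examples


# Two made-up rows standing in for a real data file.
sample = io.StringIO(
    "A man sleeps.\tA man is asleep.\t1\n"
    "A dog runs.\tA cat sits.\t0\n"
)
pairs = read_pairs(sample)
print(pairs)
```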

taoshen58 commented 6 years ago

If you want to use the proposed model, please refer to https://github.com/taoshen58/BiBloSA/tree/master/context_fusion

And if you want to use both the framework and the model, I recommend https://github.com/taoshen58/BiBloSA/tree/master/exp_SNLI, which is more suitable for general classification problems than for regression (e.g., SICK).

andyyuan78 commented 6 years ago

I use SNLI since it can return a float (regression task), so we can let the user set a 'gate' (a threshold) to determine the class (label).
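A rough sketch of what I mean by the 'gate', and of how to get an accuracy number from float scores. The function names and the sample values are made up; nothing here comes from the repo.

```python
def scores_to_labels(scores, gate):
    """Map each float score to class 1 if it reaches the gate, else class 0."""
    return [1 if score >= gate else 0 for score in scores]


def accuracy(predicted_labels, gold_labels):
    """Fraction of predictions that match the gold labels."""
    if len(predicted_labels) != len(gold_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(p == g for p, g in zip(predicted_labels, gold_labels))
    return correct / len(gold_labels)


# Hypothetical model outputs and gold labels.
scores = [0.91, 0.12, 0.55, 0.40]
gold = [1, 0, 1, 1]

labels = scores_to_labels(scores, gate=0.5)
print(labels, accuracy(labels, gold))
```

The gate would normally be picked on the dev set and then applied unchanged to the test set.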

taoshen58 / BiBloSA

a little confused on the training command #1
