Closed: zackthefair closed this issue 5 years ago
Question 2)
I'm using a well-known tweet-analysis model. My YAML file is:
input_features:
  - name: text
    type: text
    encoder: parallel_cnn
    level: word
output_features:
  - name: sentiment
    type: category
I had no problems with the training and prediction steps. My problem is when I try to use the roc_curves_from_test_statistics visualization. Here's what I get from the terminal:
!ludwig visualize --visualization roc_curves_from_test_statistics --test_statistics results/experiment_run_0/test_statistics.json --field sentiment --model_names roc_curve
Traceback (most recent call last):
File "/usr/local/bin/ludwig", line 11, in <module>
load_entry_point('ludwig==0.1.2', 'console_scripts', 'ludwig')()
File "/usr/local/lib/python3.6/dist-packages/ludwig-0.1.2-py3.6.egg/ludwig/cli.py", line 102, in main
File "/usr/local/lib/python3.6/dist-packages/ludwig-0.1.2-py3.6.egg/ludwig/cli.py", line 63, in __init__
File "/usr/local/lib/python3.6/dist-packages/ludwig-0.1.2-py3.6.egg/ludwig/cli.py", line 88, in visualize
File "/usr/local/lib/python3.6/dist-packages/ludwig-0.1.2-py3.6.egg/ludwig/visualize.py", line 1818, in cli
File "/usr/local/lib/python3.6/dist-packages/ludwig-0.1.2-py3.6.egg/ludwig/visualize.py", line 1290, in roc_curves_from_test_statistics
KeyError: 'roc_curve'
Question 1: I'm not sure I understand what the problem is here. If you want to predict on another section of your data, you can set the split you want to predict on:
-s {training,validation,test,full}, --split {training,validation,test,full}
the split to test the model on
The terminal output depends on the type of output features you have. The output you expect is for category features, but your feature is numerical, so the measures you get are different.
If you run `experiment`, Ludwig trains on the training set, validates on the validation set, and tests on the test set. If you run `test` on the same dataset, it will test on the test split of the data, so the results are obviously going to be identical.
Question 2: The ROC curve is computed only for binary output features, yours is categorical.
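For reference, a ROC curve needs a two-class output. A minimal sketch of how the model definition could be changed, assuming the sentiment column were recoded to two values (e.g. positive/negative, which is an assumption about your data, not something the thread confirms):

```yaml
input_features:
  - name: text
    type: text
    encoder: parallel_cnn
    level: word
output_features:
  - name: sentiment
    type: binary
```

With a binary output feature, the test statistics should then contain the roc_curve entry that the visualization command was looking up.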
@w4nderlust Thank you for clarifying question 2. About question 1:
I'll try to explain what I want to accomplish here; there's a chance that I'm misunderstanding some concepts.
I only have one dataset for this model, which means that across the whole dataset there are values assigned to my output feature. What I want to do is predict on some of those values and know how well they were predicted. But when I use:
!ludwig experiment --data_csv tmdb_5000_movies.csv --model_definition_file model.yaml
terminal outputs:
Epoch 23
Training: 100% 27/27 [00:00<00:00, 40.19it/s]
Evaluation train: 100% 27/27 [00:00<00:00, 109.45it/s]
Evaluation vali : 100% 4/4 [00:00<00:00, 123.55it/s]
Evaluation test : 100% 8/8 [00:00<00:00, 123.16it/s]
Took 1.2066s
╒════════════════╤════════╤══════════════════════╤═══════════════════════╤═════════╤═════════╕
│ vote_average │ loss │ mean_squared_error │ mean_absolute_error │ r2 │ error │
╞════════════════╪════════╪══════════════════════╪═══════════════════════╪═════════╪═════════╡
│ train │ 0.4055 │ 0.4055 │ 0.3002 │ 0.0058 │ -0.0552 │
├────────────────┼────────┼──────────────────────┼───────────────────────┼─────────┼─────────┤
│ vali │ 1.5851 │ 1.5851 │ 0.8775 │ -0.0035 │ -0.0311 │
├────────────────┼────────┼──────────────────────┼───────────────────────┼─────────┼─────────┤
│ test │ 1.3598 │ 1.3598 │ 0.8657 │ -0.0021 │ -0.0223 │
╘════════════════╧════════╧══════════════════════╧═══════════════════════╧═════════╧═════════╛
Last improvement of loss on combined happened 5 epochs ago
EARLY STOPPING due to lack of validation improvement, it has been 5 epochs since last validation accuracy improvement
Best validation model epoch:
Best validation model loss on validation set combined: 1.5667899188868486
Best validation model loss on test set combined: 1.3453248873914136
Finished: experiment_run
Saved to: results/experiment_run_0
╒═════════╕
│ PREDICT │
╘═════════╛
Evaluation: 100% 8/8 [00:00<00:00, 56.33it/s]
===== vote_average =====
error: -0.022265979934437005
loss: 1.359760204618446
mean_absolute_error: 0.8656717005135125
mean_squared_error: 1.359760204618446
r2: -0.0021018019904651404
Finished: experiment_run
Saved to: results/experiment_run_0
So to me, it seems like the PREDICT process isn't necessary, because it was already done during the TRAINING process. Is that right for this case in particular? What I want to know is: in this particular case, can I measure how well my model predicts my output feature based only on the final test values of the training process, making it unnecessary to run experiment rather than just train?
Let me give you some context, hopefully it will help clarify.
If you have a dataset where all datapoints have output labels and you provide a full CSV, when you run `experiment`
Ludwig splits it randomly into training, validation, and test sets (70%, 10%, and 20% respectively; these values can be changed).
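The random 70/10/20 split can be illustrated with a small standalone sketch (plain Python for illustration, not Ludwig's actual implementation; the function name and seed are made up):

```python
import random

def split_dataset(rows, train=0.7, validation=0.1, test=0.2, seed=42):
    """Randomly partition rows into train/validation/test subsets."""
    assert abs(train + validation + test - 1.0) < 1e-9
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # shuffle so the split is random
    n = len(rows)
    n_train = int(n * train)
    n_val = int(n * validation)
    return (rows[:n_train],                 # training split
            rows[n_train:n_train + n_val],  # validation split
            rows[n_train + n_val:])         # test split

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 70 10 20
```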
When you run `train`, at the end of each epoch the model is used to predict on the training, validation, and test sets (as shown in your log). During training the process may stop due to early stopping, like in your case.
At the end of the whole training, Ludwig takes the best model found so far (which may not be the one obtained during the last epoch) and uses it to obtain predictions and statistics on the test set.
So the scores that you see at the end are the same ones obtained by the model at epoch 18, as it was the model with the best validation loss. In some cases you also obtain a number of additional measures, like for categorical or sequence features, where the confusion matrix is computed and per-class statistics are reported. For numerical features, like in your case, there are currently no additional measures beyond the ones computed during the training phase, so if you don't need the predictions, there is no need to run `experiment`; you can run just `train`. However, running `experiment` also produces a `test_statistics.json` file that can be used for obtaining visualizations, so you may want to do that anyway, depending on your needs.
Closing this as it is not an issue; feel free to ask further questions in this thread even though it is closed.
I'd like to know some concepts if possible @w4nderlust
1) About numeric outputs: I'm familiar with mean squared error and mean absolute error, but I don't know exactly what the other three stand for: r2, loss, and error. (I used to think that error = loss when I was training category outputs, but now I see that I was wrong.)
2) What exactly is hits_at_k?
Loss is whatever you set it to be; by default it is mean squared error. Error is the signed error for each datapoint (not absolute and not squared). r2 is the coefficient of determination: https://en.wikipedia.org/wiki/Coefficient_of_determination. hits_at_k computes whether the ground truth is within the top k (by default 3) highest-ranked predictions.
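As a rough illustration of those definitions (plain Python, not Ludwig's actual code; the function names and the sample values are made up, and the sign convention for error is assumed to be prediction minus ground truth):

```python
def regression_measures(y_true, y_pred):
    """Compute mean signed error, MAE, MSE, and R^2."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]   # signed per-datapoint error
    error = sum(errors) / n                            # mean signed error
    mae = sum(abs(e) for e in errors) / n              # mean absolute error
    mse = sum(e * e for e in errors) / n               # mean squared error (default loss)
    mean_t = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot                           # coefficient of determination
    return error, mae, mse, r2

def hits_at_k(true_label, ranked_predictions, k=3):
    """True if the ground truth is among the k highest-ranked predictions."""
    return true_label in ranked_predictions[:k]

print(regression_measures([3.0, 2.0, 4.0], [2.5, 2.0, 5.0]))
print(hits_at_k("comedy", ["drama", "comedy", "action", "horror"]))  # True
```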
I have 2 questions:
Question 1) I am building a movie rating prediction model.
My YAML file is:
When I use `ludwig test` or `ludwig predict`, the terminal doesn't return what it used to return in other predictions: stuff like
Instead, the terminal returns:
In other words, it's returning the best values it found during the training section.
PS: I'm using Colab. I don't know if it has anything to do with this, but it might be useful to know in order to solve it.