ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

Not predicting correctly? #374

Closed zackthefair closed 5 years ago

zackthefair commented 5 years ago

I have 2 questions:

Question 1) I am building a movie rating prediction model.

my yaml file is:

input_features:
    -
        name: genres
        type: set
    -
        name: popularity
        type: numerical
    -
        name: tagline
        type: category
    -
        name: production_companies
        type: set

output_features:
    -
        name: vote_average
        type: numerical

When I use ludwig test or ludwig predict, the terminal doesn't return what it used to return for other predictions, i.e. something like:

===== TARGET COLUMN =====
accuracy: 
hits_at_k: 
loss: 0.8959801991780599
overall_stats: { 'avg_f1_score_macro': 
  'avg_f1_score_micro':
ETC...

Instead, the terminal returns:

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

ludwig v0.1.2 - Test

Dataset path: tmdb_5000_movies.csv
Model path: results/experiment_run_0/model/
Output path: results_0

Found hdf5 with the same filename of the csv, using it instead
Loading metadata from: results/experiment_run_0/model/train_set_metadata.json
Loading data from: tmdb_5000_movies.hdf5

╒═══════════════╕
│ LOADING MODEL │
╘═══════════════╛

From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/losses/losses_impl.py:667: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_grad.py:102: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.

╒═════════╕
│ PREDICT │
╘═════════╛

2019-06-10 13:05:29.377155: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-10 13:05:29.538029: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-10 13:05:29.538538: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x270d020 executing computations on platform CUDA. Devices:
2019-06-10 13:05:29.538568: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2019-06-10 13:05:29.540367: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2019-06-10 13:05:29.540680: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x270d440 executing computations on platform Host. Devices:
2019-06-10 13:05:29.540712: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-06-10 13:05:29.541001: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
totalMemory: 14.73GiB freeMemory: 14.60GiB
2019-06-10 13:05:29.541025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-10 13:05:29.541673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-10 13:05:29.541699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-06-10 13:05:29.541712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-06-10 13:05:29.541974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14202 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Restoring parameters from results/experiment_run_0/model/model_weights
Evaluation:   0% 0/8 [00:00<?, ?it/s]2019-06-10 13:05:30.394744: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
Evaluation: 100% 8/8 [00:02<00:00,  1.50s/it]

===== vote_average =====
error: 0.12470193587586471
loss: 1.4600357071625139
mean_absolute_error: 0.9010308397364916
mean_squared_error: 1.4600357071625139
r2: -0.002586167414816852
Saved to: results_0

In other words, it's returning the best values it found during the training phase.

PS: I'm using Colab. I don't know if that has anything to do with it, but it might be useful to know when figuring out how to solve this.

zackthefair commented 5 years ago

Question 2)

I'm using a well-known tweet analysis model. My yaml file is:

input_features:
    -
        name: text
        type: text
        encoder: parallel_cnn
        level: word

output_features:
    -
        name: sentiment
        type: category

I had no problems with the training and prediction steps. My problem is when I try to use the roc_curves_from_test_statistics visualization. Here's what I get from the terminal:

!ludwig visualize --visualization roc_curves_from_test_statistics --test_statistics results/experiment_run_0/test_statistics.json --field sentiment --model_names roc_curve

Traceback (most recent call last):
  File "/usr/local/bin/ludwig", line 11, in <module>
    load_entry_point('ludwig==0.1.2', 'console_scripts', 'ludwig')()
  File "/usr/local/lib/python3.6/dist-packages/ludwig-0.1.2-py3.6.egg/ludwig/cli.py", line 102, in main
  File "/usr/local/lib/python3.6/dist-packages/ludwig-0.1.2-py3.6.egg/ludwig/cli.py", line 63, in __init__
  File "/usr/local/lib/python3.6/dist-packages/ludwig-0.1.2-py3.6.egg/ludwig/cli.py", line 88, in visualize
  File "/usr/local/lib/python3.6/dist-packages/ludwig-0.1.2-py3.6.egg/ludwig/visualize.py", line 1818, in cli
  File "/usr/local/lib/python3.6/dist-packages/ludwig-0.1.2-py3.6.egg/ludwig/visualize.py", line 1290, in roc_curves_from_test_statistics
KeyError: 'roc_curve'

w4nderlust commented 5 years ago

Question 1: Not sure I understand what the problem is here. If you want to predict on another portion of your data, you can set the split you want to predict on:

-s {training,validation,test,full}, --split {training,validation,test,full}
                        the split to test the model on

The terminal output depends on the type of output features you have. The output you expect is for category features, but your feature is numerical, so the measures you get are different. If you run experiment, Ludwig trains on the training set, validates on the validation set, and tests on the test set. If you then run test on the same dataset, it will test on the same test split of the data, so the results are obviously going to be identical.
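
For example (just a sketch; the exact flag names may differ in your Ludwig version, check ludwig test --help), evaluating on the full dataset instead of only the test split could look like this:

!ludwig test --data_csv tmdb_5000_movies.csv --model_path results/experiment_run_0/model/ --split full

This way the measures are computed on all the rows of the csv, not only on the 20% test split.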

Question 2: The ROC curve is computed only for binary output features; yours is a category feature.
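
For illustration (just a sketch, with a hypothetical binary column called positive, not something in your dataset), ROC curves would become available with an output feature like:

output_features:
    -
        name: positive
        type: binary

If your sentiment column only has two values, you could recode it into such a binary column and use that instead.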

zackthefair commented 5 years ago

@w4nderlust Thank you for clarifying question 2. About question 1:

I'll try to explain what I want to accomplish here; there's a chance that I'm misunderstanding some concepts.

I only have one dataset for this model, so every row in it has a value assigned to my output feature. What I want to do is predict some of those values and know how well they were predicted. But when I use:

!ludwig experiment --data_csv tmdb_5000_movies.csv --model_definition_file model.yaml

the terminal outputs:

Epoch  23
Training: 100% 27/27 [00:00<00:00, 40.19it/s]
Evaluation train: 100% 27/27 [00:00<00:00, 109.45it/s]
Evaluation vali : 100% 4/4 [00:00<00:00, 123.55it/s]
Evaluation test : 100% 8/8 [00:00<00:00, 123.16it/s]
Took 1.2066s
╒════════════════╤════════╤══════════════════════╤═══════════════════════╤═════════╤═════════╕
│ vote_average   │   loss │   mean_squared_error │   mean_absolute_error │      r2 │   error │
╞════════════════╪════════╪══════════════════════╪═══════════════════════╪═════════╪═════════╡
│ train          │ 0.4055 │               0.4055 │                0.3002 │  0.0058 │ -0.0552 │
├────────────────┼────────┼──────────────────────┼───────────────────────┼─────────┼─────────┤
│ vali           │ 1.5851 │               1.5851 │                0.8775 │ -0.0035 │ -0.0311 │
├────────────────┼────────┼──────────────────────┼───────────────────────┼─────────┼─────────┤
│ test           │ 1.3598 │               1.3598 │                0.8657 │ -0.0021 │ -0.0223 │
╘════════════════╧════════╧══════════════════════╧═══════════════════════╧═════════╧═════════╛
Last improvement of loss on combined happened 5 epochs ago

EARLY STOPPING due to lack of validation improvement, it has been 5 epochs since last validation accuracy improvement

Best validation model epoch:
Best validation model loss on validation set combined: 1.5667899188868486
Best validation model loss on test set combined: 1.3453248873914136

Finished: experiment_run
Saved to: results/experiment_run_0

╒═════════╕
│ PREDICT │
╘═════════╛

Evaluation: 100% 8/8 [00:00<00:00, 56.33it/s]

===== vote_average =====
error: -0.022265979934437005
loss: 1.359760204618446
mean_absolute_error: 0.8656717005135125
mean_squared_error: 1.359760204618446
r2: -0.0021018019904651404

Finished: experiment_run
Saved to: results/experiment_run_0

So for me, it looks like the PREDICT process isn't necessary, because it was already done during the TRAINING process. Is that right for this particular case? What I want to know is: in this case, can I measure how well my model predicts my output feature based only on the final test values from the training process, so that it isn't necessary to use experiment but only train?

w4nderlust commented 5 years ago

Let me give you some context, hopefully it will help clarify.

If you have a dataset where all datapoints have output labels and you provide a full csv, when you run experiment Ludwig splits it randomly into train, validation and test sets (70% / 10% / 20% respectively; these values can be changed). When you run train, at the end of each epoch the model is used to predict on the training, validation and test sets. During training the process may stop due to early stopping, like in your case. At the end of the whole training, Ludwig takes the best model found so far (which may not be the one obtained during the last epoch) and uses it to obtain predictions and statistics on the test set. So the scores that you see at the end are the same ones obtained by the model at epoch 18, as it was the model with the best validation loss.

You also obtain a number of additional measures in some cases, like in the case of category or sequence features, where the confusion matrix is computed and per-class statistics are reported. In the case of numerical features, like yours, there are currently no additional measures beyond the ones computed during the training phase, so if you don't need the predictions there is no need to run experiment, you can just run train. However, running experiment also produces a test_statistics.json file that can be used for obtaining visualizations, so you may want to do that anyway, depending on your needs.

Closing this as it is not an issue; feel free to ask further questions in this thread even if it is closed.
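
For reference, the split percentages mentioned above can be changed through the preprocessing section of the model definition; if I remember correctly it looks roughly like this (check the user guide for the exact parameter names in your version):

preprocessing:
    split_probabilities: [0.7, 0.1, 0.2]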

zackthefair commented 5 years ago

I'd like to know some concepts if possible @w4nderlust

1) About numerical outputs: I'm familiar with mean squared error and mean absolute error, but I don't know exactly what the other 3 stand for: r2, loss and error. (I used to think that error = loss when I was training category outputs, but now I see that I was wrong.)

2) What exactly is hits_at_k?

w4nderlust commented 5 years ago

loss is whatever you set it to be; by default it is mean squared error. error is the signed error for each datapoint (not absolute and not squared). r2 is the coefficient of determination: https://en.wikipedia.org/wiki/Coefficient_of_determination. hits_at_k computes whether the ground truth is within the first k (by default 3) highest ranked predictions.
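
To make the numerical measures concrete, with $y_i$ the ground truth values, $\hat{y}_i$ the predictions and $n$ the number of datapoints (my notation; the sign convention for error is an assumption, Ludwig's may be flipped):

$$\mathrm{error} = \frac{1}{n}\sum_i (\hat{y}_i - y_i), \qquad \mathrm{MAE} = \frac{1}{n}\sum_i |\hat{y}_i - y_i|, \qquad \mathrm{MSE} = \frac{1}{n}\sum_i (\hat{y}_i - y_i)^2, \qquad R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

That is why error can be negative in your output (it is a signed average), while mean_absolute_error and mean_squared_error are always non-negative, and an r2 close to 0 means the model is doing about as well as always predicting the mean of the ground truth.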