Closed ita9naiwa closed 5 years ago
I downloaded KakaoBrunch12M dataset and ran by copying example_als.py.
KakaoBrunch12M
example_als.py
but it seems that training fails (Nan Error)
def example1(): log.set_log_level(log.DEBUG) als_option = ALSOption().get_default_option() als_option.validation = aux.Option({'topk': 10}) data_option = MatrixMarketOptions().get_default_option() data_option.input.main = '../tests/data/ext/main' #data_option.input.iid = '../tests/data/iid' als = ALS(als_option, data_opt=data_option) als.initialize() als.train() print('metrics for validations\n%s' % json.dumps(als.get_validation_results(), indent=2)) print('Run hyper parameter optimization for val_ndcg...') als.opt.num_workers = 4 als.opt.evaluation_period = 10 als.opt.optimize = aux.Option({ 'loss': 'val_ndcg', 'max_trials': 100, 'deployment': True, 'start_with_default_parameters': True, 'space': { 'd': ['randint', ['d', 10, 128]], 'reg_u': ['uniform', ['reg_u', 0.1, 1.0]], 'reg_i': ['uniform', ['reg_i', 0.1, 1.0]], 'alpha': ['randint', ['alpha', 1, 10]], } }) log.set_log_level(log.INFO) als.opt.model_path = './example1.ml100k.als.optimize.bin' print(json.dumps({'alpha': als.opt.alpha, 'd': als.opt.d, 'reg_u': als.opt.reg_u, 'reg_i': als.opt.reg_i}, indent=2)) als.optimize() als.load('./example1.ml100k.als.optimize.bin')
[ERROR ] 2019-08-31 07:54:58 [als.py:22] ImportError CuALS, no cuda library exists. error message: No module named 'buffalo.algo.cuda' [INFO ] 2019-08-31 07:54:58 [mm.py:193] Create the database from matrix market file. [DEBUG ] 2019-08-31 07:54:58 [mm.py:198] Building meta part... [DEBUG ] 2019-08-31 07:54:58 progress: 0%| 00:00<? [INFO ] 2019-08-31 07:54:58 [mm.py:206] Creating working data... [INFO ] 2019-08-31 07:54:58 progress: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:08 [mm.py:210] Working data is created on /tmp/tmpi_15tjok [INFO ] 2019-08-31 07:55:08 [mm.py:211] Building data part... [INFO ] 2019-08-31 07:55:08 [base.py:346] Building compressed triplets for rowwise... [INFO ] 2019-08-31 07:55:08 [base.py:347] Preprocessing... [INFO ] 2019-08-31 07:55:08 [base.py:350] In-memory Compressing ... [INFO ] 2019-08-31 07:55:11 [base.py:249] Load triplet bin: b'/tmp//chunk.bin' [INFO ] 2019-08-31 07:55:12 [base.py:380] Finished [INFO ] 2019-08-31 07:55:12 [base.py:346] Building compressed triplets for colwise... [INFO ] 2019-08-31 07:55:12 [base.py:347] Preprocessing... [INFO ] 2019-08-31 07:55:12 [base.py:350] In-memory Compressing ... [INFO ] 2019-08-31 07:55:15 [base.py:249] Load triplet bin: b'/tmp//chunk.bin' [INFO ] 2019-08-31 07:55:15 [base.py:380] Finished [INFO ] 2019-08-31 07:55:16 [mm.py:225] DB built on ./mm.h5py [INFO ] 2019-08-31 07:55:16 [als.py:56] ALS({ "evaluation_on_learning": true, "compute_loss_on_training": true, "early_stopping_rounds": 0, "save_best": false, "evaluation_period": 1, "save_period": 10, "random_seed": 0, "validation": { "topk": 10 }, "adaptive_reg": false, "save_factors": false, "accelerator": false, "d": 20, "num_iters": 10, "num_workers": 1, "hyper_threads": 256, "num_cg_max_iters": 3, "reg_u": 0.1, "reg_i": 0.1, "alpha": 8, "optimizer": "manual_cg", "cg_tolerance": 1e-10, "eps": 1e-10, "model_path": "", "data_opt": {} }) [INFO ] 2019-08-31 07:55:16 [als.py:58] MatrixMarket Header(306291, 505926, 12600038) Validation(500 samples) [debug ] 2019-08-31 07:55:16 [als.cc:72] P(306291 x 20) Q(505926 x 20) setted [INFO ] 2019-08-31 07:55:16 [buffered_data.py:71] Set data buffer size as 67108864(minimum required batch size is 35447). [DEBUG ] 2019-08-31 07:55:17 [base.py:342] Cannot find tensorboard configuration. [DEBUG ] 2019-08-31 07:55:17 rowwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:18 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.065s update: 1.25s) [DEBUG ] 2019-08-31 07:55:18 colwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:20 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.063s update: 1.76s) [INFO ] 2019-08-31 07:55:23 [als.py:173] Validation: ndcg:0.00678 map:0.00393 accuracy:0.01616 rmse:2.49048 error:1.70700 Elapsed 3.212 secs [INFO ] 2019-08-31 07:55:23 [als.py:176] Iteration 1: RMSE 0.034 Elapsed 3.134 secs [DEBUG ] 2019-08-31 07:55:23 rowwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:24 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.28s) [DEBUG ] 2019-08-31 07:55:24 colwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:26 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.84s) [INFO ] 2019-08-31 07:55:29 [als.py:173] Validation: ndcg:0.01974 map:0.01420 accuracy:0.03737 rmse:2.40212 error:1.58591 Elapsed 3.219 secs [INFO ] 2019-08-31 07:55:29 [als.py:176] Iteration 2: RMSE 0.030 Elapsed 3.122 secs [DEBUG ] 2019-08-31 07:55:29 rowwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:31 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.28s) [DEBUG ] 2019-08-31 07:55:31 colwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:32 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.79s) [INFO ] 2019-08-31 07:55:36 [als.py:173] Validation: ndcg:0.01994 map:0.01383 accuracy:0.03939 rmse:2.39055 error:1.56148 Elapsed 3.256 secs [INFO ] 2019-08-31 07:55:36 [als.py:176] Iteration 3: RMSE 0.029 Elapsed 3.069 secs [DEBUG ] 2019-08-31 07:55:36 rowwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:37 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.28s) [DEBUG ] 2019-08-31 07:55:37 colwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:39 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.79s) [INFO ] 2019-08-31 07:55:42 [als.py:173] Validation: ndcg:0.02089 map:0.01373 accuracy:0.04343 rmse:2.38696 error:1.55429 Elapsed 3.247 secs [INFO ] 2019-08-31 07:55:42 [als.py:176] Iteration 4: RMSE 0.029 Elapsed 3.080 secs [DEBUG ] 2019-08-31 07:55:42 rowwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:43 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.28s) [DEBUG ] 2019-08-31 07:55:43 colwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:45 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.79s) [INFO ] 2019-08-31 07:55:48 [als.py:173] Validation: ndcg:0.02544 map:0.01794 accuracy:0.04949 rmse:2.38435 error:1.55086 Elapsed 3.222 secs [INFO ] 2019-08-31 07:55:48 [als.py:176] Iteration 5: RMSE 0.029 Elapsed 3.066 secs [DEBUG ] 2019-08-31 07:55:48 rowwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:50 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.27s) [DEBUG ] 2019-08-31 07:55:50 colwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:51 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.84s) [INFO ] 2019-08-31 07:55:55 [als.py:173] Validation: ndcg:0.02673 map:0.01849 accuracy:0.05354 rmse:2.38284 error:1.54936 Elapsed 3.280 secs [INFO ] 2019-08-31 07:55:55 [als.py:176] Iteration 6: RMSE 0.029 Elapsed 3.113 secs [DEBUG ] 2019-08-31 07:55:55 rowwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:56 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.28s) [DEBUG ] 2019-08-31 07:55:56 colwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:55:58 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.81s) [INFO ] 2019-08-31 07:56:00 [als.py:173] Validation: ndcg:0.00000 map:0.00000 accuracy:0.00000 rmse:nan error:nan Elapsed 1.877 secs [INFO ] 2019-08-31 07:56:00 [als.py:176] Iteration 7: RMSE nan Elapsed 3.095 secs [DEBUG ] 2019-08-31 07:56:00 rowwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:56:01 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.26s) [DEBUG ] 2019-08-31 07:56:01 colwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:56:03 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.8s) [INFO ] 2019-08-31 07:56:05 [als.py:173] Validation: ndcg:0.00000 map:0.00000 accuracy:0.00000 rmse:nan error:nan Elapsed 1.882 secs [INFO ] 2019-08-31 07:56:05 [als.py:176] Iteration 8: RMSE nan Elapsed 3.060 secs [DEBUG ] 2019-08-31 07:56:05 rowwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:56:06 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.3s) [DEBUG ] 2019-08-31 07:56:06 colwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:56:08 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.77s) [INFO ] 2019-08-31 07:56:10 [als.py:173] Validation: ndcg:0.00000 map:0.00000 accuracy:0.00000 rmse:nan error:nan Elapsed 1.863 secs [INFO ] 2019-08-31 07:56:10 [als.py:176] Iteration 9: RMSE nan Elapsed 3.076 secs [DEBUG ] 2019-08-31 07:56:10 rowwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:56:11 [als.py:134] rowwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.26s) [DEBUG ] 2019-08-31 07:56:11 colwise: 0%| 00:00<? [DEBUG ] 2019-08-31 07:56:13 [als.py:134] colwise updated: processed(12600038) elapsed(data feed: 0.000s update: 1.78s) [INFO ] 2019-08-31 07:56:14 [als.py:173] Validation: ndcg:0.00000 map:0.00000 accuracy:0.00000 rmse:nan error:nan Elapsed 1.867 secs [INFO ] 2019-08-31 07:56:14 [als.py:176] Iteration 10: RMSE nan Elapsed 3.050 secs [INFO ] 2019-08-31 07:56:14 [als.py:182] elapsed for full epochs: 57.79 sec metrics for validations { "ndcg": 0.0, "map": 0.0, "accuracy": 0.0, "rmse": NaN, "error": NaN } Run hyper parameter optimization for val_ndcg... { "alpha": 8, "d": 20, "reg_u": 0.1, "reg_i": 0.1 } [INFO ] 2019-08-31 07:56:16 optimizing... : 0%| 00:00<? [INFO ] 2019-08-31 07:56:29 [optimize.py:44] Starting with default parameter result: {'train_loss': nan, 'val_ndcg': 0.0, 'val_map': 0.0, 'val_accuracy': 0.0, 'val_rmse': nan, 'val_error': nan, 'eval_time': 1567205789.465719, 'loss': -0.0, 'status': 'ok'} 세그멘테이션 오류 (core dumped)
How can I solve this problem?
I'm using version 1.0.4 of Buffalo.
@ita9naiwa Would you try again with buffalo==1.0.5?
@ummae it works fine with buffalo==1.0.5 and movielens-100k dataset.
thanks!
I downloaded
KakaoBrunch12M
dataset and ran by copyingexample_als.py
.but it seems that training fails (Nan Error)
Code
Logs
How can I solve this problem?
I'm using version 1.0.4 of Buffalo.