Closed JasonMoho closed 3 years ago
The current debug mode, which is set by reporting.log_level=debug, does not output enough information to be useful for debugging purposes.
reporting.log_level=debug
This change adds more debug statements to the preprocessing and initialization phase, to make diagnosing user issues easier.
Relates to #60 and #55
Example output:
Info:
(pytorch_latest_p37) ubuntu@ip-172-31-9-241:~/marius$ build/marius_train 237.ini --reporting.log_level=info [info] [08/16/21 21:33:03.160] Start preprocessing [info] [08/16/21 21:33:03.213] Training set initialized [info] [08/16/21 21:33:03.666] Evaluation set initialized [info] [08/16/21 21:33:03.666] Preprocessing Complete: 0.506s [info] [08/16/21 21:33:03.670] ################ Starting training epoch 1 ################ [info] [08/16/21 21:33:03.686] Total Edges Processed: 20000, Percent Complete: 0.071 [info] [08/16/21 21:33:03.695] Total Edges Processed: 40000, Percent Complete: 0.143 [info] [08/16/21 21:33:03.703] Total Edges Processed: 60000, Percent Complete: 0.214 [info] [08/16/21 21:33:03.712] Total Edges Processed: 80000, Percent Complete: 0.286 [info] [08/16/21 21:33:03.720] Total Edges Processed: 100000, Percent Complete: 0.357 [info] [08/16/21 21:33:03.728] Total Edges Processed: 120000, Percent Complete: 0.429 [info] [08/16/21 21:33:03.737] Total Edges Processed: 140000, Percent Complete: 0.500 [info] [08/16/21 21:33:03.745] Total Edges Processed: 160000, Percent Complete: 0.571 [info] [08/16/21 21:33:03.754] Total Edges Processed: 180000, Percent Complete: 0.643 [info] [08/16/21 21:33:03.762] Total Edges Processed: 200000, Percent Complete: 0.714 [info] [08/16/21 21:33:03.771] Total Edges Processed: 220000, Percent Complete: 0.786 [info] [08/16/21 21:33:03.779] Total Edges Processed: 240000, Percent Complete: 0.857 [info] [08/16/21 21:33:03.787] Total Edges Processed: 260000, Percent Complete: 0.929 [info] [08/16/21 21:33:03.796] Total Edges Processed: 272115, Percent Complete: 1.000 [info] [08/16/21 21:33:03.796] ################ Finished training epoch 1 ################ [info] [08/16/21 21:33:03.796] Epoch Runtime (Before shuffle/sync): 126ms [info] [08/16/21 21:33:03.796] Edges per Second (Before shuffle/sync): 2159642.8 [info] [08/16/21 21:33:03.796] Edges Shuffled [info] [08/16/21 21:33:03.796] Epoch Runtime (Including shuffle/sync): 126ms [info] [08/16/21 21:33:03.796] Edges per Second (Including shuffle/sync): 2159642.8
Debug:
(pytorch_latest_p37) ubuntu@ip-172-31-9-241:~/marius$ build/marius_train 237.ini --reporting.log_level=debug [info] [08/16/21 21:32:23.977] Start preprocessing [debug] [08/16/21 21:32:23.977] Initializing Model [debug] [08/16/21 21:32:23.977] Empty Encoder [debug] [08/16/21 21:32:23.978] DistMult Decoder [debug] [08/16/21 21:32:23.978] data/ directory already exists [debug] [08/16/21 21:32:23.978] data/marius/ directory already exists [debug] [08/16/21 21:32:23.978] data/marius/embeddings/ directory already exists [debug] [08/16/21 21:32:23.978] data/marius/relations/ directory already exists [debug] [08/16/21 21:32:23.978] data/marius/edges/ directory already exists [debug] [08/16/21 21:32:23.978] data/marius/edges/train/ directory already exists [debug] [08/16/21 21:32:23.978] data/marius/edges/evaluation/ directory already exists [debug] [08/16/21 21:32:23.978] data/marius/edges/test/ directory already exists [debug] [08/16/21 21:32:23.979] Edges: DeviceMemory storage initialized [debug] [08/16/21 21:32:23.991] Edges shuffled [debug] [08/16/21 21:32:23.991] Edge storage initialized. Train: 272115, Valid: 17535, Test: 20466 [debug] [08/16/21 21:32:24.031] Node embeddings: DeviceMemory storage initialized [debug] [08/16/21 21:32:24.031] Node embeddings state: DeviceMemory storage initialized [debug] [08/16/21 21:32:24.031] Node embeddings initialized: 14951 [debug] [08/16/21 21:32:24.033] Relation embeddings: DeviceMemory storage initialized [debug] [08/16/21 21:32:24.033] Relation embeddings state: DeviceMemory storage initialized [debug] [08/16/21 21:32:24.033] Relation embeddings initialized: 237 [debug] [08/16/21 21:32:24.033] Getting batches from edge list [info] [08/16/21 21:32:24.033] Training set initialized [debug] [08/16/21 21:32:24.033] Filtered evaluation [debug] [08/16/21 21:32:24.490] Getting batches from edge list [debug] [08/16/21 21:32:24.490] Batches initialized [info] [08/16/21 21:32:24.490] Evaluation set initialized [info] [08/16/21 21:32:24.490] Preprocessing Complete: 0.512s [debug] [08/16/21 21:32:24.494] Loaded training set [info] [08/16/21 21:32:24.494] ################ Starting training epoch 1 ################ [debug] [08/16/21 21:32:24.503] Loss: 124331.86, Regularization loss: 0.005004995 [debug] [08/16/21 21:32:24.510] Loss: 124296.58, Regularization loss: 0.0054659764 [info] [08/16/21 21:32:24.512] Total Edges Processed: 20000, Percent Complete: 0.071 [debug] [08/16/21 21:32:24.515] Loss: 123891.34, Regularization loss: 0.005674169 [debug] [08/16/21 21:32:24.519] Loss: 122934.516, Regularization loss: 0.005799696 [info] [08/16/21 21:32:24.521] Total Edges Processed: 40000, Percent Complete: 0.143 [debug] [08/16/21 21:32:24.523] Loss: 120989.09, Regularization loss: 0.005975036 [debug] [08/16/21 21:32:24.527] Loss: 118009.516, Regularization loss: 0.006191378 ...
The current debug mode, which is set by
reporting.log_level=debug
, does not output enough information to be useful for debugging purposes.This change adds more debug statements to the preprocessing and initialization phase, to make diagnosing user issues easier.
Relates to #60 and #55
Example output:
Info:
Debug: