marius-team / marius

Large scale graph learning on a single machine.
https://marius-project.org
Apache License 2.0
160 stars 45 forks source link

Expand debug mode to log more information #61

Closed JasonMoho closed 3 years ago

JasonMoho commented 3 years ago

The current debug mode, which is set by reporting.log_level=debug, does not output enough information to be useful for debugging purposes.

This change adds more debug statements to the preprocessing and initialization phase, to make diagnosing user issues easier.

Relates to #60 and #55

Example output:

Info:

(pytorch_latest_p37) ubuntu@ip-172-31-9-241:~/marius$ build/marius_train 237.ini --reporting.log_level=info
[info] [08/16/21 21:33:03.160] Start preprocessing
[info] [08/16/21 21:33:03.213] Training set initialized
[info] [08/16/21 21:33:03.666] Evaluation set initialized
[info] [08/16/21 21:33:03.666] Preprocessing Complete: 0.506s
[info] [08/16/21 21:33:03.670] ################ Starting training epoch 1 ################
[info] [08/16/21 21:33:03.686] Total Edges Processed: 20000, Percent Complete: 0.071
[info] [08/16/21 21:33:03.695] Total Edges Processed: 40000, Percent Complete: 0.143
[info] [08/16/21 21:33:03.703] Total Edges Processed: 60000, Percent Complete: 0.214
[info] [08/16/21 21:33:03.712] Total Edges Processed: 80000, Percent Complete: 0.286
[info] [08/16/21 21:33:03.720] Total Edges Processed: 100000, Percent Complete: 0.357
[info] [08/16/21 21:33:03.728] Total Edges Processed: 120000, Percent Complete: 0.429
[info] [08/16/21 21:33:03.737] Total Edges Processed: 140000, Percent Complete: 0.500
[info] [08/16/21 21:33:03.745] Total Edges Processed: 160000, Percent Complete: 0.571
[info] [08/16/21 21:33:03.754] Total Edges Processed: 180000, Percent Complete: 0.643
[info] [08/16/21 21:33:03.762] Total Edges Processed: 200000, Percent Complete: 0.714
[info] [08/16/21 21:33:03.771] Total Edges Processed: 220000, Percent Complete: 0.786
[info] [08/16/21 21:33:03.779] Total Edges Processed: 240000, Percent Complete: 0.857
[info] [08/16/21 21:33:03.787] Total Edges Processed: 260000, Percent Complete: 0.929
[info] [08/16/21 21:33:03.796] Total Edges Processed: 272115, Percent Complete: 1.000
[info] [08/16/21 21:33:03.796] ################ Finished training epoch 1 ################
[info] [08/16/21 21:33:03.796] Epoch Runtime (Before shuffle/sync): 126ms
[info] [08/16/21 21:33:03.796] Edges per Second (Before shuffle/sync): 2159642.8
[info] [08/16/21 21:33:03.796] Edges Shuffled
[info] [08/16/21 21:33:03.796] Epoch Runtime (Including shuffle/sync): 126ms
[info] [08/16/21 21:33:03.796] Edges per Second (Including shuffle/sync): 2159642.8

Debug:

(pytorch_latest_p37) ubuntu@ip-172-31-9-241:~/marius$ build/marius_train 237.ini --reporting.log_level=debug
[info] [08/16/21 21:32:23.977] Start preprocessing
[debug] [08/16/21 21:32:23.977] Initializing Model
[debug] [08/16/21 21:32:23.977] Empty Encoder
[debug] [08/16/21 21:32:23.978] DistMult Decoder
[debug] [08/16/21 21:32:23.978] data/ directory already exists
[debug] [08/16/21 21:32:23.978] data/marius/ directory already exists
[debug] [08/16/21 21:32:23.978] data/marius/embeddings/ directory already exists
[debug] [08/16/21 21:32:23.978] data/marius/relations/ directory already exists
[debug] [08/16/21 21:32:23.978] data/marius/edges/ directory already exists
[debug] [08/16/21 21:32:23.978] data/marius/edges/train/ directory already exists
[debug] [08/16/21 21:32:23.978] data/marius/edges/evaluation/ directory already exists
[debug] [08/16/21 21:32:23.978] data/marius/edges/test/ directory already exists
[debug] [08/16/21 21:32:23.979] Edges: DeviceMemory storage initialized
[debug] [08/16/21 21:32:23.991] Edges shuffled
[debug] [08/16/21 21:32:23.991] Edge storage initialized. Train: 272115, Valid: 17535, Test: 20466
[debug] [08/16/21 21:32:24.031] Node embeddings: DeviceMemory storage initialized
[debug] [08/16/21 21:32:24.031] Node embeddings state: DeviceMemory storage initialized
[debug] [08/16/21 21:32:24.031] Node embeddings initialized: 14951
[debug] [08/16/21 21:32:24.033] Relation embeddings: DeviceMemory storage initialized
[debug] [08/16/21 21:32:24.033] Relation embeddings state: DeviceMemory storage initialized
[debug] [08/16/21 21:32:24.033] Relation embeddings initialized: 237
[debug] [08/16/21 21:32:24.033] Getting batches from edge list
[info] [08/16/21 21:32:24.033] Training set initialized
[debug] [08/16/21 21:32:24.033] Filtered evaluation
[debug] [08/16/21 21:32:24.490] Getting batches from edge list
[debug] [08/16/21 21:32:24.490] Batches initialized
[info] [08/16/21 21:32:24.490] Evaluation set initialized
[info] [08/16/21 21:32:24.490] Preprocessing Complete: 0.512s
[debug] [08/16/21 21:32:24.494] Loaded training set
[info] [08/16/21 21:32:24.494] ################ Starting training epoch 1 ################
[debug] [08/16/21 21:32:24.503] Loss: 124331.86, Regularization loss: 0.005004995
[debug] [08/16/21 21:32:24.510] Loss: 124296.58, Regularization loss: 0.0054659764
[info] [08/16/21 21:32:24.512] Total Edges Processed: 20000, Percent Complete: 0.071
[debug] [08/16/21 21:32:24.515] Loss: 123891.34, Regularization loss: 0.005674169
[debug] [08/16/21 21:32:24.519] Loss: 122934.516, Regularization loss: 0.005799696
[info] [08/16/21 21:32:24.521] Total Edges Processed: 40000, Percent Complete: 0.143
[debug] [08/16/21 21:32:24.523] Loss: 120989.09, Regularization loss: 0.005975036
[debug] [08/16/21 21:32:24.527] Loss: 118009.516, Regularization loss: 0.006191378
...