exalate-issue-sync[bot] closed this issue 1 year ago.
Cliff Click commented: REST calls used for inspection, such as "Frame", "Frames" and various summaries, require rollups and run at the normal F/J (Fork/Join) priority. If the F/J queues are saturated with other work, e.g. a big DL job, these interactive commands run on a "best effort" basis and get stuck behind the DL work.
Cliff Click commented: Hopefully I found and fixed all the places where interactive calls wait for cores. Mostly it was ChunkSummary and Rollups.
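The starvation described above is a head-of-line problem: if interactive tasks share a queue and a priority level with bulk work, they wait behind every bulk task already queued. The sketch below is a generic Java illustration of why giving interactive work a higher priority helps; it is not H2O's actual fix. It uses a plain `ThreadPoolExecutor` instead of H2O's Fork/Join pools, and the class name, the `BULK`/`INTERACTIVE` constants and the single-worker setup are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class PriorityPoolSketch {
    // Hypothetical priority levels (not H2O constants); higher runs first.
    static final int BULK = 0;
    static final int INTERACTIVE = 10;

    /** A task tagged with a priority; ties are broken by submission order. */
    static final class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
        private static final AtomicLong SEQ = new AtomicLong();
        final int priority;
        final long seq = SEQ.getAndIncrement();
        final Runnable body;

        PrioritizedTask(int priority, Runnable body) {
            this.priority = priority;
            this.body = body;
        }

        @Override public void run() { body.run(); }

        @Override public int compareTo(PrioritizedTask other) {
            if (priority != other.priority) {
                return Integer.compare(other.priority, priority); // higher priority first
            }
            return Long.compare(seq, other.seq);                 // FIFO among equal priorities
        }
    }

    /**
     * Keeps the single worker busy, queues three bulk tasks and then one
     * interactive task, and returns the order in which the queued tasks ran.
     */
    static List<String> run(boolean usePriorities) {
        BlockingQueue<Runnable> queue;
        if (usePriorities) {
            queue = new PriorityBlockingQueue<>(); // ordered by PrioritizedTask.compareTo
        } else {
            queue = new LinkedBlockingQueue<>();   // plain FIFO
        }
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, queue);
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch release = new CountDownLatch(1);

        // The first task becomes the worker's initial task (it is never queued)
        // and occupies the only thread, standing in for a long training job.
        // execute() is used instead of submit(): submit() would wrap tasks in a
        // FutureTask, which is not Comparable and breaks the priority queue.
        pool.execute(new PrioritizedTask(BULK, () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        for (int i = 1; i <= 3; i++) {
            String name = "bulk-" + i;
            pool.execute(new PrioritizedTask(BULK, () -> order.add(name)));
        }
        pool.execute(new PrioritizedTask(INTERACTIVE, () -> order.add("interactive")));

        release.countDown(); // let the worker drain the queue
        pool.shutdown();     // queued tasks still run after shutdown()
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println("FIFO queue:     " + run(false));
        System.out.println("priority queue: " + run(true));
    }
}
```

With the FIFO queue the interactive task runs last (`[bulk-1, bulk-2, bulk-3, interactive]`); with the priority queue it runs first (`[interactive, bulk-1, bulk-2, bulk-3]`). Priorities only reorder queued work: a task that is already running is not preempted, so bulk tasks that hold a core for a long time can still delay interactive requests.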
JIRA Issue Migration Info
Jira Issue: PUBDEV-2109
Assignee: Cliff Click
Reporter: Arno Candel
State: Resolved
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
10 nodes, mr-0xd*
parseFiles
  paths: ["/home/0xdiag/datasets/billions/four_billion_rows.csv"]
  destination_frame: "four_billion_rows.hex"
  parse_type: "CSV"
  separator: 44
  number_columns: 2
  single_quotes: false
  column_names: null
  column_types: ["Numeric","Enum"]
  delete_on_done: true
  check_header: -1
  chunk_size: 4194304
buildModel 'deeplearning', {"model_id":"deeplearning-82ac6efa-06a8-400b-8a7d-87defafc5b73","training_frame":"four_billion_rows.hex","nfolds":0,"response_column":"C2","ignored_columns":[],"ignore_const_cols":true,"activation":"Rectifier","hidden":[200,200],"epochs":10,"variable_importances":false,"balance_classes":false,"max_confusion_matrix_size":20,"max_hit_ratio_k":10,"checkpoint":"","use_all_factor_levels":true,"train_samples_per_iteration":"-1","adaptive_rate":true,"input_dropout_ratio":0,"l1":0,"l2":0,"loss":"Automatic","distribution":"AUTO","score_interval":5,"score_training_samples":10000,"score_duty_cycle":0.1,"replicate_training_data":true,"autoencoder":false,"overwrite_with_best_model":true,"target_ratio_comm_to_comp":0.02,"seed":476458924607192960,"rho":0.99,"epsilon":1e-8,"max_w2":"Infinity","initial_weight_distribution":"UniformAdaptive","classification_stop":0,"diagnostics":true,"fast_mode":true,"force_load_balance":true,"single_node_mode":false,"shuffle_training_data":false,"missing_values_handling":"MeanImputation","quiet_mode":false,"sparse":false,"col_major":false,"average_activation":0,"sparsity_beta":0,"max_categorical_features":2147483647,"reproducible":false,"export_weights_and_biases":false}
While the DL model is training, calling 'getFrames' from Flow takes at least 10 minutes to return (it does respond eventually).
"qtp285402953-13" prio=9 tid=13 java.lang.Thread.State: TIMED_WAITING