E2E.zip
I've attached the notebook. Everything runs successfully through the training and a dump of the resulting trees looks reasonable.
I modified the DMatrix processing code so that I could create a training set from part of the mortgage data, so instead of a single gpu_dfs I now have gpu_train_dfs and gpu_test_dfs. Training works, but dask_xgboost.predict fails with:
'''
UnboundLocalError Traceback (most recent call last)
in
----> 1 preds=dxgb_gpu.predict(client,bst,gpu_test_dfs)
/conda/envs/gdf/lib/python3.5/site-packages/dask_xgboost-0.1.5-py3.5.egg/dask_xgboost/core.py in predict(client, model, data)
301 **kwargs)
302
--> 303 return result
304
305
UnboundLocalError: local variable 'result' referenced before assignment
'''
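For anyone reading along, here is a minimal sketch of how this kind of error can arise. It assumes, without having confirmed it against the installed core.py, that predict assigns `result` only inside branches for input types it recognises, so any other input (such as a plain list of delayed DMatrix objects) falls through to `return result`. The classes `FakeDaskFrame` and `FakeDaskArray` and the function `predict_sketch` are hypothetical stand-ins, not part of dask_xgboost.

```python
# Hypothetical stand-ins for the input types a predict() might recognise.
class FakeDaskFrame:
    pass


class FakeDaskArray:
    pass


def predict_sketch(data):
    """Assign `result` only for recognised input types (assumed structure)."""
    if isinstance(data, FakeDaskFrame):
        result = "frame predictions"
    elif isinstance(data, FakeDaskArray):
        result = "array predictions"
    # There is no else branch, so any other type leaves `result` unassigned.
    return result


def try_predict(data):
    """Call predict_sketch and report an UnboundLocalError instead of raising."""
    try:
        return predict_sketch(data)
    except UnboundLocalError as exc:
        return "UnboundLocalError: {}".format(exc)


print(try_predict(FakeDaskFrame()))       # recognised type, returns normally
print(try_predict([object(), object()]))  # plain list, falls through both branches
```

If the real function has this shape, the error would say more about the type of the `data` argument than about the trained model.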
Data prep code:
'''
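def makeDMatrix(gpu_dfs, client):
    # Pair each partition with the worker(s) currently holding it.
    tmp_map = [(gpu_df, list(client.who_has(gpu_df).values())[0]) for gpu_df in gpu_dfs]
    # Group the partitions by the worker(s) holding them.
    new_map = {}
    for key, value in tmp_map:
        if value not in new_map:
            new_map[value] = [key]
        else:
            new_map[value].append(key)
    del(tmp_map)
    # Concatenate each group into a single dataframe.
    gpu_dfs = []
    for list_delayed in new_map.values():
        gpu_dfs.append(delayed(pygdf.concat)(list_delayed))
    del(new_map)
    # Split each dataframe into (label, features); 'delinquency_12' is the label column.
    gpu_dfs = [(gpu_df[['delinquency_12']], gpu_df[delayed(list)(gpu_df.columns.difference(['delinquency_12']))]) \
               for gpu_df in gpu_dfs]
    # Build one DMatrix per group as DMatrix(features, label) and keep it in memory.
    gpu_dfs = [dask.delayed(xgb.DMatrix)(gpu_df[1], gpu_df[0]) for gpu_df in gpu_dfs]
    gpu_dfs = [gpu_df.persist() for gpu_df in gpu_dfs]
    return gpu_dfs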
client.run(initialize_rmm_no_pool)
print('part_count ',part_count)
gpu_train_dfs = [delayed(DataFrame.from_arrow)(gpu_df) for gpu_df in gpu_dfs[:8]]
gpu_test_dfs = [delayed(DataFrame.from_arrow)(gpu_df) for gpu_df in gpu_dfs[8:part_count]]
wait([gpu_train_dfs,gpu_test_dfs])
gpu_train_dfs=makeDMatrix(gpu_train_dfs,client)
gpu_test_dfs=makeDMatrix(gpu_test_dfs,client)
gc.collect()
wait([gpu_train_dfs,gpu_test_dfs])
labels = None
bst = dxgb_gpu.train(client, dxgb_gpu_params, gpu_train_dfs, labels, num_boost_round=dxgb_gpu_params['nround'])
preds = dxgb_gpu.predict(client, bst, gpu_test_dfs)  # this call raises the UnboundLocalError shown above
'''
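To make the grouping step in makeDMatrix easier to follow, here is a small standalone sketch of the same idea using plain Python values. The partition names and worker addresses are made up for illustration, and `group_by_worker` is a hypothetical helper, not something in the notebook. It assumes the value taken from client.who_has is hashable (for example a tuple of worker addresses), which the original code also relies on when it uses that value as a dict key.

```python
from collections import defaultdict


def group_by_worker(placements):
    """Group partition keys by the worker(s) that hold them.

    `placements` is a list of (partition, workers) pairs, mirroring tmp_map
    in makeDMatrix. `workers` must be hashable, e.g. a tuple of addresses.
    """
    grouped = defaultdict(list)
    for partition, workers in placements:
        grouped[workers].append(partition)
    return dict(grouped)


# Made-up placements: three partitions spread over two workers.
placements = [
    ("part-0", ("tcp://10.0.0.1:40000",)),
    ("part-1", ("tcp://10.0.0.2:40000",)),
    ("part-2", ("tcp://10.0.0.1:40000",)),
]

for workers, parts in group_by_worker(placements).items():
    print(workers, parts)
```

Each resulting group corresponds to one pygdf.concat call and, after the label/feature split, one DMatrix.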