huongvu16 closed this issue 6 years ago
Did you end up solving this?
Hi, yes, I fixed the issue with the categorical columns.
I'm now also trying to produce the prediction array for the training set, using the following code:
```python
import argparse
import os
import sys

import numpy as np
import tensorflow as tf
from tensorflow.contrib.learn import learn_runner

# _make_experiment_fn, _make_input_fn, and _get_tfbt are defined earlier
# in the script (omitted here); the imports above are my best guess at the
# ones the original snippet left out.


def main(unused_argv):
    learn_runner.run(
        experiment_fn=_make_experiment_fn,
        output_dir=FLAGS.output_dir,
        schedule='train_and_evaluate')

    # After training, run predict over the training set and save the
    # positive-class probabilities to disk.
    feature_columns, train_input_fn = _make_input_fn('train')
    estimator = _get_tfbt(FLAGS.output_dir, feature_columns)
    results = estimator.predict(input_fn=train_input_fn)
    y_predict = np.array([r['probabilities'][1] for r in results])
    np.save(os.path.join(FLAGS.output_dir, 'train_prediction_tf.npy'), y_predict)


if __name__ == '__main__':
    tf.logging.set_verbosity(tf.logging.INFO)
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--batch_size",
        type=int,
        default=10000,
        help="The batch size for reading data.")
    parser.add_argument(
        "--depth",
        type=int,
        default=6,
        help="Maximum depth of weak learners.")
    parser.add_argument(
        "--l2",
        type=float,
        default=1.0,
        help="l2 regularization per batch.")
    parser.add_argument(
        "--learning_rate",
        type=float,
        default=0.1,
        help="Learning rate (shrinkage weight) with which each new tree is added.")
    parser.add_argument(
        "--examples_per_layer",
        type=int,
        default=5000,
        help="Number of examples to accumulate stats for per layer.")
    parser.add_argument(
        "--num_trees",
        type=int,
        default=10,
        help="Number of trees to grow before stopping.")
    FLAGS, unparsed = parser.parse_known_args()
    FLAGS.output_dir = 'outputs/tf_t{:03d}_d{:02d}_ex{:05d}'.format(
        FLAGS.num_trees, FLAGS.depth, FLAGS.examples_per_layer)
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
```
The kernel runs but seems to take forever (I left it for an hour and when I came back it was still running, without any new logs). Do you think this is because the X_train file is too big?
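One thing worth checking: `estimator.predict` in `tf.contrib.learn` returns a lazy generator, so a long silence may just mean it is still streaming through the whole training set. A quick diagnostic is to pull only a handful of predictions first; this sketch uses a stand-in generator, since the real estimator isn't available in a self-contained snippet:

```python
from itertools import islice

def fake_predict():
    """Stand-in for estimator.predict(input_fn=...), which yields one
    dict of outputs per example."""
    for i in range(1000000):
        yield {'probabilities': [1.0 - i / 1e6, i / 1e6]}

# Take just the first 5 predictions instead of exhausting the generator.
# With the real estimator, this tells you whether predict() is making
# progress at all or stalling before the first result.
first_few = [r['probabilities'][1] for r in islice(fake_predict(), 5)]
print(first_few)
```

If even the first few results never arrive, the bottleneck is in the input pipeline rather than in the size of the output array.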
From my tests, it did seem very finicky, so I wouldn't be surprised if it's acting up. Try raising the verbosity level; maybe you'll get some more output.
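For what it's worth, raising the verbosity with the TF 1.x logging API already used in the script might look like this (just a guess at the intended change; `tf.logging` was removed in TF 2.x):

```python
# Replace the INFO line in the script above to get more output:
tf.logging.set_verbosity(tf.logging.DEBUG)
```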
Hi Nicolo! Thanks for posting the example of Gradient Boosting in Tensorflow. I am trying to replicate your model on a different data set (lending club data, the sample data used to run examples in h2o.ai) and to customize your code to fit it. The file is named 'processed.csv' here; I have deleted all rows that include 'nan' values.
By the way, I am only interested in running the Tensorflow model, not XGBoost.
I am running on Python 3.6 in Anaconda environment, Tensorflow version 1.4, on Mac OS X 10.12.6
The pre-processing part is as follows (I have omitted all the imports):
The result of this is:
Now the model in Tensorflow (all the imports omitted):
And the result is:
From the code I pasted above, do you have any pointers as to what could be causing this problem?
Many thanks!