Closed: djclouse closed this issue 1 year ago
hi djclouse@, let me link the original discussion on the TF Discussion Forum:
https://discuss.tensorflow.org/t/decision-forest-random-forests-ram-issues/2840/31
We indeed haven't yet tried such wide datasets (22k columns). We believe the TensorFlow implementation can be improved considerably (by using fewer TF ops, as mentioned in the linked thread), but this will take a while.
The short-term option is training with the command-line interface available in Yggdrasil Decision Forests -- the underlying C++ implementation powering TF-DF.
Check it out. A model trained there can easily be imported into TensorFlow for further evaluation.
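As a sketch of that import step (a hedged example, not from the thread: it assumes the `tensorflow_decision_forests` package and its `yggdrasil_model_to_keras_model` helper are available, and the paths are placeholders):

```python
# Sketch: load a model trained with the Yggdrasil Decision Forests CLI into
# TensorFlow by converting it to a Keras model. The helper name and paths
# below are assumptions; adjust them to your installation.
try:
    import tensorflow_decision_forests as tfdf

    convert = tfdf.keras.yggdrasil_model_to_keras_model
    # Paths are placeholders: the first is the directory produced by the
    # Yggdrasil CLI `train` tool, the second receives the Keras model.
    # convert("/path/to/ydf_model", "/path/to/keras_model")
except ImportError:
    convert = None  # TF-DF not installed in this environment
```

The converted model can then be evaluated or served like any other Keras model.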
Closing this issue since it has been stale for over a year now. However, I want to point out that TF-DF has since added a `maximum_model_size_in_memory_in_bytes` parameter to limit the size of a TF-DF model in memory.
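A minimal sketch of using that parameter (assuming TF-DF is installed; the 2 GiB cap here is an illustrative value, not a recommendation):

```python
# Sketch: cap the in-memory size of a TF-DF random forest. The cap value is
# an assumption for illustration; tune it to your machine.
try:
    import tensorflow_decision_forests as tfdf

    model = tfdf.keras.RandomForestModel(
        num_trees=300,
        max_depth=16,
        # Stop growing the model once it reaches ~2 GiB in memory.
        maximum_model_size_in_memory_in_bytes=2 * 1024**3,
    )
except ImportError:
    model = None  # TF-DF not installed in this environment
```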
Dear TF-DF developer(s),

I am happy to be trying out the TF-DF package, but I am having some RAM issues (I am not using Colab). I first tested it using the Penguins data as suggested in "Introducing TensorFlow Decision Forests" on The TensorFlow Blog, with no issues. I also did a little research and found that the defaults are num_trees=300 and max_depth=16, I believe from help(tfdf.keras.RandomForestModel).

Then I moved up to what is for me a "middle of the road" sparse data matrix of 50k rows and 22k columns, using the same loading method as in the blog and deleting data frames as I went. I am using a 32 GB RAM Ubuntu 18.04 instance and I track RAM usage with top. I stuck with the default settings and watched the available RAM evaporate until the process was finally "Killed". Not unexpected, as the available RAM was heading to 0 and presumably this is a safeguard.

I then built a similar model using scikit-learn (300 trees, max_depth=None) and RAM usage maxed out at about 5.4 GB. I then tried setting sorting_strategy=IN_NODE in TF-DF and was again watching the RAM climb to about 50% until I got a NameError: 'IN_NODE' is not defined. Regardless, memory usage was already at ~16 GB.

A six-fold increase in RAM seems out of bounds, so I am wondering if there is a memory issue that hasn't been exposed yet, or if perhaps I am doing something wrong. I am using the default settings and following the loading process from the blog, so I think I am doing things correctly.

Please advise. Thanks, Dan711
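As a back-of-the-envelope check on why a dataset of this shape is memory-hungry (my own arithmetic, not a profile of TF-DF): if the 50k x 22k sparse matrix is densified to float32 anywhere in the pipeline, a single dense copy already accounts for roughly 4 GiB, and any same-shaped auxiliary structure (e.g. a presorted per-column index) adds a comparable amount on top.

```python
# Rough footprint arithmetic for the dataset described above
# (50k rows x 22k columns). Illustrative numbers only.
rows, cols = 50_000, 22_000
bytes_per_float32 = 4

dense_bytes = rows * cols * bytes_per_float32
dense_gib = dense_bytes / 1024**3
print(f"one dense float32 copy: {dense_gib:.1f} GiB")  # ~4.1 GiB

# An int32 structure of the same shape (e.g. one rank per value per column,
# as a presorting strategy might keep) would roughly double that.
index_gib = rows * cols * 4 / 1024**3
print(f"one int32 index of the same shape: {index_gib:.1f} GiB")
```

A few such copies easily explain multi-GB usage, which is why sparse-aware handling or the `IN_NODE` sorting strategy (passed as a string, e.g. sorting_strategy="IN_NODE") matters at this width.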