Closed hkwang closed 7 months ago
@hkwang, DM me with the location of the input files and careless script.
This is a bug that affects models trained on CPU using harmonic deconvolution and the double-wilson prior. It is related to the behavior of `tf.gather` when indices are out of bounds. In the log_prob calculation, the dw prior uses gather to locate samples from the "parent" of each node. It uses an array, `self.reflids`, to cache the indices for this lookup. If the node has no parent, or a particular reflection is observed in the child but not the parent, this array has the value -1. In `tf.gather`, indices must be non-negative and zero-indexed, so -1 is technically not a valid index. However, `tf.gather` behaves differently on CPU and GPU. On GPU, gathering with index -1 just returns a 0, which is the desired outcome anyway. On CPU, `tf` validates the indices and raises an error, leading to a crash.
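For anyone hitting this before a fix lands, here is a minimal sketch of one device-independent way to handle the -1 sentinel. It is not the actual careless code or necessarily the fix that will be merged: `gather_with_missing`, `params`, and `fill_value` are hypothetical names, and the only things taken from the description above are the -1 convention and the goal of getting a 0 for missing parents. The idea is to replace negative indices with a valid placeholder before calling `tf.gather`, then overwrite those positions afterwards.

```python
import tensorflow as tf


def gather_with_missing(params, indices, fill_value=0.0):
    """Gather along the last axis, treating negative indices as "missing".

    Hypothetical helper, not part of careless. Assumes `indices` is a 1-D
    integer array in which -1 marks entries with no parent.
    """
    indices = tf.convert_to_tensor(indices)
    missing = indices < 0
    # Swap the sentinel for index 0 so tf.gather never sees an invalid index.
    safe_indices = tf.where(missing, tf.zeros_like(indices), indices)
    gathered = tf.gather(params, safe_indices, axis=-1)
    # Overwrite the placeholder values with the fill value (0 by default),
    # which matches what the GPU kernel returns for out-of-bounds indices.
    fill = tf.cast(fill_value, gathered.dtype)
    return tf.where(missing, fill, gathered)


# Toy example: one sample of three parent values, middle child has no parent.
parent_samples = tf.constant([[1.0, 2.0, 3.0]])
reflids = tf.constant([2, -1, 0])
child_lookup = gather_with_missing(parent_samples, reflids)
```

Because the indices passed to `tf.gather` are always in range, this should give the same result on CPU and GPU, at the cost of two extra `tf.where` calls.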
I get the following error when running a double-wilson careless job, which ran on careless 0.3.8 in the past but does not run on careless 0.4.1. The same script runs fine when I remove the double-wilson flags: