tensorflow/java

Java bindings for TensorFlow

Reconsider Loss generic parameter #341

Open rnett opened 3 years ago

rnett commented 3 years ago

I brought this up with @JimClarke5 but wanted to get some wider comments on it.

The generic parameter for Losses doesn't seem right:

<T extends TNumber> Operand<T> call(Ops tf, Operand<? extends TNumber> labels, Operand<T> predictions, Operand<T> sampleWeights);

It binds the loss value, the predictions, and the weights to the same data type, but there is no inherent relation between them: it's easy enough to imagine a float-weighted cosine similarity of one-hot vector predictions, which would return a float. The only necessary data type parameter I can think of is the return type, and that should be on the class.
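For illustration, a hypothetical call site (cosineLoss and the values are made up) shows the extra cast this forces when the inputs don't already share a type:

// One-hot int predictions with float weights: under the current signature,
// predictions and sampleWeights must share T, so the caller casts by hand.
Operand<TInt32>   labels      = tf.constant(new int[][] {{0, 1}, {1, 0}});
Operand<TInt32>   predictions = tf.constant(new int[][] {{0, 1}, {0, 1}});
Operand<TFloat32> weights     = tf.constant(new float[] {0.25f, 0.75f});

Operand<TFloat32> loss =
    cosineLoss.call(tf, labels, tf.dtypes.cast(predictions, TFloat32.class), weights);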

JimClarke5 commented 3 years ago

Internally everything needs to be cast to the same type. Right now labels are cast to the class type. As long as we have a rule to cast the parameters to the same type for the calculations, then another rule for casting the result, it will work. Do you have a suggestion?
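A rough sketch of that rule as it applies to the current signature (computeLoss and applyWeights are placeholder names, cast is the same helper used later in this thread, and predictions.type() stands in for however the implementation recovers the runtime class):

// Labels are cast to the predictions' type T, the math is done in T,
// and the result comes back as T.
public <T extends TNumber> Operand<T> call(Ops tf, Operand<? extends TNumber> labels,
    Operand<T> predictions, Operand<T> sampleWeights) {
  Operand<T> tLabels = cast(tf, labels, predictions.type());
  Operand<T> losses  = computeLoss(tf, tLabels, predictions);
  return applyWeights(tf, losses, sampleWeights);
}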

rnett commented 3 years ago

Wouldn't that type need to be set at the class level, rather than taken from either of the inputs? I'm thinking of something like the cosine similarity of int vectors: predictions and labels will be int, but we want the output to be float. I don't think you can determine the result type from the inputs.

I'm not sure exactly what the API would look like, but maybe:

public abstract class Loss<R extends TNumber> {
  public abstract Operand<R> call(Ops tf, Operand<? extends TNumber> labels,
      Operand<? extends TNumber> predictions, Operand<? extends TNumber> sampleWeights);
}

public class CosineSimilarityLoss<R extends TFloating> extends Loss<R> {
  private final Class<R> resultType;

  public CosineSimilarityLoss(Class<R> resultType) {
    this.resultType = resultType;
  }

  @Override
  public Operand<R> call(Ops tf, Operand<? extends TNumber> labels,
      Operand<? extends TNumber> predictions, Operand<? extends TNumber> sampleWeights) {
    // cast labels/predictions/sampleWeights using this.resultType, then compute
    return null; // sketch only
  }
}

I left resultType to the subclass so that subclasses can limit the types (e.g. to <: TFloating), but you could do that in Loss or a separate BaseLoss (I'd prefer the latter, for Kotlin interop, but either would work).
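A sketch of the BaseLoss alternative mentioned above (all names here are illustrative):

// Alternative: a base class owns the resultType field, and subclasses such as
// CosineSimilarityLoss<R extends TFloating> would extend it instead of Loss directly.
public abstract class BaseLoss<R extends TNumber> extends Loss<R> {
  protected final Class<R> resultType;

  protected BaseLoss(Class<R> resultType) {
    this.resultType = resultType;
  }
}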

Craigacp commented 3 years ago

Given the C API requires us to cast all these things to the same type, why should we give the appearance of flexibility when there isn't any?

rnett commented 3 years ago

Because presumably they will all be cast to the result type (TFloat32 in my example), not to any of the input types (TInt32). It's not so much about flexibility as about handling situations where the result data type is larger/more precise than the input data type (e.g. inputs are int, output is float).

rnett commented 3 years ago

There's a kind of similar issue for Initializer, where if I understand the intent right, you should easily enough be able to do something like:

Ones<TFloating> ones = new Ones<>();
ones.call(tf, dims, TFloat32.class);

This isn't currently supported because a) it would return TFloating (even though it would really be a TFloat32 tensor) and b) the type of the type argument to call is Class<T> not Class<? extends T>. Is there any reason to have the type parameter on the initializer implementations? It seems like something like:

public interface Initializer<T extends TType> {
  <R extends T> Operand<R> call(Ops tf, Operand<TInt64> dims, Class<R> type);
}

public class Ones implements Initializer<TType> {
  @Override
  public <R extends TType> Operand<R> call(Ops tf, Operand<TInt64> dims, Class<R> type) {
    return tf.fill(dims, cast(tf, tf.constant(1), type));
  }
}

would work better, which is the same sort of pattern I proposed for Loss.
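As a usage sketch under that proposal (not the current API), the concrete tensor type would be chosen at each call site rather than baked into the initializer instance:

Ones ones = new Ones();
Operand<TFloat32> onesF32 = ones.call(tf, tf.constant(new long[] {2, 3}), TFloat32.class);
Operand<TInt64>   onesI64 = ones.call(tf, tf.constant(new long[] {2, 3}), TInt64.class);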

rnett commented 3 years ago

(I'm writing the framework Kotlin wrappers, which is why this is coming up).

Craigacp commented 3 years ago

Because presumably they will all be cast to the result type (TFloat32 in my example), not to any of the input types (TInt32). It's not so much about flexibility as about handling situations where the result data type is larger/more precise than the input data type (e.g. inputs are int, output is float).

I don't think that's quite right. If you cast an int to a float you can lose precision (for values whose magnitude exceeds what the float's 24-bit significand can represent exactly). Therefore a method which accepts int parameters but internally casts them to float will silently lose precision. At least this way the user knows what's going to happen and has to do the cast themselves.
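To make the precision point concrete, plain Java (no TensorFlow involved) already shows it:

// float has a 24-bit significand, so ints larger than 2^24 are not all exactly
// representable; the round-trip below is off by one.
int i = Integer.MAX_VALUE;          // 2147483647
float f = (float) i;                // rounds to 2147483648
System.out.println((long) f - i);   // prints 1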

rnett commented 3 years ago

Hmm, that's true. We're still silently casting labels though, so we may want to standardize that.

rnett commented 3 years ago

For my wrappers, I'm trying to write something like:

inline fun <T: TType> Loss(block: KotlinOps.(Operand<T>) -> Operand<T>): ?

which would define a loss for a single data type (T). This isn't possible without relying entirely on runtime casting, so I would still find it helpful to have something like my earlier example, where the class type parameter limits the method types. It would also enable losses that are only valid for certain data types, although I don't think we have any at the moment. It would be even more helpful for Initializer, where several implementations are only valid for TNumber or TBool but can't express that in their types.

JimClarke5 commented 3 years ago

The reason labels have a different type is that some Loss classes naturally use a different type for them. For example, CategoricalCrossentropy uses a one-hot representation for labels, which is naturally an integral (TIntegral) type.
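A small sketch of that shape (made-up values), which is why labels carry their own wildcard type:

// One-hot integer labels alongside floating-point predictions.
Operand<TInt32>   labels      = tf.constant(new int[][] {{1, 0, 0}, {0, 0, 1}});
Operand<TFloat32> predictions = tf.constant(new float[][] {{0.8f, 0.1f, 0.1f}, {0.2f, 0.2f, 0.6f}});
// Internally the labels are cast to the predictions' type before the calculation.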

=====

WRT the suggestion:

public abstract class Loss<R extends TNumber> {
  public abstract Operand<R> call(Ops tf, Operand<? extends TNumber> labels,
      Operand<? extends TNumber> predictions, Operand<? extends TNumber> sampleWeights);
}

The return type would be defined in the class's <R> declaration, but what type should be used for the internal calculation? Arbitrarily use the type defined by predictions? That is what we do now, but at least there is consistency with the returned Operand.

=====

Another issue: within the calculations, TBool acts more like a TIntegral than a true boolean. Currently, TBool parameters would always have to be cast to at least a TIntegral. I'm not sure if we should handle this any differently; right now the burden of casting TBool is on the programmer. TType is too broad a bound because of TString, which cannot be cast.
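For illustration, the burden on the programmer today looks roughly like this (a sketch, assuming an Ops instance tf is in scope):

// Cast a boolean tensor to an integral type before using it in a loss calculation.
Operand<TBool>  mask     = tf.constant(new boolean[] {true, false, true});
Operand<TInt32> maskInts = tf.dtypes.cast(mask, TInt32.class);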

rnett commented 3 years ago

Yeah, the more I think about it, the less I like the solution I proposed: you would still be able to do things in the Kotlin wrapper like Loss<TFloating>{ ... }, which doesn't make any sense.

Per the example, given what Adam mentioned you'd need a generic parameter on the class for the upper bound, but you would also still need one on the method for the computation type.
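A sketch of that shape (the names are illustrative, not a concrete proposal):

// Class-level U: the upper bound on what this loss supports.
// Method-level T: the concrete computation/result type chosen per call.
public abstract class Loss<U extends TNumber> {
  public abstract <T extends U> Operand<T> call(Ops tf, Operand<? extends TNumber> labels,
      Operand<T> predictions, Operand<T> sampleWeights);
}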

I remember there was talk a while back of a TNumberOrBool, did that go anywhere?

I don't have a proposal I like, Kotlin-wrapper-wise, other than something like fully separating a loss/activation that works on a single data type from a factory that produces losses for the supported types, which, to be clear, I don't actually think is a good idea. I do think there's room to improve things so they're nicer from Kotlin (or from Java if you want to write a custom loss), but maybe wait until you finish Layers and Model, see how these are actually used, and then come back to this?

x5w46fxdx commented 3 years ago

Kotlin:

interface Network
abstract class AbstractNetwork<T> : Network where T : TType {}
abstract class AbstractNetworkNumber<T> : AbstractNetwork<T>() where T : TNumber {}
abstract class AbstractNetworkTFloating<T> : AbstractNetworkNumber<T>() where T : TFloating {}

Java:

interface Network {}
abstract class AbstractNetwork<T extends TType> implements Network {}
abstract class AbstractNetworkNumber<T extends TNumber> extends AbstractNetwork<T> {}
abstract class AbstractNetworkTFloating<T extends TFloating> extends AbstractNetworkNumber<T> {}
class XModuleTFloat32 extends AbstractNetworkTFloating<TFloat32> {}
class XModuleTFloat64 extends AbstractNetworkTFloating<TFloat64> {}
class ModuleFactory {}

Kotlin OpsEx:

inline fun <T> AbstractNetwork<T>.addLayers(..) where T : TType { }
inline fun <T> Ops.ex() where T : TType { }

@rnett You should write your code using this structure.

Craigacp commented 3 years ago

Constraining the whole network to a single type isn't a great idea. For example, images frequently come in as TUint8 but the network will compute on them using TFloat32. Similarly, text data can be fed in as integer id numbers.
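The common pattern is a single cast at the input boundary (a sketch, assuming an Ops instance tf is in scope):

// Image bytes arrive as TUint8 and are cast once; the rest of the graph is TFloat32.
Operand<TUint8>   imageBytes = tf.placeholder(TUint8.class);
Operand<TFloat32> imageF32   = tf.math.div(tf.dtypes.cast(imageBytes, TFloat32.class), tf.constant(255.0f));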

x5w46fxdx commented 3 years ago

Using the network should abstract over a data service that provides tensors. Only a Network interface and a DataServer interface are needed for interaction. Then abstract a RunnersServer that holds the training, test, and execution methods, with Server as the base interface. There can be multiple training services at the same time, a variety of models can be trained, and one model can have multiple data services.

interface RunnersServer extends Server { void run(); }
interface Network extends Server {}
interface DataServer extends Server {
  Tensor getLabelTensor(int index);
  Tensor getInputTensor(int index);
}

@Craigacp

Craigacp commented 3 years ago

We are building up a higher level API which will have a Model interface similar to the one available in Keras. However it's a lot of work, so we're building it piece by piece.

That kind of high-level server interface for deployed models already exists in SavedModelBundle, and having one for training involves a lot of complexity beyond the things you describe. It's not clear that adding such a training interface would be broadly useful.