tensorflow / java

Java bindings for TensorFlow

Tensor Scope and resource management #184

Open rnett opened 3 years ago

rnett commented 3 years ago

cc @saudet @karllessard @Craigacp

This issue is for TensorScope and tensor resource management more generally, as discussed in #181 and the community call.

I'm envisioning usage like this (or with try-with-resources, as sketched after the example):

TensorScope scope = new TensorScope();
TInt32 input = stuff;                       // any tensor created while the scope is open is tracked by it
TInt32 result = function(input).detach();   // detach() removes 'result' from the scope
scope.close();
// result is accessible here; input and any tensors created in function() are not
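With try-with-resources, and assuming TensorScope implements AutoCloseable, that would look like:

TInt32 result;
try (TensorScope scope = new TensorScope()) {
    TInt32 input = stuff;
    result = function(input).detach();   // detached, so the native tensor outlives the scope
}                                        // input and any undetached temporaries are closed here
// result is still usable here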

A few issues I'd like comment on:

  • NDArrays. As mentioned in #181 (comment), it's possible to have an NDArray opaquely backed by a tensor. The tensor could be closed by a TensorScope, making the NDArray inaccessible in a way that probably won't make sense to users. I plan to add isNativeBuffer() and closeNativeBuffer() to NDArray, and some Javadoc comments about this, so I think it's ok as the default behavior, but I also think it would be a good idea to have TensorScope have an option to copy out NDArrays on close (i.e. to a Java buffer). Not sure how it would be implemented yet, but it should be possible. When exactly to do it is more complicated. We don't want to do it for every NDArray, because that would include every TType, but we may want to do it for non-TType NDArrays that use one of those buffers.

  • Threading: how much do we want to support multithreading? PointerScope uses ThreadLocal, which is necessary for the global scope stacks, but prevents running parts of a model in another thread, if that's even supported in the first place.

  • PointerScope (@saudet). We discussed implementing this by wrapping PointerScope, but as far as I understand it, that means the PointerScope would pick up any other pointers, too. It seems better to re-implement the tracking ourselves, which would be necessary for things like copying out NDArrays anyway, and using TF_Tensor's reference counting. RawTensor also already uses PointerScope internally, so I think that takes care of the reference counting.

  • TF_Tensor's deallocator doesn't implement ReferenceCounter, which as far as I can tell will make the reference counting not work. @saudet

  • TF_Tensor and TFE_TensorHandle. Do I need to track TFE_TensorHandle as well?

  • More broadly, this waits until the end of the scope to do any cleanup, whereas especially for eager mode we want to remove temporary variables as soon as they are no longer live. Am I correct that when using Operands in eager mode, we don't actually realize the tensors in Java and cleanup is done by TF's native side?

TensorMapper#nativeHandle also probably needs to call retainReference, depending on the semantics we want.

saudet commented 3 years ago

A few issues I'd like comment on:

  • NDArrays. As mentioned in #181 (comment), it's possible to have an NDArray opaquely backed by a tensor. The tensor could be closed by a TensorScope, making the NDArray inaccessible in a way that probably won't make sense to users. I plan to add isNativeBuffer() and closeNativeBuffer() to NDArray, and some Javadoc comments about this, so I think it's ok as the default behavior, but I also think it would be a good idea to have TensorScope have an option to copy out NDArrays on close (i.e. to a Java buffer). Not sure how it would be implemented yet, but it should be possible. When exactly to do it is more complicated. We don't want to do it for every NDArray, because that would include every TType, but we may want to do it for non-TType NDArrays that use one of those buffers.

If we don't want the Tensor in some context to be added to some "outer scope", we just need to open a "subscope" and keep that one independent of the outer one.
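Something like this hypothetical sketch, assuming TensorScope nests that way (produceSomething() and intermediate() are just placeholders):

try (TensorScope outer = new TensorScope()) {
    TInt32 kept = produceSomething();        // tracked by 'outer'
    try (TensorScope inner = new TensorScope()) {
        TInt32 temp = intermediate(kept);    // tracked only by 'inner', not by 'outer'
    }                                        // 'temp' is closed here
    // 'kept' stays alive until 'outer' closes
}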

  • Threading: how much do we want to support multithreading? PointerScope uses ThreadLocal, which is necessary for the global scope stacks, but prevents running parts of a model in another thread, if that's even supported in the first place.

I don't think we need to worry about this since neither Python nor C++ supports that kind of usage.

  • PointerScope (@saudet). We discussed implementing this by wrapping PointerScope, but as far as I understand it, that means the PointerScope would pick up any other pointers, too. It seems better to re-implement the tracking ourselves, which would be necessary for things like copying out NDArrays anyway, and using TF_Tensor's reference counting. RawTensor also already uses PointerScope internally, so I think that takes care of the reference counting.

We can specify the classes we want a PointerScope to be associated with: http://bytedeco.org/javacpp/apidocs/org/bytedeco/javacpp/PointerScope.html#PointerScope-java.lang.Class...-
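For example (just a sketch, with TF_Tensor from org.tensorflow.internal.c_api), a scope restricted to TF_Tensor leaves pointers from other libraries alone:

try (PointerScope tensorsOnly = new PointerScope(TF_Tensor.class)) {
    // TF_Tensor instances allocated in this block attach to 'tensorsOnly';
    // other Pointer subclasses created here are ignored by it
}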

  • TF_Tensor's deallocator doesn't implement ReferenceCounter, which as far as I can tell will make the reference counting not work. @saudet

I'm pretty sure the C API doesn't do any reference counting on its own, but I may be mistaken. Where do you see this counter? AFAIK, it relies on Python's GC to do that. We need to implement something in Java.

  • TF_Tensor and TFE_TensorHandle. Do I need to track TFE_TensorHandle as well?

Probably? It looks like Swift decided to go with TFE_TensorHandle only, though, and let that handle TF_Tensor: https://www.tensorflow.org/swift/api_docs/Classes/TFETensorHandle Maybe we should do the same?

  • More broadly, this waits until the end of the scope to do any cleanup, whereas especially for eager mode we want to remove temporary variables as soon as they are no longer live. Am I correct that when using Operands in eager mode, we don't actually realize the tensors in Java and cleanup is done by TF's native side?

If the memory pointers don't show up in the C API, it's all managed internally by TensorFlow, yeah.

TensorMapper#nativeHandle also probably needs to call retainReference, depending on the semantics we want.

Please make sure you understand what I wrote here: https://groups.google.com/a/tensorflow.org/g/jvm/c/FyXjlFmmTJI/m/NeDG-OC6AQAJ If there's anything unclear, it would be good to clarify now!

rnett commented 3 years ago

We can specify the classes we want a PointerScope to be associated with: http://bytedeco.org/javacpp/apidocs/org/bytedeco/javacpp/PointerScope.html#PointerScope-java.lang.Class...-

Yeah, but even then it will only pick up the internal TF_Tensor pointers, whereas I want it to get the RawTensor objects, and possibly NDArrays. Additionally, RawTensor has its own PointerScope, and if I understand right, that means that the TF_Tensor pointer in RawTensor would only be attached to its scope, not the outer TensorScope.

  • TF_Tensor's deallocator doesn't implement ReferenceCounter, which as far as I can tell will make the reference counting not work. @saudet

I'm pretty sure the C API doesn't do any reference counting on its own, but I may be mistaken. Where do you see this counter? AFAIK, it relies on Python's GC to do that. We need to implement something in Java.

I'm talking about here: https://github.com/tensorflow/java/blob/0fc54c8c5c250cd4a2536aec4e62fff90bb0c3f4/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/internal/c_api/AbstractTF_Tensor.java#L29

I missed the auto-wrapping in Pointer#deallocator; it looks like it will work fine.

  • TF_Tensor and TFE_TensorHandle. Do I need to track TFE_TensorHandle as well?

Probably? It looks like Swift decided to go with TFE_TensorHandle only, though, and let that handle TF_Tensor: https://www.tensorflow.org/swift/api_docs/Classes/TFETensorHandle Maybe we should do the same?

I'm guessing that means that they don't have any NDArray equivalent and just use eager mode Operands for their "tensors". That's an option, but would require lots of changes to other stuff.

TensorMapper#nativeHandle also probably needs to call retainReference, depending on the semantics we want.

Please make sure you understand what I wrote here: https://groups.google.com/a/tensorflow.org/g/jvm/c/FyXjlFmmTJI/m/NeDG-OC6AQAJ If there's anything unclear, it would be good to clarify now!

If I understand right, it won't reference count the way things are now since retainReference is never called (the count is 0). But if we start calling retainReference in other places (like in the TensorScope), we'd also have to add it here.

One other question, about https://github.com/tensorflow/java/blob/0fc54c8c5c250cd4a2536aec4e62fff90bb0c3f4/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/RawTensor.java#L157-L167

It says that it takes ownership of the handle, but since it attaches it to the session's scope and detaches it from its scope, doesn't that mean it doesn't own it (and wouldn't close it on close())?

saudet commented 3 years ago

We can specify the classes we want a PointerScope to be associated with: http://bytedeco.org/javacpp/apidocs/org/bytedeco/javacpp/PointerScope.html#PointerScope-java.lang.Class...-

Yeah, but even then it will only pick up the internal TF_Tensor pointers, whereas I want it to get the RawTensor objects, and possibly NDArrays. Additionally, RawTensor has its own PointerScope, and if I understand right, that means that the TF_Tensor pointer in RawTensor would only be attached to its scope, not the outer TensorScope.

If the only resources that we're planning to manage with reference counting are native resources, and assuming those are all going to be Pointer or in this case TF_Tensor for now anyway, then the way I see it, we can do everything with PointerScope. Could you explain why RawTensor, NDArray, etc need to be treated differently? Again, this is not a paradigm that is supported by any language feature in Python, C++, Swift, etc, so unless we can come up with a convincing use case where we really need this, I don't see the point of worrying about all those corner cases. No one has ever needed those kinds of features in other languages.

We can easily attach a TF_Tensor to multiple scopes though, that's the whole point of using reference counting in the first place! See some sample usage here in the unit test: https://github.com/bytedeco/javacpp/blob/master/src/test/java/org/bytedeco/javacpp/PointerTest.java#L958
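Here is the same idea sketched with a plain JavaCPP IntPointer standing in for TF_Tensor:

try (PointerScope outer = new PointerScope()) {
    IntPointer p;
    try (PointerScope inner = new PointerScope()) {
        p = new IntPointer(1);   // attaches to 'inner' automatically, referenceCount() == 1
        outer.attach(p);         // explicitly attach to 'outer' as well, referenceCount() == 2
    }                            // 'inner' closes: the count drops to 1, the memory stays alive
    // p is still valid here
}                                // 'outer' closes: the count drops to 0, the memory is deallocated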

Please make sure you understand what I wrote here: https://groups.google.com/a/tensorflow.org/g/jvm/c/FyXjlFmmTJI/m/NeDG-OC6AQAJ If there's anything unclear, it would be good to clarify now!

If I understand right, it won't reference count the way things are now since retainReference is never called (the count is 0). But if we start calling retainReference in other places (like in the TensorScope), we'd also have to add it here.

I think what you're looking for is PointerScope.attach(). If a newly allocated Pointer finds a PointerScope on the stack, it will call PointerScope.attach(this) on itself and this.retainReference() gets called that way. We don't need to call it manually.
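For example, with a plain JavaCPP IntPointer (just a sketch):

try (PointerScope scope = new PointerScope()) {
    IntPointer p = new IntPointer(10);       // scope.attach(p) runs during init
    System.out.println(p.referenceCount());  // prints 1: retained by the scope, not by us
}                                            // scope closes, the count drops to 0, p is deallocated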

One other question, about

https://github.com/tensorflow/java/blob/0fc54c8c5c250cd4a2536aec4e62fff90bb0c3f4/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/RawTensor.java#L157-L167

It says that it takes ownership of the handle, but since it attaches it to the session's scope and detaches it from its scope, doesn't that mean it doesn't own it (and wouldn't close it on close())?

I think @karllessard did it that way to mimic the behavior of the old Java API, where close() still deallocates the memory regardless of whoever owns it. We probably want to change that... BTW, please remember though that having each Tensor own itself defeats the purpose of having "scopes" in the first place. If that's what we want to do, we just need to use AutoCloseable for everything and we're done.

rnett commented 3 years ago

If the only resources that we're planning to manage with reference counting are native resources, and assuming those are all going to be Pointer or in this case TF_Tensor for now anyway, then the way I see it, we can do everything with PointerScope.

Potentially, I haven't played around with it enough to be sure. The only reason we would need to handle NDArrays differently is because we can have NDArrays backed by tensors' native buffers, but they have the same API as one backed by a Java buffer. I'm leaning towards an API level solution there rather than special handling though, so I doubt it will be an issue.

We can easily attach a TF_Tensor to multiple scopes though, that's the whole point of using reference counting in the first place!

I get that, but it only does the first one automatically, whereas we want it to be attached to the RawTensor's scope (or any other PointerScope) and the first TensorScope. Doable, sure, but we'd need to maintain our own stack and manually add to the TensorScope's PointerScope in RawTensor's constructor or somewhere similar. It'll need its own stack anyways for nesting.

I think what you're looking for is PointerScope.attach(). If a newly allocated Pointer finds a PointerScope on the stack, it will call PointerScope.attach(this) on itself and this.retainReference() gets called that way. We don't need to call it manually.

I don't think this will be an issue if I handle NDArrays at the API level, but to ensure I understand it right:

  • Not calling anything there would cause the NDArray's pointer to be closed whenever the tensor is closed, i.e. it has no ownership of the buffer.

  • Calling retainReference() or attaching it to a new PointerScope in the NDArray would cause the NDArray to have ownership of it, and require a close method.

I think @karllessard did it that way to mimic the behavior of the old Java API, where close() still deallocates the memory regardless of whoever owns it. We probably want to change that...

The implementation definitely needs to change, but I think it's good to have a way to forcibly close a tensor. TensorScope is for guaranteeing you don't have leaks, but closing earlier is fine and should be supported.

Actually, that brings up a bigger issue. Assuming I understand things right, if we leave RawTensor the way it is (using a PointerScope) and add the TF_Tensor to TensorScope's PointerScope, the TensorScope would keep the TF_Tensor alive even if the RawTensor is closed. It would have to detach itself from every scope.

I think I need to finalize TensorScope's semantics a bit more before adding PointerScope to it.

saudet commented 3 years ago

If the only resources that we're planning to manage with reference counting are native resources, and assuming those are all going to be Pointer or in this case TF_Tensor for now anyway, then the way I see it, we can do everything with PointerScope.

Potentially, I haven't played around with it enough to be sure. The only reason we would need to handle NDArrays differently is because we can have NDArrays backed by tensors' native buffers, but they have the same API as one backed by a Java buffer. I'm leaning towards an API level solution there rather than special handling though, so I doubt it will be an issue.

You mean with ByteBuffer, etc? Those can't be deallocated manually anyway... Could you clarify what you're trying to do?

We can easily attach a TF_Tensor to multiple scopes though, that's the whole point of using reference counting in the first place!

I get that, but it only does the first one automatically, whereas we want it to be attached to the RawTensor's scope (or any other PointerScope) and the first TensorScope. Doable, sure, but we'd need to maintain our own stack and manually add to the TensorScope's PointerScope in RawTensor's constructor or somewhere similar. It'll need its own stack anyways for nesting.

No, it can work exactly like you mention above without doing anything manually:

try (TensorScope scope = new TensorScope()) {
    TInt32 input = stuff;
    TInt32 result = function(input).detach();
}

(Other than to forward calls like detach() to PointerScope.detach(), etc.) Again, please clarify what you're trying to do.

I think what you're looking for is PointerScope.attach(). If a newly allocated Pointer finds a PointerScope on the stack, it will call PointerScope.attach(this) on itself and this.retainReference() gets called that way. We don't need to call it manually.

I don't think this will be an issue if I handle NDArrays at the API level, but to ensure I understand it right:

  • Not calling anything there would cause the NDArray's pointer to be closed whenever the tensor is closed, i.e. it has no ownership of the buffer.

  • Calling retainReference() or attaching it to a new PointerScope in the NDArray would cause the NDArray to have ownership of it, and require a close method.

Yes, if I understand what you're saying correctly, then I think it's correct. :) We could have it such that NDArray is considered a "scope", but again, what's the point? That would require users to close NDArray manually, so we just need to implement AutoCloseable on it and ask them to close it manually once they are done! That's it, we're done. We don't need to do any reference counting.
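In other words, something as simple as this hypothetical sketch (NDArray does not implement AutoCloseable today, and someTensorBackedArray() is just a placeholder):

try (FloatNdArray array = someTensorBackedArray()) {
    // read/write the array as usual
}   // closed explicitly here, no reference counting involved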

I think @karllessard did it that way to mimic the behavior of the old Java API, where close() still deallocates the memory regardless of whoever owns it. We probably want to change that...

The implementation definitely needs to change, but I think it's good to have a way to forcibly close a tensor. TensorScope is for guaranteeing you don't have leaks, but closing earlier is fine and should be supported.

Sure, that's what Pointer.deallocate() is for.

Actually, that brings up a bigger issue. Assuming I understand things right, if we leave RawTensor the way it is (using a PointerScope) and add the TF_Tensor to TensorScope's PointerScope, the TensorScope would keep the TF_Tensor alive even if the RawTensor is closed. It would have to detach itself from every scope.

Well, calling Pointer.close() just calls Pointer.releaseReference() by default, so if someone did that it would deallocate the memory if the referenceCount drops to 0, but that would be a programming error.

I think I need to finalize TensorScope's semantics a bit more before adding PointerScope to it.

Yes, I think you need to do that :)

rnett commented 3 years ago

You mean with ByteBuffer, etc? Those can't be deallocated manually anyway... Could you clarify what you're trying to do?

It's because of how tensors are mapped to NDArrays: since the type refactor, TType extends both and uses TFloat16Mapper and the like. Most NDArrays are backed by Java NIO Buffers or Unsafe array access, which will be GC'd properly, but you can have NDArrays backed by native buffers (tensors), and currently there's no way to tell. I was experimenting with ways to automatically move the data to Java if the tensor is closed, but I think it's better to handle that at the API level, e.g. adding usesNativeBuffer() and copyOutNativeBuffer() methods to NDArray.
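Roughly what I have in mind, as a sketch only (the method names are placeholders, nothing final):

// Hypothetical additions to NdArray, just to illustrate the idea:
public interface NdArray<T> {
    // ...existing methods...

    /** True if this array reads directly from a native (e.g. tensor-owned) buffer. */
    boolean usesNativeBuffer();

    /** Copies the data into a Java-managed buffer so the array survives the tensor being closed. */
    NdArray<T> copyOutNativeBuffer();
}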

No, it can work exactly like you mention above without doing anything manually:

try (TensorScope scope = new TensorScope()) {
    TInt32 input = stuff;
    TInt32 result = function(input).detach();
}

(Other than to forward calls like detach() to PointerScope.detach(), etc.) Again, please clarify what you're trying to do.

Oh I see. You might want to change the Javadoc on PointerScope.scopeStack, though: it says "Pointer objects attach themselves automatically on Pointer.init to the first one they can to on the stack", while the actual code attaches to every one that's valid.

I think @karllessard did it that way to mimic the behavior of the old Java API, where close() still deallocates the memory regardless of whoever owns it. We probably want to change that...

The implementation definitely needs to change, but I think it's good to have a way to forcibly close a tensor. TensorScope is for guaranteeing you don't have leaks, but closing earlier is fine and should be supported.

Sure, that's what Pointer.deallocate() is for.

Yeah, that's fine for manually calling it.

If I understand things right, since pointers are added to all scopes on the stack (with the right classes), a pointer is guaranteed to be valid until its longest-lived scope is closed, or deallocate() or enough releaseReference()s are called? The PointerScope is meant to represent something like an object w/ the pointer as a field?

saudet commented 3 years ago

You mean with ByteBuffer, etc? Those can't be deallocated manually anyway... Could you clarify what you're trying to do?

It's because of how tensors are mapped to NDArrays: since the type refactor, TType extends both and uses TFloat16Mapper and the like. Most NDArrays are backed by Java NIO Buffers or Unsafe array access, which will be GC'd properly, but you can have NDArrays backed by native buffers (tensors), and currently there's no way to tell. I was experimenting with ways to automatically move the data to Java if the tensor is closed, but I think it's better to handle that at the API level, e.g. adding usesNativeBuffer() and copyOutNativeBuffer() methods to NDArray.

I think just a copy constructor would be more appropriate for this use case.
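Roughly this kind of thing (a sketch against the ndarray API; tensorBacked is a placeholder for a tensor-backed FloatNdArray):

FloatNdArray copy = NdArrays.ofFloats(tensorBacked.shape());  // heap-backed array of the same shape
tensorBacked.copyTo(copy);                                    // deep copy of the data
// 'copy' stays valid after the backing tensor is closed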

BTW, no, direct NIO buffers do not get properly garbage collected. They may hang around until the application either terminates correctly, or runs out of memory. It's not specified. That's on purpose to make them thread-safe. Panama is looking into ways to "fix" this, but it still looks like a mess to me.

No, it can work exactly like you mention above without doing anything manually:

try (TensorScope scope = new TensorScope()) {
    TInt32 input = stuff;
    TInt32 result = function(input).detach();
}

(Other than to forward calls like detach() to PointerScope.detach(), etc.) Again, please clarify what you're trying to do.

Oh I see. You might want to change the Javadoc on PointerScope.scopeStack, though: it says "Pointer objects attach themselves automatically on Pointer.init to the first one they can to on the stack", while the actual code attaches to every one that's valid.

No, it doesn't, the code correctly attaches only to the innermost PointerScope that can accept the Pointer: https://github.com/bytedeco/javacpp/blob/master/src/main/java/org/bytedeco/javacpp/Pointer.java#L716

I'm trying to follow as closely as possible language features from C++, Python, Swift, etc., because that's how developers expect things to work, and because JavaCPP is about bridging Java with C++. However, the Java language doesn't provide this kind of feature, and to reproduce the C++ experience exactly, we'd have to splatter PointerScope everywhere. Letting users specify which subclasses exactly they are interested in helps avoid interference when using multiple native libraries together, even when one of them isn't using PointerScope.

If I understand things right, since pointers are added to all scopes on the stack (with the right classes), a pointer is guaranteed to be valid until its longest-lived scope is closed, or deallocate() or enough releaseReference()s are called? The PointerScope is meant to represent something like an object w/ the pointer as a field?

PointerScope is meant to represent a "scope", which is usually something that is associated with a block of code found between a pair of curly brackets { and } in C++: the definition of a class, a function, etc, a for/while loop, if/else statements, and any other code blocks.

rnett commented 3 years ago

BTW, no, direct NIO buffers do not get properly garbage collected. They may hang around until the application either terminates correctly, or runs out of memory. It's not specified. That's on purpose to make them thread-safe. Panama is looking into ways to "fix" this, but it still looks like a mess to me.

Ah, that's annoying. @karllessard how are we handling that currently? I didn't see any close methods for NDArray. Is something planned?

No, it doesn't, the code correctly attaches only to the innermost PointerScope that can accept the Pointer: https://github.com/bytedeco/javacpp/blob/master/src/main/java/org/bytedeco/javacpp/Pointer.java#L716

Ah right, I missed the break.

PointerScope is meant to represent a "scope", which is usually something that is associated with a block of code found between a pair of curly brackets { and } in C++: the definition of a class, a function, etc, a for/while loop, if/else statements, and any other code blocks.

Yeah, ok, since it doesn't attach to all scopes that makes sense.

karllessard commented 3 years ago

Ah, that's annoying. @karllessard how are we handling that currently? I didn't see any close methods for NDArray. Is something planned?

The NdArray library does not allocate native memory by itself. The only way to use native memory with an NdArray is to pass a preallocated buffer to the library, which is what TensorFlow is doing. The ownership of that memory remains outside of the library, as do the closing mechanisms (in Tensor, for TensorFlow).