Closed SeongEon-Jo closed 6 months ago
@SeongEon-Jo You'll have to include a test case with this PR. Modify the hardcoded file path to use File.createTempFile.
I'm thinking this test will fail even under the new patch.
@Test
void testGrandchild() throws Exception {
// you can use any file to test
File file = Paths.get("src/test/resources/test_file.jpeg").toFile();
try( InputStream grandChild = new SharedFileInputStream(file).newStream(0, -1).newStream(0, -1)) {
System.gc();
Assertions.assertDoesNotThrow(() ->grandChild.read());
}
}
@jmehrens I added a test case but slightly changed the code you gave because SharedFileInputStream.newStream()
returns InputStream
, not SharedInputStream
.
So I just casted it to SharedFileInputStream
in order to invoke newStream()
.
Test result was "passed".
Also, I ran the same test in junit5 like below.
@Test
void testGrandchild() throws Exception {
File sampleFile = File.createTempFile("test", "test");
try (InputStream grandChild = ((SharedFileInputStream) new SharedFileInputStream(sampleFile).newStream(0, -1)).newStream(0, -1)) {
System.gc();
Assertions.assertDoesNotThrow(() -> grandChild.read());
}
}
Result was also "passed".
I only added a case of grand child. But if you want me to add another cases, feel free to tell me about that.
Result was also "passed".
I only added a case of grand child. But if you want me to add another cases, feel free to tell me about that.
Oh I see. The reason this passes is that the un-reference parent gets garbage collected but the call to close by the finializer is a no-op because of the reference counting. All good then.
You should add your original test case too as a method to the JUnit test. You should copy one of the license headers from the other test cases.
You should add your original test case too as a method to the JUnit test. You should copy one of the license headers from the other test cases.
Added the license header and child test case.
But I want to notify that in the version before this patch, the test results were different depending on whether I invoke Assertions.assertDoesNotThrow()
of junit5 or just invoke child.read()
.
For example, in a test code I gave in https://github.com/jakartaee/mail-api/issues/694,
@Test
void test() throws Exception {
// you can use any file to test
File file = Paths.get("src/test/resources/test_file.jpeg").toFile();
InputStream slaveSharedFileIs = new SharedFileInputStream(file).newStream(0, -1);
System.gc();
Assertions.assertDoesNotThrow(() -> slaveSharedFileIs.read());
}
it failed, which means slaveShareFiles.read()
throws an IOException
and it is what I expected.
But when I tested with the code below,
@Test
void test() throws Exception {
// you can use any file to test
File file = Paths.get("src/test/resources/test_file.jpeg").toFile();
InputStream slaveSharedFileIs = new SharedFileInputStream(file).newStream(0, -1);
System.gc();
slaveSharedFileIs.read();
}
it passed, which means slaveSharedFileIs.read()
does not throw IOException
I guess this occurs because of the code scope but not sure of the exact reason.
Of course, after my patch, no matter whether I invoke Assertions.assertDoesNotThrow()
or just invoke child.read()
, all test succeeded, not throwing IOException
.
The reason I am notifying this is because I am not that sure if the test code I've added properly covers all the problematic situations.
Please let me know your opinion and suggest test cases I can add to cover them.
But I want to notify that in the version before this patch, the test results were different depending on whether I invoke
Assertions.assertDoesNotThrow()
of junit5 or just invokechild.read()
.
Put a comment on this behavior so future maintainers don't try to refactor and break the test.
Put a comment on this behavior so future maintainers don't try to refactor and break the test.
Added a comment to the test class.
Feel free to tell me if there is something to add or edit in the comment.
@SeongEon-Jo I've been looking at the source code and I think the original behavior of this class is by design. Per:
Note that when the SharedFileInputStream is closed, all streams created with the
newStream
method are also closed. This allows the creator of the SharedFileInputStream object to control access to the underlying file and ensure that it is closed when needed, to avoid leaking file descriptors. Note also that this behavior contradicts the requirements of SharedInputStream and may change in a future release.
That is why there are 2 code paths for closing and force closing. So in theory this issue could be closed as will not fix because the behavior is actually documented. If we were going to change the behavior so this didn't happen then we could instead pinning the root stream we could just reference count all streams instead of treating one stream as the root marked by a boolean. Meaning we never call forceClose or track a boolean to identify the root stream.
Other than the test case provided, how did you come across this issue in the wild? Is it the case of some abstraction returning a sub stream?
@jmehrens
As I understand, the document you mentioned is about that closing the root stream will close any derived streams created by newStream()
.
It is a sensible behavior and I totally agree with that, and that's not what I am talking about.
What I am pointing out in the issue (https://github.com/jakartaee/mail-api/issues/694) and this PR is that G.C may close the root SharedFileInputStream
unexpectedly when there is nothing to keep a reference to it.
The solution I try to give is also about making the root SharedFileInputStream
not be junked by G.C by pinning it as the way derived streams retain its reference.
I guess a way of the reference counting you suggested seems to solve the problem only after you remove its finalize()
method because if G.C invokes the root SharedFileInputStream
's finalize()
, it will make it close its RandomAccessFile
whatever the reference count is.
Actually, I also initially thought of the reference counting as a solution, but just made derived streams keep a reference to the root stream because removing or modifying finalize()
is quite burdensome 😂
Other than the test case provided, how did you come across this issue in the wild? Is it the case of some abstraction returning a sub stream?
I found it when I make a MimeMessage
with SharedFileInputStream
and extract some of the DataHandler
in it and get DataSource
and also InputStream
in it.
Let me give you a sample code to help you understand.
Firstly, make a new MimeMessage
with SharedFileInputStream
and extract some of the InputStream
of aDataSource
of a DataHandler
.
public List<InputStream> extract() {
MimeMessage message = new MimeMessage(session, new SharedFileInputStream(getFile()));
List<InputStream> results = new ArrayList<>();
foreach (mimePart of message) {
if (...) {
InputStream extractedInputStream = mimePart.getDataHandler().getDataSource().getInputStream();
results.add(extractedInputStream);
}
}
return results;
}
Note that MimeMessage
actually has a SharedFileInputStream
created by newStream()
and there's no reference to a root stream .
Secondly, invoke the extract()
and do something with the returned InputStream
, getting size for example.
public List<Integer> getSize() {
List<InputStream> extracted = extract();
return extracted.stream()
.map(is -> is.available())
.collect(Collectors.toList());
}
I found is.available()
threw IOException: Stream Closed
.
It's because G.C invoked finalize()
of a root stream somewhen between extract()
and getSize()
, closing the RandomAccessFile
.
As a result, extracted InputStream
which was also made of SharedFileInputStream
that MimeMessage
has, also became unavailable.
I also checked RandomAccessFile
is closed while running getSize()
by debugger.
I guess a way of the reference counting you suggested seems to solve the problem only after you remove its
finalize()
method because if G.C invokes the rootSharedFileInputStream
'sfinalize()
, it will make it close itsRandomAccessFile
whatever the reference count is.
I think the problem is this class is designed it such a way that it is a candidate for java.lang.ref.Reference.reachabilityFence
I would think that we don't need to pin the root stream. It just needs to be kept alive long enough such that the counter gets incremented before the root is freed. E.G:
public synchronized InputStream newStream(long start, long end) {
try {
if (in == null)
throw new RuntimeException("Stream closed");
if (start < 0)
throw new IllegalArgumentException("start < 0");
if (end == -1)
end = datalen;
return new SharedFileInputStream(sf,
this.start + start, end - start, bufsize);
} finally {
Reference.reachabilityFence(this);
}
}
However, I would have thought synchronized keyword would be enough to keep it alive. Also I noticed that the close methods are not synchronized. This probably due to to multiple locks being acquired but I wonder if that is getting us into trouble since SharedFileInputStream::close is not synchronized it is just barging ahead of newStream. Synchronizing close might fix this issue with the root being G.C. too quickly.
Given your test case it seems that we should move forward with fixing this issue. My preference would be to use the Cleaner API on the SharedFile and get rid of the finalize method on the SharedFileInputStream. However, I would like to see just the simple reference counting working before we move to Cleaner API.
@jmehrens
I did same tests (https://github.com/jakartaee/mail-api/pull/695#issuecomment-1664174805) based on the sample code you gave. (Note that tests were run with original SharedFileInputStream
before my patch.)
Firstly, added Reference.reachabilityFence(this);
at the finally
scope of newStream()
.
Secondly, added synchronized
at the close()
without adding Reference.reachabilityFence(this);
at the finally
scope of newStream()
.
Third, added both.
Test all failed.
Actually, I don't understand what synchronized
has to do with G.C. Is it just for supporting synchronizing in multi-thread?
If there is something I can refer to, please let me know. so that I can add another tests.
My preference would be to use the Cleaner API on the SharedFile and get rid of the finalize method on the SharedFileInputStream.
This seems to be one of the reasonable ways to deal with this problem because it leaves file's lifecycle to the SharedFile
that actually references to it, rather than to the SharedFileInputStream
.
I also found that just removing finalize()
of SharedFileInputStream
made it all test passed.
I think if you fix the problem in that way, a completely new work needs to be done, closing this PR.
It seems to be not that hard work. Do you want me to work on it?
Test all failed.
I think this because we also have to get rid of the concept of the root stream being able to close all streams and simply reference count. As in get rid of root/master boolean and forceClose methods. I could be wrong though.
Actually, I don't understand what
synchronized
has to do with G.C.
In this case synchronized modifier means that the current thread has locked 'this' for the whole method call. Meaning that 'this' is reachable during the whole method call. Without synchronized modifier this
can be G.C during the method call if there are no more references to this
where this
is the root stream in:
InputStream slaveSharedFileIs = new SharedFileInputStream(file).newStream(0, -1);
This is explained at the bottom of: java.lang.ref.Reference.reachabilityFence
Is it just for supporting synchronizing in multi-thread?
Usually it is to ensure that a close stream is seen as closed to all threads.
My preference would be to use the Cleaner API on the SharedFile and get rid of the finalize method on the SharedFileInputStream.
This seems to be one of the reasonable ways to deal with this problem because it leaves file's lifecycle to the
SharedFile
that actually references to it, rather than to theSharedFileInputStream
.I also found that just removing
finalize()
ofSharedFileInputStream
made it all test passed.I think if you fix the problem in that way, a completely new work needs to be done, closing this PR.
It seems to be not that hard work. Do you want me to work on it?
Let me try to get consensus on the approach.
The more I think about it the reason the finalizer exists on the SharedFileInputStream is to make the reference counting accurate. Which enables the last stream to close the native resource. Which is important if the last/root stream is strongly referenced. For sure is important on Windows due to open file locking.
I think this because we also have to get rid of the concept of the root stream being able to close all streams and simply reference count. As in get rid of root/master boolean and forceClose methods. I could be wrong though.
I see. All test passed after getting rid of master boolean and forceClose()
however I tested, adding synchronized
or not, adding Reference.reachabilityFence(this)
or not. But also need further examination whether it is actually freed from memory.
Please be aware of this result and refer to it for future works.
Let me try to get consensus on the approach.
All right. Please let me know if there's any progress and anything that I can help.
@SeongEon-Jo Apologizes for not getting back to you sooner on this PR. After looking this over and what has been learned from your previous code I think the approach should be:
I'm thinking with these changes this class should work as you expect it to and we won't need to pin the root stream.
Thanks for all your efforts on this PR and helping me gain an understanding of what is going on with this class.
@jmehrens
All sounds good in your approach.
All test passed fortunately after applying your suggestions.
Thanks for all your efforts on this PR and helping me gain an understanding of what is going on with this class.
You're welcome. I'm happy to contribute. 😎
Feel free again to let me know if you have more to add or edit.
@jmehrens
I am getting an error like below during build.
[INFO]
[INFO] --- maven-compiler-plugin:3.11.0:compile (base-compile) @ jakarta.mail-api ---
[INFO] Changes detected - recompiling the module! :dependency
[INFO] Compiling 113 source files with javac [debug release 8] to target/classes
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /home/jenkins/agent/workspace/mail_PR-695/api/src/main/java/jakarta/mail/util/SharedFileInputStream.java:[444,22] cannot find symbol
symbol: method reachabilityFence(jakarta.mail.util.SharedFileInputStream)
location: class java.lang.ref.Reference
[ERROR] /home/jenkins/agent/workspace/mail_PR-695/api/src/main/java/jakarta/mail/util/SharedFileInputStream.java:[488,22] cannot find symbol
symbol: method reachabilityFence(jakarta.mail.util.SharedFileInputStream)
location: class java.lang.ref.Reference
[INFO] 2 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
I surely added java.lang.ref.Reference
import to SharedFileInputStream
like below.
import java.lang.ref.Reference;
What should I do to resolve this?
@SeongEon-Jo Looks like we are targeting JDK8 still in the pom.xml. Instead of calling reachabilityFence
You can do the same with:
import java.util.Objects;
Objects.requireNonNull(this); //TODO: replace with Reference.reachabilityFence
@SeongEon-Jo Sorry for the delay on this. Codewise, I think this is all good. Just need to do a review of how it is used in the codebase before I merge it.
I think you need to update the https://github.com/jakartaee/mail-api/blob/master/doc/release/CHANGES.txt with something like:
CHANGES IN THE 2.1.3 RELEASE
----------------------------
E 695 SharedFileInputStream should comply with spec
I think you also need to put a compatibility note in: https://github.com/jakartaee/mail-api/blob/master/www/docs/COMPAT.txt for the 2.1.3 RELEASE. Something like:
- SharedFileInputStream should comply with spec
The root SharedFileInputStream no longer closes all streams created with the newStream. This behavior was not compliant with the contract specified in the SharedInputStream interface which specifies that all streams must be closed before the shared resource is closed.
I finally found in my notes the explanation on why requireNonNull works: https://mail.openjdk.org/pipermail/core-libs-dev/2018-February/051312.html
Thanks for the help! Nice work!
@jmehrens
Updated exactly same way you suggested.
Please let me know if there's anything which violates docs conventions 😂
I finally found in my notes the explanation on why requireNonNull works: https://mail.openjdk.org/pipermail/core-libs-dev/2018-February/051312.html
Got this. Thanks for the kind explanation!
@SeongEon-Jo I haven't forgotten about this! I'll do my best to get this reviewed and merged. Thank you for your patience.
resolves #694
Added reference to master
SharedFileInputStream
to slaves.By doing this, I guess it can retain reference to master
SharedFileInputStream
in order not to be junked by garbage collection until the reference to the last derived stream is gone.Feel free to comment if there are some points to add or fix.