Open tischi opened 2 years ago
Here is something to read: https://www.baeldung.com/java-download-file
@axtimwalde do you know the most performant way to completely load a txt file from a URL into memory?
The InputStream does not yet contain all the downloaded data but can deliver it on request. I haven't done a performance evaluation. I believe the most significant difference between the various approaches is whether you have to load the entire file or only some parts of it via random access. This is pretty comprehensive and includes loading from URLs: https://www.baeldung.com/reading-file-in-java
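To illustrate the point about the stream delivering data on request, here is a minimal sketch (not code from this project) that pulls everything out of an InputStream into a String using only the JDK. The class name `StreamToString` and the helper `readAll` are hypothetical, and the sketch assumes the text is UTF-8. The bytes are only fetched as `read` is called, so for a stream obtained from a URL the loop is where the transfer actually happens.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamToString {

    // Hypothetical helper: drains the stream into memory and decodes it as UTF-8.
    // The stream is closed when reading finishes or fails.
    static String readAll(InputStream in) {
        try (InputStream stream = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            // Each read() call pulls the next available bytes from the source.
            while ((n = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An in-memory stream stands in for url.openStream() so the sketch runs offline.
        InputStream in = new ByteArrayInputStream("a\tb\n1\t2\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in));
    }
}
```

For a remote file, `readAll(new URL(tableURL).openStream())` would be the equivalent call; on Java 9 and later, `InputStream.readAllBytes()` does the same draining loop internally.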
"The InputStream does not yet contain all the downloaded data but can deliver it on request"
@axtimwalde This is interesting, because I think HTTP requests can have a significant overhead independent of the amount of data transferred.
For example here in your code: https://github.com/saalfeldlab/n5-google-cloud/blob/master/src/main/java/org/janelia/saalfeldlab/n5/googlecloud/N5GoogleCloudStorageReader.java#L206
I would be worried that this code currently entails two HTTP requests (one in line 206 and another one in line 207), just for reading a small text file. Downloading all the information in one go (if possible) might be more performant. What do you think?
I could not find a method that does it "in one go". There always seems to be a first step of opening the InputStream.
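For what it is worth, with plain `java.net.URL`, opening the stream and then reading from it is still a single HTTP GET: `openStream()` sends the request and reads the response headers, and the subsequent reads consume the body of that same response. If a single call is preferred, the `java.net.http.HttpClient` API can return the whole body as a String. The following is only a sketch under the assumption that Java 11 or later is available; the class name `FetchText` and the helper `fetch` are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FetchText {

    // Hypothetical helper: issues one GET request and returns the full body as text.
    static String fetch(String url) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        try {
            // BodyHandlers.ofString() buffers the complete response body before returning.
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IllegalStateException("Unexpected HTTP status: " + response.statusCode());
            }
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The first command-line argument is the URL of the text file to load.
        String text = fetch(args[0]);
        System.out.println("Characters read: " + text.length());
    }
}
```

Whether this is faster than `url.openStream()` followed by a full read has not been measured here; both should amount to one request plus the body transfer.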
I tried to benchmark, reading a not so small file:
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;

long start;
final String tableURL = "https://raw.githubusercontent.com/mobie/platybrowser-project/main/data/1.0.1/tables/sbem-6dpf-1-whole-segmented-cells/default.tsv";
// Time opening the connection and obtaining the stream
start = System.currentTimeMillis();
URL url = new URL(tableURL);
final InputStream inputStream = url.openStream();
System.out.println("Open Table InputStream [ms]: " + ( System.currentTimeMillis() - start ));
// Time reading the whole stream into a String
start = System.currentTimeMillis();
// using apache.commons.io
final String s = IOUtils.toString(inputStream, StandardCharsets.UTF_8.name());
System.out.println("Read InputStream into String [ms]: " + ( System.currentTimeMillis() - start ));
and I am getting:
Open Table InputStream [ms]: 766
Read InputStream into String [ms]: 2703
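A single measurement like the one above also includes one-off costs of the first connection (DNS lookup, TCP and TLS handshakes) and JVM warm-up, so repeating the measurement may give a clearer picture. Below is a minimal sketch of a timing helper for that purpose; the class name `Timing` and the method `timeMillis` are hypothetical, and the workload in `main` is only a stand-in for the actual open-and-read call.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.function.Supplier;

public class Timing {

    // Hypothetical helper: runs the task `runs` times and returns each elapsed time in ms.
    static long[] timeMillis(Supplier<?> task, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.get();
            elapsed[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace it with the download being measured.
        long[] ms = timeMillis(() -> String.join("", Collections.nCopies(100_000, "x")), 5);
        System.out.println("Elapsed per run [ms]: " + Arrays.toString(ms));
    }
}
```

Comparing the first run with the later ones gives a rough idea of how much of the time is connection setup rather than data transfer.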
More things to explore: https://stackoverflow.com/questions/309424/how-do-i-read-convert-an-inputstream-into-a-string-in-java
Currently we are using this code from java.net:
I wonder what that actually does. Specifically, does the InputStream (a) already contain all the downloaded data or (b) not?