Closed pierreroudier closed 2 years ago
Hi Pierre, thanks for proposing :-) I think the streaming as you describe is an interesting use case. I think if you rely on streaming we for sure can support it in the package. @ThomasKnecht what is your opionion? We then have a separate function read_opus_file()
that down in the stack calls read_opus_raw()
?
Just out of interest, are you using SFTP protocol to achieve this? At the moment what I did at work is to configure the Windows measurment computer as Win OpenSSH server, so in a decent net I get 50MB/s read speed which is fine.
yea, i have a look at it. do you have an example stream connection?
maybe we create a method that checks if it is a file path or a url and dependent on that, it open a different connection
i mesn the only thing that is different is the readBin. so we could make a read_opus function instead of _file or _raw and depending on the input it streams directly from the web or it uses the file
maybe we create a method that checks if it is a file path or a url and dependent on that, it open a different connection
agree, so this would then point back to the idea of implementing different drivers, like stars sf terra etc. do :-)
If you all agree, I would go like Pierre suggested and Thomas added to it:
read_opus_raw()
read_opus()
, which has a mechanism to differentiate disk file or web streamingi can make a draft later on
Just out of interest, are you using SFTP protocol to achieve this? At the moment what I did at work is to configure the Windows measurment computer as Win OpenSSH server, so in a decent net I get 50MB/s read speed which is fine.
No, we have set up an API sitting on top of our soil observation database, where spectra (and other binary files) are stored as BLOB. We just query the spectra URIs using HTTP GET to download the BLOBs (which are raw).
If you all agree, I would go like Pierre suggested and Thomas added to it:
1. keep the first draft of `read_opus_raw()` 2. implement `read_opus()`, which has a mechanism to differentiate disk file or web streaming
Yes, I think this is a good idea. read_opus
being the one-stop shop, with slightly lower-level/more detailed methods available should the case arise (read_opus_file
, read_opus_raw
).
maybe we create a method that checks if it is a file path or a url and dependent on that, it open a different connection
It's worth noting that the PR doesn't do that, it allows to read raw
strings directly.
In my case, you still have to HTTP GET the BLOB and read it (I'm using rawConnection
).
maybe we create a method that checks if it is a file path or a url and dependent on that, it open a different connection
It's worth noting that the PR doesn't do that, it allows to read
raw
strings directly.In my case, you still have to HTTP GET the BLOB and read it (I'm using
rawConnection
).
@ThomasKnecht you mentioned we could as well implement an {rcurl} method for it. - @pierreroudier I guess your setup would apply for it right? To avoid a hard dependency, we might work with if (!require("rcurl"))
and trigger an informative error. Same applies for the output arg, i.e. if we want to go with {jsonlite} etc.
@ThomasKnecht you mentioned we could as well implement an {rcurl} method for it. - @pierreroudier I guess your setup would apply for it right? To avoid a hard dependency, we might work with
if (!require("rcurl"))
and trigger an informative error. Same applies for the output arg, i.e. if we want to go with {jsonlite} etc.
That's a good question.
I'm tempted to say that in this case, I would handle this /outside/ the {opusreader2}
package -- currently I have a dedicated package that works with our Science API.
This way it would keep the package simple, scoped, and nimble.
@ThomasKnecht you mentioned we could as well implement an {rcurl} method for it. - @pierreroudier I guess your setup would apply for it right? To avoid a hard dependency, we might work with
if (!require("rcurl"))
and trigger an informative error. Same applies for the output arg, i.e. if we want to go with {jsonlite} etc.That's a good question.
I'm tempted to say that in this case, I would handle this /outside/ the
{opusreader2}
package -- currently I have a dedicated package that works with our Science API.This way it would keep the package simple, scoped, and nimble.
I think the implementation Thomas drafted now is quite modular and simple, therefore I would like to include the new functionality 😎 I am thinking of spectral platforms, where we would like to be as flexible as possible. For example, if we put the binaries on an s3 storage, on a deducated instance, stream via rcurl and read directly is rather efficient.
@ThomasKnecht you mentioned we could as well implement an {rcurl} method for it. - @pierreroudier I guess your setup would apply for it right? To avoid a hard dependency, we might work with
if (!require("rcurl"))
and trigger an informative error. Same applies for the output arg, i.e. if we want to go with {jsonlite} etc.That's a good question.
I'm tempted to say that in this case, I would handle this /outside/ the
{opusreader2}
package -- currently I have a dedicated package that works with our Science API.This way it would keep the package simple, scoped, and nimble.
I think the implementation Thomas drafted now is quite modular and simple, therefore I would like to include the new functionality 😎 I am thinking of spectral platforms, where we would like to be as flexible as possible. For example, if we put the binaries on an s3 storage, on a deducated instance, stream via rcurl and read directly is rather efficient.a
@ThomasKnecht you mentioned we could as well implement an {rcurl} method for it. - @pierreroudier I guess your setup would apply for it right? To avoid a hard dependency, we might work with
if (!require("rcurl"))
and trigger an informative error. Same applies for the output arg, i.e. if we want to go with {jsonlite} etc.That's a good question. I'm tempted to say that in this case, I would handle this /outside/ the
{opusreader2}
package -- currently I have a dedicated package that works with our Science API. This way it would keep the package simple, scoped, and nimble.I think the implementation Thomas drafted now is quite modular and simple, therefore I would like to include the new functionality sunglasses I am thinking of spectral platforms, where we would like to be as flexible as possible. For example, if we put the binaries on an s3 storage, on a deducated instance, stream via rcurl and read directly is rather efficient.
Because we avoid dependencies, there is no extra cost. plus rcurl is really stable.
@pierreroudier we solved it with https://github.com/spectral-cockpit/opusreader2/pull/28
so now you can input raw
@pierreroudier we solved it with https://github.com/spectral-cockpit/opusreader2/pull/28
so now you can input raw
Excellent -- so should I cancel/close this PR then?
Excellent -- so should I cancel/close this PR then?
we could leave it open, to remind that we can implement a URLs in the dsn (data source name) of read_opus
This PR reproduces the function
opus_read_raw
from the originalopusreader
.It splits the
read_opus_file
function and extracts its core so that a raw stream can be passed to the function.I'm not gonna lie, that's a functionality that is relevant for my personal use case -- where we "stream" directly the content of OPUS file over the web using web services!
It does not really affect the overall structure of the package/the code, though, and could even be hidden from the namespace should you choose to.