Closed pditommaso closed 8 years ago
The general answer is yes. In particular, you'd want to obtain iterators directly, or obtain a Reference and get iterators from that. The iterators are not thread safe, of course, but are designed to be used one per thread. So sharing a single iterator across several threads will not work, but if you split a ReadCollection into several ranges and get an iterator per range, that will work.
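To make the "one iterator per range" idea concrete, here is a minimal sketch of the splitting and thread fan-out. It is not taken from the NGS sources: the class ReadRangeSplitter and the RangeTask interface are hypothetical names, and the sketch assumes read rows are numbered from 1 and that a range is described by a (first, count) pair, as with getReadRange. The actual NGS calls are only indicated in comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ReadRangeSplitter {

    /**
     * Hypothetical stand-in for the per-range work. With NGS, an implementation
     * would presumably call collection.getReadRange(first, count) and drain that
     * iterator inside this method, so the iterator never leaves its thread.
     */
    interface RangeTask {
        long process(long first, long count) throws Exception;
    }

    /** Splits rows 1..total into at most `parts` contiguous {first, count} ranges. */
    static List<long[]> split(long total, int parts) {
        List<long[]> ranges = new ArrayList<>();
        if (total <= 0 || parts <= 0) {
            return ranges;
        }
        long base = total / parts;   // minimum size of each range
        long extra = total % parts;  // the first `extra` ranges get one more row
        long first = 1;              // assumption: rows are 1-based
        for (int i = 0; i < parts; i++) {
            long count = base + (i < extra ? 1 : 0);
            if (count == 0) {
                break;               // fewer rows than parts
            }
            ranges.add(new long[] {first, count});
            first += count;
        }
        return ranges;
    }

    /** Runs one task per range on its own pool thread and sums the results. */
    static long processInParallel(long total, int threads, RangeTask task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (long[] r : split(total, threads)) {
                Callable<Long> job = () -> task.process(r[0], r[1]);
                futures.add(pool.submit(job));
            }
            long sum = 0;
            for (Future<Long> f : futures) {
                sum += f.get();      // rethrows any failure from the worker
            }
            return sum;
        } finally {
            pool.shutdown();
        }
    }
}
```

In real use, total would come from the collection's read count, and each task would open its own iterator for its range rather than receive one from another thread.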
OK. It makes sense. Thanks.
One more question: I've noticed that reads are cached in a folder in the user home. Is it possible to specify the location of the cache directory, or eventually to disable it?
Yes. See documentation on configuring VDB (the engine underneath NGS) using the sra-toolkit:
https://github.com/ncbi/sra-tools/wiki/Toolkit-Configuration
You can set the cache location via configuration. By default (i.e. in the absence of any formal input from the user), it chooses $HOME/ncbi/public for open-access data, and $HOME/ncbi/dbGaP-xxxx where the x's indicate a particular project to which you have access.
Turning off caching altogether makes sense if either a) you never re-read data, or b) you have a relatively fast internet connection. In the latter case, you're essentially using NCBI's storage as a networked drive.
Nice, thanks. However, that documentation does not mention the API. Is it possible to configure those options using the Java API?
The answer will require a little bit of background...
The NGS API is intended to be vendor neutral and supports simultaneous loading of distinct engines. The NCBI-NGS engine that plugs in underneath is just one of the possible engines that could exist.
As such, we cannot add NCBI-specific features or configuration to the API. What we can do, however, is what we are doing right now, which is to create NCBI-specific extensions to NGS. The configuration class you mention is under development at this very moment, since we need it, too. We can keep you updated on progress.
By the way - are you able to comment on what you're doing with the NGS API? We're always curious to hear!
That's good news. Yes, the idea is to integrate the NGS API into the Nextflow framework so that a pipeline script would be able to download/access reads by specifying one or more accession numbers.
You can read more at this link https://github.com/nextflow-io/nextflow/issues/89
Thanks for the info. I just posted a response there as well.
I'm closing this thread and I will open an issue related to the cache configuration options API.
Is the ReadCollection Java API thread safe? I would like to use the getReadRange method on the same collection object in different threads in order to parallelise the reads download process.