kundajelab / genomelake

Simple and efficient access to genomic data for deep learning models.
BSD 3-Clause "New" or "Revised" License
43 stars 17 forks source link

Open the bigwig file only once #17

Closed Avsecz closed 6 years ago

Avsecz commented 6 years ago

Consistently with the FastaExtractor, the file should be opened once in the __init__ and not every time in _extract.

coveralls commented 6 years ago

Pull Request Test Coverage Report for Build 77


Changes Missing Coverage Covered Lines Changed/Added Lines %
genomelake/extractors.py 4 5 80.0%
<!-- Total: 4 5 80.0% -->
Totals Coverage Status
Change from base Build 60: -0.5%
Covered Lines: 153
Relevant Lines: 171

💛 - Coveralls
coveralls commented 6 years ago

Pull Request Test Coverage Report for Build 85


Totals Coverage Status
Change from base Build 60: 0.2%
Covered Lines: 156
Relevant Lines: 173

💛 - Coveralls
csfoo commented 6 years ago

I would check if the bigwig library being used is thread-safe; this was why I re-opened the file everytime.

On Sat, Aug 11, 2018 at 5:47 AM, Chris Probert notifications@github.com wrote:

@chrisprobert commented on this pull request.

In genomelake/extractors.py https://github.com/kundajelab/genomelake/pull/17#discussion_r209392544:

     return out

+

  • def close(self):

I'm thinking of scenarios where someone is running genomelake in a training loop, and keeps instantiating extractors, e.g. for each epoch of training a model, without explicitly calling close(). Adding the del would stop us from filling up the file descriptor table.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kundajelab/genomelake/pull/17#discussion_r209392544, or mute the thread https://github.com/notifications/unsubscribe-auth/AFSeiNnbyhZU8Pk17432B6dWIZ89gHgnks5uPf-PgaJpZM4V4eBs .

Avsecz commented 6 years ago

I think that's an important point to document, but it shouldn't prevent the user to efficiently use the extractors. The same is done for the FastaExtractor and pysam.FastaFile is not threadsafe.

csfoo commented 6 years ago

Agreed completely.

On Mon, Aug 13, 2018 at 9:00 PM, Žiga Avsec notifications@github.com wrote:

I think that's an important point to document, but it shouldn't prevent the user to efficiently use the extractors. The same is done for the FastaExtractor and pysam.Fasta is not threadsafe.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/kundajelab/genomelake/pull/17#issuecomment-412509640, or mute the thread https://github.com/notifications/unsubscribe-auth/AFSeiDzWbP3_T8IWwB1yA1Y7SyHhl_seks5uQXiCgaJpZM4V4eBs .

Avsecz commented 6 years ago

added a docstring saying the extractor is not thread-safe

Avsecz commented 6 years ago

@jisraeli I will merge this PR. Fine?

jisraeli commented 6 years ago

👍