usnistgov / mosaic

A modular single-molecule analysis interface
https://pages.nist.gov/mosaic/

Feature Request - local baseline detection to allow for slow drift in solid-state nanopore baseline #69

Closed shadowk29 closed 8 years ago

shadowk29 commented 8 years ago

Solid-state nanopores often change size over the course of a few hours of current data, so the baseline statistics calculated at the beginning are inapplicable to later sections of the same run. An option that calculates a local baseline for each new chunk of data requested would be helpful for analyzing long solid-state nanopore runs.
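
To make the request concrete, here is a minimal sketch of per-chunk baseline estimation. It is not MOSAIC code: the names local_baseline and chunks, the block size, and the synthetic drifting trace are all made up for illustration, and it assumes each chunk is dominated by open-channel current (a real implementation would need to exclude blockade events before taking statistics).

```python
import statistics


def local_baseline(chunk, dt=1.0):
    """Return (mean, sd, slope) for one chunk of current samples.

    Hypothetical helper, not part of MOSAIC. Assumes the chunk is
    mostly open-channel current sampled at a fixed interval dt.
    """
    n = len(chunk)
    if n < 2:
        raise ValueError("need at least two samples to estimate a baseline")
    mean = statistics.fmean(chunk)
    sd = statistics.stdev(chunk)
    # Ordinary least-squares slope of current against time (t = i * dt).
    t_mean = dt * (n - 1) / 2.0
    num = sum((i * dt - t_mean) * (x - mean) for i, x in enumerate(chunk))
    den = sum((i * dt - t_mean) ** 2 for i in range(n))
    return mean, sd, num / den


def chunks(data, block_size):
    """Yield consecutive blocks; the last block may be shorter."""
    for start in range(0, len(data), block_size):
        yield data[start:start + block_size]


# Synthetic trace whose baseline drifts upward by 0.01 per sample.
trace = [100.0 + 0.01 * i for i in range(3000)]
baselines = [local_baseline(c) for c in chunks(trace, 1000)]

for k, (mean, sd, slope) in enumerate(baselines):
    print(k, round(mean, 3), round(sd, 3), round(slope, 5))
```

With a single baseline taken from the first block, the later blocks of this trace would sit many standard deviations away from it; recomputing per block keeps the reference level following the drift.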

abalijepalli commented 8 years ago

I set up a new branch devel-1.0-ticket69 that we can use to test this before merging back to devel-1.0.

abalijepalli commented 8 years ago

When you get a chance, test the fix in commit 101b449ce1aa0d1287240e6e9ba53ac2306ce2fd with your data set. You will have to set driftThreshold and maxDriftRate to negative values to turn off drift checking. Also, set the baseline estimation to automatic by setting meanOpenCurr, sdOpenCurr and slopeOpenCurr to -1. The partition function should then update the baseline for each new chunk of data.
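
As a rough illustration of those settings, the sketch below writes the five parameters out as JSON. Only the parameter names come from the comment above; placing them under an "eventSegment" section, storing the values as strings, and using -1 as the negative value for the two drift settings are assumptions made for this example and should be checked against the settings file in your own MOSAIC install.

```python
import json

# Hypothetical settings fragment. The section name and the string-valued
# entries are assumptions; only the five keys are taken from the thread.
settings = {
    "eventSegment": {
        # Negative values turn off drift checking.
        "driftThreshold": "-1",
        "maxDriftRate": "-1",
        # -1 requests automatic baseline estimation.
        "meanOpenCurr": "-1",
        "sdOpenCurr": "-1",
        "slopeOpenCurr": "-1",
    }
}

settings_text = json.dumps(settings, indent=4)
print(settings_text)
```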

shadowk29 commented 8 years ago

Not sure yet if this is unique to this branch or not since I have a test running at the moment, but mosaic currently crashes with a ValueError if the length of the data file fits perfectly into an integer number of data blocks.

shadowk29 commented 8 years ago

Couple of bugs, I think. I may be misunderstanding how it is set up, but let me know if I have this right and I can fix them:

eventSegment._checkdrift() is not called from eventSegment._eventsegment(), so the update is not performed currently. I think self._checkdrift(t) should be called in _eventsegment() right after these two statements:

    t=self.currData.popleft()
    self.globalDataIndex+=1

Within _checkdrift(), after the first time it is called, the condition

    if self.meanOpenCurr == -1. or self.sdOpenCurr == -1. or self.slopeOpenCurr == -1.:

will fail because those variables were reset the last time _checkdrift() was called. I think we can simply remove that condition?

Let me know.
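
The second point can be reproduced with a toy class. This is a sketch only, not MOSAIC's eventSegment: DriftTracker, update_buggy and update_fixed are hypothetical names, and the fix shown (remembering the "automatic" request in a separate flag) is an alternative to removing the condition outright, not what was merged.

```python
import statistics


class DriftTracker:
    """Toy stand-in for the baseline update discussed above (not MOSAIC code)."""

    def __init__(self, mean_open_curr=-1.0, sd_open_curr=-1.0):
        # Record the user's request once, before the fields are overwritten.
        self.auto_baseline = (mean_open_curr == -1.0 or sd_open_curr == -1.0)
        self.mean_open_curr = mean_open_curr
        self.sd_open_curr = sd_open_curr
        self.updates = 0

    def _estimate(self, chunk):
        self.mean_open_curr = statistics.fmean(chunk)
        self.sd_open_curr = statistics.pstdev(chunk)
        self.updates += 1

    def update_buggy(self, chunk):
        # Re-tests the -1 sentinel on every call. It is true only the first
        # time, because the fields hold real estimates afterwards.
        if self.mean_open_curr == -1.0 or self.sd_open_curr == -1.0:
            self._estimate(chunk)

    def update_fixed(self, chunk):
        # Tests the flag captured at construction instead of the sentinel.
        if self.auto_baseline:
            self._estimate(chunk)


blocks = [[100.0, 101.0, 99.0], [110.0, 111.0, 109.0]]

buggy = DriftTracker()
fixed = DriftTracker()
for block in blocks:
    buggy.update_buggy(block)
    fixed.update_fixed(block)

print(buggy.updates, buggy.mean_open_curr)
print(fixed.updates, fixed.mean_open_curr)
```

The buggy tracker stays at the first block's baseline, while the fixed one follows the second block.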

shadowk29 commented 8 years ago

I added pull request #70 with a correction to the baseline updates. There are some other bugs I am trying to track down (specifically, the AbsEventStart column in my output does not match the location of events in the data file). It is not yet clear whether this is specific to this branch. It seems like baseline limits might be necessary, though, as the program gets bogged down detecting thousands of events during clogged states that last longer than the BlockSize.

shadowk29 commented 8 years ago

I think I screwed up that pull request and did not push my local changes. Will fix tomorrow.

shadowk29 commented 8 years ago

Submitted Pull request #72 to partially address the issues here.

Outstanding issues: on clogs that slightly overlap the good baseline, mosaic gets hung up thinking that there is an event at every data point. This happens even with the regular mosaic approach that calculates the baseline only at the start. Not clear yet what is causing this, but it will be the first thing I debug when I get back in January.

shadowk29 commented 8 years ago

Pull request #83 should cover the issues here, pending more tests.

abalijepalli commented 8 years ago

I'll close this for now. We can reopen it if other issues arise.