I set up a new branch devel-1.0-ticket69 that we can use to test this before merging back to devel-1.0.
When you get a chance, test the fix in commit 101b449ce1aa0d1287240e6e9ba53ac2306ce2fd with your data set. You will have to set `driftThreshold` and `maxDriftRate` to negative values to turn off drift checking. Also, set the baseline estimation to automatic by setting `meanOpenCurr`, `sdOpenCurr`, and `slopeOpenCurr` to -1. The partition function should then update the baseline for each new chunk of data.
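For reference, the relevant settings in one place. This is a sketch written as a Python dict mirroring the JSON .settings layout; the key names come from this thread, but the assumption that they all live under the eventSegment section is mine — check your own settings file:

```python
# Sketch of the relevant eventSegment settings; the key names come from
# the discussion above, but the surrounding structure is an assumption
# about the JSON .settings layout -- adjust to your local file.
settings = {
    "eventSegment": {
        "driftThreshold": -1.0,  # negative value turns off drift checking
        "maxDriftRate": -1.0,    # negative value turns off drift checking
        "meanOpenCurr": -1.0,    # -1 => estimate the baseline automatically
        "sdOpenCurr": -1.0,      # -1 => estimate the baseline automatically
        "slopeOpenCurr": -1.0,   # -1 => estimate the baseline automatically
    }
}
```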
Not sure yet whether this is unique to this branch (I have a test running at the moment), but mosaic currently crashes with a ValueError if the length of the data file fits exactly into an integer number of data blocks.
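One plausible mechanism, as an assumption rather than a diagnosis: when the file length is an exact multiple of the block size, the final read can return an empty chunk, and downstream code that assumes at least one sample per block will blow up. A guard along these lines would rule that out:

```python
def next_block(data, start, block_size):
    """Hypothetical reader helper: return the next chunk, or None when the
    file is exhausted. If len(data) is an exact multiple of block_size, a
    naive reader can hand an empty chunk to code that assumes at least one
    sample per block, which is one way to trigger the crash described above."""
    chunk = data[start:start + block_size]
    if len(chunk) == 0:  # exact-multiple case: nothing left to read
        return None
    return chunk
```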
A couple of bugs, I think. I may be misunderstanding how it is set up, but let me know if I have this right and I can fix them:

`eventSegment._checkdrift()` is never called from `eventSegment._eventsegment()`, so the baseline update is currently not performed. I think `_checkdrift()` should be called in `_eventsegment()` right after

```python
t = self.currData.popleft()
self.globalDataIndex += 1
```

as `self._checkdrift(t)`.

Within `_checkdrift()`, after the first time it is called, the condition

```python
if self.meanOpenCurr == -1. or self.sdOpenCurr == -1. or self.slopeOpenCurr == -1.:
```

will never be satisfied again, because those variables were overwritten the last time `_checkdrift()` ran. I think we can simply remove that condition?
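To make the proposal concrete, here is a minimal, self-contained sketch of the control flow with both fixes applied, using a toy stand-in for eventSegment (the re-estimation body is a placeholder; only the call placement and the removed guard come from the discussion above):

```python
from collections import deque

class SegmentSketch:
    """Toy stand-in for eventSegment illustrating the two proposed fixes."""

    def __init__(self, data):
        self.currData = deque(data)
        self.globalDataIndex = 0
        # -1 sentinels mean "baseline not yet estimated".
        self.meanOpenCurr = -1.0
        self.sdOpenCurr = -1.0
        self.slopeOpenCurr = -1.0

    def _eventsegment(self):
        while self.currData:
            t = self.currData.popleft()
            self.globalDataIndex += 1
            self._checkdrift(t)  # fix 1: update the baseline on each sample

    def _checkdrift(self, t):
        # Fix 2: no "== -1" sentinel guard here. The sentinels are
        # overwritten on the first call, so the original guard was only
        # ever true once; removing it lets the baseline track new chunks.
        self.meanOpenCurr = t  # placeholder for the real re-estimation
```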
Let me know.
I added pull request #70 with a correction to the baseline updates. There are some other bugs I am still tracking down (specifically, the AbsEventStart column in my output does not match the location of events in the data file); it is not clear yet whether this is specific to this branch. It seems baseline limits might be necessary, though, as the program gets bogged down detecting thousands of spurious events during clogged states that last longer than the BlockSize.
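For illustration, baseline limits could be as simple as skipping any block whose estimated open-pore current falls outside a user-set window. The names minBaseline and maxBaseline below are hypothetical, not confirmed MOSAIC parameters:

```python
def block_is_analyzable(mean_open_curr, min_baseline, max_baseline):
    """Hypothetical baseline-limit check: skip event detection in blocks
    whose estimated open-pore current lies outside the user-set window
    (e.g., during a clog), instead of logging thousands of false events."""
    return min_baseline <= abs(mean_open_curr) <= max_baseline
```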
I think I screwed up that pull request and did not push my local changes. Will fix tomorrow.
Submitted Pull request #72 to partially address the issues here.
Outstanding issues: on clogs that slightly overlap the good baseline, mosaic gets hung up thinking that there are events on every data point. This is true even for the regular mosaic approach that calculates baseline only at the start. Not clear yet what is causing this, but it will be the first thing I debug when I get back in January.
Pull request #83 should cover the issues here, pending more tests.
I'll close this for now. We can reopen it if other issues arise.
Solid-state nanopores often change size over the course of a few hours of current data, making baseline statistics calculated at the beginning of a run inapplicable to later sections of the same run. An option to calculate a local baseline for each new chunk of data requested would be helpful for analyzing long solid-state nanopore runs.
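A minimal sketch of the per-chunk idea, assuming the baseline can be summarized by a robust mean and standard deviation of the open-pore samples (MOSAIC's actual estimator also fits a slope term, which is omitted here):

```python
import numpy as np

def chunk_baseline(chunk):
    """Estimate local open-pore baseline statistics for one chunk of data.

    Simplified sketch: keep only samples near the chunk median (a robust
    MAD-based cut) so brief events do not skew the open-pore mean/sd, then
    re-estimate the baseline from those samples. Re-running this on every
    new chunk lets the baseline track slow pore-size drift.
    """
    chunk = np.asarray(chunk, dtype=float)
    med = np.median(chunk)
    mad = np.median(np.abs(chunk - med))           # robust spread estimate
    open_pore = chunk[np.abs(chunk - med) < 3.0 * 1.4826 * mad]
    if open_pore.size == 0:                        # e.g., a chunk-long clog
        open_pore = chunk
    return float(np.mean(open_pore)), float(np.std(open_pore))
```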