ococrook closed 2 years ago
I'll have a look tomorrow, but it would be useful if you could provide a small reproducible example.
Small comment: it is better to use `filterMsLevel(hoip_apo_1_0_test, 2)` rather than doing it manually with `hoip_apo_1_0_test[msLevel(hoip_apo_1_0_test) == 2]`, because the filter function might be optimised for a dedicated backend.
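To make the comparison concrete, here is a minimal sketch of both approaches. It uses a small in-memory `Spectra` object built from made-up values as a stand-in for the mzML file (the toy data and the names `sps`, `ms2_filter` and `ms2_manual` are assumptions for illustration, not taken from the test file).

```r
library(Spectra)

## Hypothetical toy data: one MS1 scan and two MS2 scans.
spd <- DataFrame(msLevel = c(1L, 2L, 2L), rtime = c(1.1, 1.6, 2.4))
spd$mz <- list(c(100, 200), c(300, 750, 900), c(260, 800))
spd$intensity <- list(c(10, 20), c(5, 6, 7), c(1, 2))
sps <- Spectra(spd)

## Dedicated filter: a backend is free to provide an optimised implementation.
ms2_filter <- filterMsLevel(sps, 2L)

## Manual subsetting: selects the same spectra via a logical index.
ms2_manual <- sps[msLevel(sps) == 2L]

length(ms2_filter)
length(ms2_manual)
```

Both calls select the same two MS2 spectra here; the difference only matters for backends that can filter more efficiently than generic subsetting.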
Thanks Laurent, good point about `filterMsLevel` - I missed that. I can send you that test file - it's a small .mzML for the purposes of package testing. Though I might have already resolved the issue, I'd still be keen to hear your thoughts!
Yes, send it over.
I am seeing a similar issue (though for `out2`, not `out1`), but will need more time to explore/understand.
I assume that the missing `c` in `out1 <- filterRt(out, rt = (0,2))` is a typo, right?
yes, sorry typo!
Also, should `out2 <- filterMzRange(out, c(300,400))` not be `out2 <- filterMzRange(out2, c(300,400))`?
should probably have named these something more sensible to avoid typos
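For reference, a sketch of the two corrected calls with more descriptive names. The toy data and the variable names are hypothetical, and it is only a guess that the m/z filter was meant to be chained on the retention-time-filtered object.

```r
library(Spectra)

## Hypothetical toy data standing in for the test file.
spd <- DataFrame(msLevel = c(1L, 2L, 2L), rtime = c(1.1, 1.6, 2.4))
spd$mz <- list(c(100, 200), c(300, 750, 900), c(260, 800))
spd$intensity <- list(c(10, 20), c(5, 6, 7), c(1, 2))
ms2 <- filterMsLevel(Spectra(spd), 2L)

## `rt` needs a numeric vector of length 2, hence c(0, 2) and not (0,2).
ms2_rt_0_2 <- filterRt(ms2, rt = c(0, 2))

## Chain the m/z filter on the previous result rather than on `ms2`.
ms2_rt_0_2_mz_300_400 <- filterMzRange(ms2_rt_0_2, c(300, 400))

length(ms2_rt_0_2_mz_300_400)
mz(ms2_rt_0_2_mz_300_400)
```

Names that encode the filter steps make it harder to pass the wrong object to the next call.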
Here's a more detailed description of the issue:
> library(Spectra)
> library(magrittr)
> rm(list = ls())
> ## make sure the file doesn't already exist
> unlink("hoip_apo_1_0_test.h5")
> ## create data
> out <- Spectra("./hoip_apo_1_0_test.mzML",
+ source = MsBackendMzR(),
+ backend = MsBackendHdf5Peaks(),
+ hdf5path = ".") %>%
+ filterMsLevel(2)
> file.size("./hoip_apo_1_0_test.h5") ## 31908861
[1] 31908861
> out@processingQueueVariables ## character(0)
character(0)
Works as expected - file isn't modified, no processing queue
> out1 <- filterRt(out, rt = c(0,2)) ## first spectrum only
> length(out1)
[1] 1
> file.size("~/tmp/hoip_apo_1_0_test.h5") ## 31908861
[1] 31908861
> out1@processingQueueVariables ## character(0)
character(0)
> out1@backend@modCount
[1] 0
Things work, the file isn't modified (range is filtered on the fly) but `msLevel` is added to the processing queue? But should `modCount` not be 1?
> range(mz(out1))
[,1]
[1,] 260.019
[2,] 1500.454
> out1 <- filterMzRange(out1, c(700,850))
> range(mz(out1)) ## works
[,1]
[1,] 700.0046
[2,] 849.7397
> out1@processingQueueVariables ## msLevel - ok
[1] "msLevel"
> out1@backend@modCount ## should this not be 1?
[1] 0
> file.size("~/tmp/hoip_apo_1_0_test.h5") ## 31908861
[1] 31908861
Apparently works, and the file is modified, but `msLevel` is still in the processing queue. The file has indeed changed, and the m/z values can't be accessed anymore.
> applyProcessing(out1) ## apparently works
MSn data (Spectra) with 1 spectra in a MsBackendHdf5Peaks backend:
msLevel rtime scanIndex
<integer> <numeric> <integer>
1 2 1.61 1
... 33 more variables/columns.
file(s):
hoip_apo_1_0_test.h5
Processing:
Switch backend from MsBackendMzR to MsBackendHdf5Peaks [Fri Apr 16 11:30:53 2021]
Filter: select MS level(s) 2 [Fri Apr 16 11:30:53 2021]
Filter: select retention time [0..2] on MS level(s) 2 [Fri Apr 16 11:31:58 2021]
...2 more processings. Use 'processingLog' to list all.
> file.size("~/tmp/hoip_apo_1_0_test.h5") ## 23772271
[1] 23772271
> out1@backend@modCount
[1] 0
> out1@processingQueueVariables ## still msLevel - ok
[1] "msLevel"
> range(mz(out1)) ## error
Error in .h5_read_peaks(fls, scanIndex(object), object@modCount) :
The data in the hdf5 files associated with this object appear to have changed! Please see the Notes section in ?MsBackend for more information.
@jorainer - could you have a look?
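One detail that may be relevant: in the transcript above, the result of `applyProcessing(out1)` is printed but not assigned, so `out1` itself keeps its old processing queue (and, presumably, its old `modCount`), while the shared file has already been rewritten. The sketch below illustrates only the return-value part, using a hypothetical in-memory object. The in-memory backend has no shared file, so it cannot reproduce the hdf5 error.

```r
library(Spectra)

## Hypothetical toy data standing in for the test file.
spd <- DataFrame(msLevel = c(1L, 2L, 2L), rtime = c(1.1, 1.6, 2.4))
spd$mz <- list(c(100, 200), c(300, 750, 900), c(260, 800))
spd$intensity <- list(c(10, 20), c(5, 6, 7), c(1, 2))
ms2 <- filterMsLevel(Spectra(spd), 2L)

## filterMzRange is lazy: it only adds a step to the processing queue.
windowed <- filterMzRange(ms2, c(700, 850))
length(windowed@processingQueue)

## applyProcessing returns a new object with the queue applied and emptied.
applied <- applyProcessing(windowed)
length(applied@processingQueue)

## The original variable is unchanged and still carries the pending step.
length(windowed@processingQueue)
```

With a file-backed backend, the object returned by `applyProcessing` is presumably the one that stays in sync with the modified file, so it would need to be assigned and used from then on.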
Update: the `msLevel` is correct, I believe, as it indicates on which MS levels to operate.
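A small sketch of what "on which MS levels to operate" means in practice, using the `msLevel.` argument of `filterMzRange` on hypothetical toy data: peaks are only filtered in spectra of the requested MS level, so the MS level of each spectrum has to be known when the queued step is evaluated.

```r
library(Spectra)

## Hypothetical toy data: one MS1 scan and two MS2 scans.
spd <- DataFrame(msLevel = c(1L, 2L, 2L), rtime = c(1.1, 1.6, 2.4))
spd$mz <- list(c(100, 200), c(300, 750, 900), c(260, 800))
spd$intensity <- list(c(10, 20), c(5, 6, 7), c(1, 2))
sps <- Spectra(spd)

## Restrict the m/z filter to MS2 spectra only.
ms2_window <- filterMzRange(sps, c(700, 850), msLevel. = 2L)

mz(ms2_window)[[1]]  ## MS1 spectrum: peaks left untouched
mz(ms2_window)[[2]]  ## MS2 spectrum: only peaks within 700-850 remain

## Spectra variables required by the queued processing step(s).
ms2_window@processingQueueVariables
```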
Thanks Laurent!
Hi, thanks again for the help. Though I'm still not really getting how to use the HDF5 backend in a meaningful way. I'm clearly misunderstanding something, so any help would be appreciated.
I have my potentially large Spectra object, loaded in as follows:
Now I want to look at the stuff in the ms2 scans:
I realise this is the point where something will go wrong.
`out` is still linked to the `.h5` file, so I'm going to mess something up if I try to filter/process the object. E.g. I want to look at the first scan and then, for the second scan, do something else:
All fine and no problem, until:
Also not fine:
So currently a pipeline where we search for peptides using known rt and mz ranges is not going to work in this way. My main question is: how should this be done? It doesn't seem to make sense to switch backends before filtering, because then we just load the object into memory, defeating the purpose.
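One possible shape for such a pipeline, offered only as a sketch: in the transcript earlier in the thread, the `.h5` file size stayed the same after `filterRt` and `filterMzRange` and only changed after `applyProcessing`, which suggests keeping every filter lazy and never calling `applyProcessing` on objects that share a file. The example uses an in-memory object as a stand-in, and the `targets` table with its column names is entirely hypothetical. Whether this is the intended way to use the HDF5 backend is exactly the open question.

```r
library(Spectra)

## Hypothetical toy data standing in for the (potentially large) data set.
spd <- DataFrame(msLevel = c(1L, 2L, 2L), rtime = c(1.1, 1.6, 2.4))
spd$mz <- list(c(100, 200), c(300, 750, 900), c(260, 800))
spd$intensity <- list(c(10, 20), c(5, 6, 7), c(1, 2))
ms2 <- filterMsLevel(Spectra(spd), 2L)

## Hypothetical peptide targets with known rt and m/z windows.
targets <- data.frame(
    peptide = c("pepA", "pepB"),
    rt_min = c(0, 2), rt_max = c(2, 3),
    mz_min = c(700, 250), mz_max = c(850, 300)
)

## For each target, filter a copy of `ms2` and read the peaks on the fly.
## `ms2` itself is never modified and applyProcessing is never called.
hits <- lapply(seq_len(nrow(targets)), function(i) {
    win <- filterRt(ms2, rt = c(targets$rt_min[i], targets$rt_max[i]))
    win <- filterMzRange(win, c(targets$mz_min[i], targets$mz_max[i]))
    mz(win)
})
names(hits) <- targets$peptide

hits$pepA
hits$pepB
```

Each iteration works on its own filtered copy with its own processing queue, so the windows for different peptides do not interfere with each other.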