Open stanstrup opened 5 years ago
Hi, let's start with a few pointers:
Here is the header file and the C++ structure where the information is stored in pwiz: https://github.com/sneumann/mzR/blob/beb109476546d58a903eacd3d263e07f94d35a58/src/pwiz/data/msdata/MSData.hpp#L108
Here is the XML parsing of that element in pwiz: https://github.com/sneumann/mzR/blob/beb109476546d58a903eacd3d263e07f94d35a58/src/pwiz/data/msdata/IO.cpp#L568
And, most relevant in this context, here is where mzR parses this back out: https://github.com/sneumann/mzR/blob/753bf97e5eb20854156740ba5a06f3ac00754c96/src/RcppPwiz.cpp#L140
The `Sample` is a `ParamType` as declared here: https://github.com/sneumann/mzR/blob/beb109476546d58a903eacd3d263e07f94d35a58/src/pwiz/data/common/ParamTypes.hpp#L244 so I expect the `<userParam>` will get read by pwiz, and could be extracted via `userParam("Job Code")` and then getting `->value()` in https://github.com/sneumann/mzR/blob/beb109476546d58a903eacd3d263e07f94d35a58/src/pwiz/data/common/ParamTypes.hpp#L279
Currently, there is no code to do that in mzR. It might look similar to
https://github.com/sneumann/mzR/blob/753bf97e5eb20854156740ba5a06f3ac00754c96/src/RcppPwiz.cpp#L229
Yours,
Steffen
Alternatively, this could be done in R with `XML` or `xml2`; that's my quick and dirty hack when a CV param isn't returned by default in `mzR`.
Yeah. I was thinking the same. Do you have a trick for only reading the header of the file and still getting valid XML?
No trick in my hat, I'm afraid.
But even if inefficient, we could have a function that basically does some XPath retrieval. Plus manpage with examples. Yours, Steffen
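A minimal sketch of what such an XPath-based retrieval function could look like, using `xml2`. The function name `get_sample_user_param` is hypothetical and not part of mzR. It assumes the mzML document has already been parsed with `read_xml()`, and that the parameter name contains no single quote. It matches elements with `local-name()` so the XPath works whether or not the file declares a default namespace.

```r
library(xml2)

# Hypothetical helper (not part of mzR): return the value of one named
# <userParam> for every <sample> in the <sampleList>.
get_sample_user_param <- function(doc, param_name) {
  # local-name() sidesteps the mzML default namespace, so no "d1:" prefix is needed.
  samples <- xml_find_all(
    doc, "//*[local-name()='sampleList']/*[local-name()='sample']"
  )
  # Assumption: param_name contains no single quote (it is pasted into the XPath).
  xpath <- sprintf("./*[local-name()='userParam'][@name='%s']", param_name)
  # xml_find_first() keeps one entry per sample; samples without a match yield NA.
  values <- xml_attr(xml_find_first(samples, xpath), "value")
  setNames(values, xml_attr(samples, "id"))
}

# Small in-memory document for illustration only.
doc <- read_xml('<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <sampleList count="2">
    <sample id="org_filename.raw" name="Important sample">
      <userParam name="Job Code" value="Some project"/>
    </sample>
    <sample id="org_filename2.raw" name="Important sample2">
      <userParam name="Job Code" value="Some project2"/>
    </sample>
  </sampleList>
</mzML>')

job_codes <- get_sample_user_param(doc, "Job Code")
print(job_codes)
```

The result is a character vector named by sample `id`, which keeps the link between each value and its sample explicit.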
I made a slightly more challenging example to make the solution more robust:
    <sampleList count="2">
      <sample id="org_filename.raw" name="Important sample">
        <userParam name="Job Code" value="Some project"/>
        <userParam name="Other thing" value="Other value"/>
      </sample>
      <sample id="org_filename2.raw" name="Important sample2">
        <userParam name="Job Code" value="Some project2"/>
        <userParam name="Other thing" value="Other value"/>
      </sample>
    </sampleList>
I can get what I want with:
    library(xml2)
    library(dplyr)  # for %>%
    library(purrr)  # for map()

    data <- read_xml(file)

    # Sample names
    data %>%
      xml_child("d1:mzML/d1:sampleList") %>%
      xml_find_all("d1:sample") %>%
      map(xml_attr, "name") %>%
      unlist()
    #> [1] "Important sample" "Important sample2"

    # Value of the "Job Code" userParam for each sample
    data %>%
      xml_child("d1:mzML/d1:sampleList") %>%
      xml_find_all("d1:sample") %>%
      map(xml_child, "d1:userParam[@name='Job Code']") %>%
      map(xml_attr, "value") %>%
      unlist()
    #> [1] "Some project" "Some project2"
It is not as slow as I had thought. It takes just 0.2 sec. I guess it would still be nice to have some generic way to access the complete metadata, though.
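One possible shape for a more generic accessor, sketched with `xml2` only: collect every `<userParam>` of every `<sample>` into a long data frame. The function name `sample_metadata_table` and its column names are hypothetical, and the sketch covers only `userParam` children of `sample` (no `cvParam`, no other metadata sections).

```r
library(xml2)

# Hypothetical helper: one row per (sample, userParam) pair.
sample_metadata_table <- function(doc) {
  samples <- xml_find_all(
    doc, "//*[local-name()='sampleList']/*[local-name()='sample']"
  )
  rows <- lapply(samples, function(s) {
    params <- xml_find_all(s, "./*[local-name()='userParam']")
    data.frame(
      sample_id   = rep(xml_attr(s, "id"), length(params)),
      sample_name = rep(xml_attr(s, "name"), length(params)),
      param       = xml_attr(params, "name"),
      value       = xml_attr(params, "value"),
      stringsAsFactors = FALSE
    )
  })
  # Returns NULL when the document contains no <sample> elements.
  do.call(rbind, rows)
}

# Small in-memory document for illustration only.
doc <- read_xml('<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <sampleList count="2">
    <sample id="org_filename.raw" name="Important sample">
      <userParam name="Job Code" value="Some project"/>
      <userParam name="Other thing" value="Other value"/>
    </sample>
    <sample id="org_filename2.raw" name="Important sample2">
      <userParam name="Job Code" value="Some project2"/>
      <userParam name="Other thing" value="Other value"/>
    </sample>
  </sampleList>
</mzML>')

tab <- sample_metadata_table(doc)
print(tab)
```

A long table like this can then be filtered with ordinary subsetting, for example `tab$value[tab$param == "Job Code"]`.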
Hello,
I was wondering if there is a way to extract metadata in a more direct way, or to extract more detailed info.
As a test I added the following to a file:
`sampleInfo(mz)` returns `"Important sampleorg_filename.raw"`, so it seems `id` and `name` were concatenated without a separator. So:
1) Is there a way to access these fields individually?
2) Is it possible to add a separator?
3) Wouldn't the more natural order be id + name rather than name + id?
4) What about the `userParam`? Is there a way to access that?
5) Is it possible to inject metadata (e.g. `sampleInfo(mz) <- list(sample="something else")`, or with `writeMSData`)? I guess I could lose more than I gain, though, since not all metadata is transferred (https://github.com/sneumann/mzR/issues/159).
Related to actually writing the info I need with ProteoWizard: https://github.com/ProteoWizard/pwiz/issues/568#issue-454695861