Closed jorainer closed 5 years ago
Currently the schema_massbank_auto.yaml is organized as a blueprint for the parser, so that its structure follows the structure of the MassBank record. See the node entries, e.g.
- field: AC$MASS_SPECTROMETRY
  rule: block
  node:
    - field: MS_TYPE
      map: ms_level
    - field: ION_MODE
      map: ion_mode
    - field: COLLISION_ENERGY
      map: collision_energy
This is more intuitive for writing the record specification, at least for text-format records. We will have to see whether such a specification also works for, e.g., Agilent CEF, which is XML-based. I think it should work, but I'm not sure.
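To illustrate how a parser could use the nested form as a blueprint, here is a minimal Python sketch. It assumes the YAML has already been loaded into plain lists and dicts (the literal below stands in for that), and the function name `collect_mappings` is hypothetical, not part of the package.

```python
# Hypothetical in-memory form of the schema snippet above, i.e. what a
# YAML loader would typically return (lists of dicts).
SCHEMA = [
    {
        "field": "AC$MASS_SPECTROMETRY",
        "rule": "block",
        "node": [
            {"field": "MS_TYPE", "map": "ms_level"},
            {"field": "ION_MODE", "map": "ion_mode"},
            {"field": "COLLISION_ENERGY", "map": "collision_energy"},
        ],
    }
]


def collect_mappings(nodes, parents=()):
    """Walk the nested schema and return {record path: mapped name}.

    The record path is a tuple of field names from the top-level entry
    down to the mapped field, so the nesting of the record is preserved.
    """
    mappings = {}
    for entry in nodes:
        path = parents + (entry["field"],)
        if "map" in entry:
            mappings[path] = entry["map"]
        # Entries without a "node" key are leaves; recurse only if present.
        mappings.update(collect_mappings(entry.get("node", []), path))
    return mappings


MAPPINGS = collect_mappings(SCHEMA)
```

Here the block entry AC$MASS_SPECTROMETRY appears only as part of the path; it carries no mapping of its own.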
If we turn it the other way round, we would instead specify:
metadata:
  - field: ms_type
    original: MS_TYPE
    parent: AC$MASS_SPECTROMETRY
But then we would have to specify a dummy AC$MASS_SPECTROMETRY somewhere, since this is just a node that isn't really mapped to a field, and we would have to figure out how the record has to be ordered. Because of that, I think the current way of specifying it has some advantages.
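The two concerns with the flat form (the dummy parent node and the record order) can be made concrete with a rough Python sketch that rebuilds the nested form from flat entries. The function name `nest_fields`, the `rule: block` default for parents, and the "order of first appearance" rule are all assumptions made for illustration.

```python
# Hypothetical flat specification, following the field/original/parent
# layout of the snippet above.
FLAT = [
    {"field": "ms_type", "original": "MS_TYPE",
     "parent": "AC$MASS_SPECTROMETRY"},
    {"field": "ion_mode", "original": "ION_MODE",
     "parent": "AC$MASS_SPECTROMETRY"},
]


def nest_fields(flat_fields):
    """Rebuild the nested (record-shaped) form from flat entries.

    Assumption: parent blocks are emitted in order of first appearance,
    because the flat form carries no other ordering information.
    """
    blocks = {}   # parent name -> synthesized block entry
    nested = []
    for entry in flat_fields:
        child = {"field": entry["original"], "map": entry["field"]}
        parent = entry.get("parent")
        if parent is None:
            nested.append(child)
            continue
        if parent not in blocks:
            # The "dummy" node: it is never declared in the flat form,
            # so it has to be created here with an assumed rule.
            blocks[parent] = {"field": parent, "rule": "block", "node": []}
            nested.append(blocks[parent])
        blocks[parent]["node"].append(child)
    return nested


NESTED = nest_fields(FLAT)
```

The parent entry has to be synthesized and the order guessed, which is the information the nested form states explicitly.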
However, the schema_massbank_auto.yaml is still a first sketch (not just the MassBank specification, but also the syntax), and the fields.yaml is still completely unused.
OK, totally fine.
But you are highlighting a possible issue. The schema_massbank_auto.yaml is currently doing two things: 1) defining the structure of the record, which is used by your "function 1" (#6), and 2) defining the mapping, which is used by your "function 2". Do you think this is good, or should these be separated?
My goal was to define most of the record structure in the schema, so the parser would be as free as possible from "business logic". Still, any schema (containing "rules") will only work with a parser that understands these rules. It would be cool if it works out well, but maybe it won't.
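If the two roles were separated, one option is to derive both parts from the existing combined schema rather than maintain two files. The Python sketch below is only an illustration: `split_schema` is a hypothetical helper, and keying the mapping by field name alone assumes field names are unique across the record.

```python
# Hypothetical in-memory form of the combined schema (structure + mapping).
COMBINED = [
    {
        "field": "AC$MASS_SPECTROMETRY",
        "rule": "block",
        "node": [
            {"field": "MS_TYPE", "map": "ms_level"},
            {"field": "ION_MODE", "map": "ion_mode"},
        ],
    }
]


def split_schema(nodes):
    """Split a combined schema into (structure, mapping).

    structure: the same nesting with all "map" keys removed.
    mapping:   {record field name: mapped name}; assumes unique names.
    """
    structure, mapping = [], {}
    for entry in nodes:
        # Copy everything except the mapping and the children.
        stripped = {k: v for k, v in entry.items()
                    if k not in ("map", "node")}
        if "map" in entry:
            mapping[entry["field"]] = entry["map"]
        if "node" in entry:
            child_structure, child_mapping = split_schema(entry["node"])
            stripped["node"] = child_structure
            mapping.update(child_mapping)
        structure.append(stripped)
    return structure, mapping


STRUCTURE, MAPPING = split_schema(COMBINED)
```

In this sketch the structure part would feed "function 1" and the mapping part "function 2", while the combined file stays the single source.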
I just realized that myself. That's why I need to implement an importer myself, to understand what's going on and how best to achieve what I want.
Hi all,
I just wondered whether it would make sense to define the schemas differently. Currently the schema_massbank_auto.yaml is defined as:
And the fields.yaml as
Wouldn't it be more intuitive (and easier later for the mapping) to have it in the form:
It's not a big thing, but I guess for new users/definitions of schemas it might be easier to just copy the fields.yaml and add an original value to it.