Closed GoogleCodeExporter closed 8 years ago
Comment from Sean Seymour, 07 May 2008 18:29:
Hi all,
I think having a standard/aligned way to indicate where cleavages are allowed or
forbidden is a great idea. Just don't make it required content - we don't do it like
that in some search modes. Alternatively, we could expand the syntax, but I think it
would be much easier not to and we'll push our settings out as Paragon-specific CV.
Sean
and Pierre-Alain:
Hi all,
The same might be true for Phenyx, where rules might be as complex as regular
expressions allow them to be...
Original comment by dcre...@gmail.com
on 17 Jun 2008 at 4:43
Example from Phenyx:
<cleavEnzymes>
<oneCleavEnzyme name="Trypsin_(KR_noP)" owner="default">
<site>
<cleavSite>KR</cleavSite>
<adjacentSite>^P</adjacentSite>
<terminus>C</terminus>
</site>
<CTermGain>OH</CTermGain>
<NTermGain>H</NTermGain>
</oneCleavEnzyme>
and using regex:
<oneCleavEnzyme name="Trypsin_regexp">
<siteRegexp><![CDATA[(?<=[KR])(?=[^P])]]></siteRegexp>
<terminus>C</terminus>
<CTermGain>OH</CTermGain>
<NTermGain>H</NTermGain>
</oneCleavEnzyme>
Original comment by dcre...@gmail.com
on 17 Jun 2008 at 4:47
[deleted comment]
Dual enzyme example from Mascot (not XML):
Title:LysC+AspN
Cleavage[0]:K
Restrict[0]:P
Cterm[0]
Cleavage[1]:DB
Nterm[1]
Independent:1
Mascot semi trypsin:
Title:semiTrypsin
Cleavage[0]:KR
Restrict[0]:P
Cterm[0]
SemiSpecific:1
Original comment by dcre...@gmail.com
on 17 Jun 2008 at 4:52
Discussion on TPP:
http://groups.google.com/group/spctools-discuss/browse_thread/thread/d31ff28280f0e46e/7f240fdc6a9fd1a8?lnk=gst&q=This+looks+like+a+great#7f240fdc6a9fd1a8
"This looks like a great grant proposal for more robust enzyme handling."
i.e. we are not alone in thinking that we can solve this by a simple regex or
two...
Original comment by dcre...@gmail.com
on 17 Jun 2008 at 4:57
I propose something simple and rather similar to the first Phenyx case:
<Enzyme name="Trypsin" semiSpecific="0" missedCleavages="2">
<site>
<cleaveSite>KR</cleaveSite>
<noCleaveSite>P</noCleaveSite>
<terminus>C</terminus>
</site>
<CTermGain>OH</CTermGain>
<NTermGain>H</NTermGain>
</Enzyme>
<Enzyme name="LysC+AspN" semiSpecific="0" missedCleavages="1" independent="1">
<site>
<cleaveSite>K</cleaveSite>
<noCleaveSite>P</noCleaveSite>
<terminus>C</terminus>
</site>
<site>
<cleaveSite>DB</cleaveSite>
<noCleaveSite></noCleaveSite>
<terminus>N</terminus>
</site>
<CTermGain>OH</CTermGain>
<NTermGain>H</NTermGain>
</Enzyme>
However, <site> shouldn't be required and we should be able to use CV instead for
cases that don't follow these simple rules.
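A minimal Python sketch of how one simple <site> entry like the ones above could be mapped onto a zero-width regular expression. The helper name site_to_regex is hypothetical, and reading <noCleaveSite> as "a residue on the far side of the cut that blocks cleavage" is an assumption, not something the proposal spells out.

```python
import re

def site_to_regex(cleave_site, no_cleave_site, terminus):
    """Build a zero-width regex for one <site> (hypothetical helper).

    Assumption: noCleaveSite lists residues that block cleavage when they
    sit on the other side of the cut from the cleaveSite residue.
    """
    if terminus == "C":
        # Cut after a cleaveSite residue, unless a blocking residue follows.
        regex = "(?<=[%s])" % cleave_site
        if no_cleave_site:
            regex += "(?![%s])" % no_cleave_site
    elif terminus == "N":
        # Cut before a cleaveSite residue, unless a blocking residue precedes.
        regex = "(?=[%s])" % cleave_site
        if no_cleave_site:
            regex = "(?<![%s])" % no_cleave_site + regex
    else:
        raise ValueError("terminus must be 'C' or 'N'")
    return regex

trypsin = site_to_regex("KR", "P", "C")
asp_n = site_to_regex("DB", "", "N")
print(trypsin)  # (?<=[KR])(?![P])
print(asp_n)    # (?=[DB])
print(re.split(trypsin, "ABCDKPEFGHIJKLMNOPQRSTUVWXYZ"))
```

Splitting on a zero-width pattern with re.split needs Python 3.7 or later.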
Original comment by dcre...@gmail.com
on 17 Jun 2008 at 5:12
probably just a nomenclature thing, but I've always felt the <cleaveSite>
<noCleaveSite> combined with <terminus> is a bit confusing. The naming scheme
used by biochemists is the Schechter and Berger notation, which always has
cleavage at the peptide bond as the central reference point - this makes
the question of "either side" irrelevant.
Schechter and Berger looks like this:
(non-prime side)...Nterm-P3 -P2 -P1 -|-P1P-P2P-P3P-C-term...(prime side)
...Nterm-Aaa-Aaa-Aaa-|-Aaa-Aaa-Aaa-C-term...
so the cut is always between P1 and P1' (prime) at what is referred to as the
scissile bond, so if we used something like:
<P1cleave>KR</P1cleave>
<P1Pnoncleave>P</P1Pnoncleave>
for trypsin.
Then this allows us higher flexibility and one less element to worry about,
plus it matches up to the biochemists' expectations.
There may well be good reasons why the search engine developers don't do it
this way, but thought I'd put this out as a suggestion
-Simon-
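A rough Python sketch of how the Schechter and Berger positions could be turned into a regex, with the scissile bond as the zero-width match point. The function names, the dictionary layout and the use of a trailing "P" for prime positions (as in P1P above) are assumptions made for illustration only.

```python
import re

def _window(spec, prime):
    """Turn {position: residues} into a fixed-width pattern for one side."""
    picked = {}
    for key, residues in spec.items():
        if key.endswith("P") == prime:       # 'P1P' is prime, 'P1' is not
            picked[int(key.strip("P"))] = residues
    if not picked:
        return ""
    order = range(1, max(picked) + 1)
    if not prime:
        order = reversed(order)              # ... P3 P2 P1 reads left to right
    # Unspecified positions in between accept any residue.
    return "".join("[%s]" % picked[i] if i in picked else "." for i in order)

def sb_to_regex(cleave, noncleave=None):
    """Hypothetical helper: P-notation dictionaries to a zero-width regex."""
    regex = ""
    for spec, behind, ahead in ((cleave, "(?<=%s)", "(?=%s)"),
                                (noncleave or {}, "(?<!%s)", "(?!%s)")):
        left = _window(spec, prime=False)
        right = _window(spec, prime=True)
        if left:
            regex += behind % left
        if right:
            regex += ahead % right
    return regex

trypsin = sb_to_regex({"P1": "KR"}, {"P1P": "P"})
asp_n = sb_to_regex({"P1P": "DB"})
caspase = sb_to_regex({"P4": "D", "P3": "E", "P2": "V", "P1": "D"})
print(trypsin, asp_n, caspase)
print(re.split(caspase, "AADEVDGG"))
```

Because the cut is always between P1 and P1', no separate <terminus> element is needed in this reading.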
Original comment by i.am.sim...@gmail.com
on 19 Jun 2008 at 12:14
Sorry for the long delay before replying.
Simon, one problem with this approach is that we would need even more tags this
way:
<P1cleave>
<P1Pcleave>
<P2cleave>
<P2Pcleave>
<P1nocleave>
<P1Pnocleave>
<P2nocleave>
<P2Pnocleave>
etc.
Also, as you say, it isn't the way search engines generally express it, so might be a
little 'foreign' and off-putting?
I'm becoming keener on the regular expression approach because it does (almost) all
we want. I'll put two alternatives in the next two separate comments, so it is easier
to refer to them.
David
Original comment by dcre...@gmail.com
on 23 Jul 2008 at 4:16
The suggestion from Pierre-Alain in comment #2 is to use Perl regular expressions,
which may not be so clear to many people. Particularly since these are "Extended
Patterns" in Perl regex...
For Trypsin, for example, it is
(?<=[KR])(?=[^P])
The ?<= is a "zero-width positive look-behind assertion", and the [] means any one
character from the enclosed set. So this part of the rule looks behind for a K or R.
The ?= is a "zero-width positive look-ahead assertion", and [^P] means any character
that is not P.
http://perldoc.perl.org/perlre.html
An example of a few lines of perl:
$protein = "ABCDKPEFGHIJKLMNOPQRSTUVWXYZ";
@peptides = split(/(?<=[KR])(?=[^P])/, $protein);
print join "\n", @peptides;
gives:
ABCDKPEFGHIJK
LMNOPQR
STUVWXYZ
So, this option would be:
<SpectrumIdentificationProtocol ...
<Enzyme>
<Rules><![CDATA[(?<=[KR])(?=[^P])]]></Rules>
<optional CV>
</Enzyme>
These expressions are very powerful and can do pretty much anything that we want.
Btw, it is obviously optional whether you use 'CDATA' or not - it does make it
easier for humans to read because there's no need to escape the < and &. Everything
inside a CDATA section is treated by the parser as literal text rather than markup.
A CDATA section starts with "<![CDATA[" and ends with "]]>".
Is this all too obscure for most people?
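To illustrate the CDATA point, here is a small Python sketch (standard library only) showing that a parser hands back the same rule text whether the regex is wrapped in CDATA or escaped by hand. The bare <Rules> element is lifted out of the example above purely for demonstration.

```python
import xml.etree.ElementTree as ET

# Same rule twice: once protected by CDATA, once with the '<' escaped.
with_cdata = "<Rules><![CDATA[(?<=[KR])(?=[^P])]]></Rules>"
escaped = "<Rules>(?&lt;=[KR])(?=[^P])</Rules>"

rule_a = ET.fromstring(with_cdata).text
rule_b = ET.fromstring(escaped).text

print(rule_a)            # (?<=[KR])(?=[^P])
print(rule_a == rule_b)  # True
```

So the choice only affects how readable the file is for humans, not what a consuming program sees.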
Original comment by dcre...@gmail.com
on 23 Jul 2008 at 4:18
A less 'perly' way to do this would be to use what I suggested in comment #6, except
that we could use (simple!) regular expressions for the cleaveSite and noCleaveSite
elements.
Original comment by dcre...@gmail.com
on 23 Jul 2008 at 4:21
In response to David's comment #8: actually, the notation I propose does not
require all these extra lines as suggested. It just requires the two I originally
gave (for trypsin at least). But my main gripe with the original suggestion to use
<cleaveSite> and <noCleaveSite> was that it doesn't tell you where the cleavage
actually is! The P1-P1' notation does - it's fixed. The same criticism could be
levelled at the regex approach - we would still need to explicitly define where the
cleavage is in some way. That could easily be done of course, so perhaps someone
could think of a way to do it?
I'm generally for the use of regex by the way, but we do need to ensure we
specify where the cleavage is in an intuitive way
-Simon-
Also, if a biochemist has a to generate a new
Original comment by i.am.sim...@gmail.com
on 25 Jul 2008 at 2:50
Simon, sorry, I must be missing something here then. How would you specify Asp-N
(cleaves at DB, Nterm)?
And how would you specify Caspase (Cterm, 3 possible sequence patterns: DEVD or DQTD
or ELPD)?
David
Original comment by dcre...@gmail.com
on 29 Jul 2008 at 9:55
In response to David.
Asp-N would simply be:
<P1Pcleave>DB</P1Pcleave>
Caspases would require a few more lines, and coping with the concept of "or"
means introducing some new notation. Regexs would be more elegant here I
suppose. But it can still be done.
<P4cleave>D</P4cleave>
<P3cleave>E</P3cleave>
<P2cleave>V</P2cleave>
<P1cleave>D</P1cleave>
The above covers the first example. Is there really a single Caspase that
cleaves DEVD or DQTD or ELPD explicitly? I'm not sure there is, and it's
an ongoing debate in the literature to define a lot of caspase specificities,
isn't it?
So why do I keep going on about this? It's largely because this notation
parallels the biochemistry. It might not be high on the agenda for designing
a data model for exchange for some people, but I think it should be, especially
as the pre-existing one is widely used in all the literature and can be
found in all the standard textbooks.
-Simon-
Original comment by i.am.sim...@gmail.com
on 29 Jul 2008 at 1:30
So my list of available element names (in the schema) in #8 is pretty much
correct?
> The same criticism could be levelled at the regex approach -
> we would still need to explicitly define where the cleavage is in some way.
If you look (very!) carefully at #9, the regex format does describe this. The problem
with this approach is (I think) that it is too obscure. If it's not obvious to you,
it's not going to be obvious to most people.
I still don't have a real preference, but am becoming keener on your suggestion, if
we can model multiple enzymes:
> If more than one enzyme, they can be applied to separate aliquots which
> are then mixed, or they can be applied 'together'. (If separate aliquots,
> then a peptide cannot be cleaved at one terminus by one enzyme, and the
> other by a different enzyme)
How about something like (for Trypsin and Asp-N applied in separate aliquots):
<enzymes independent="1" missedCleavages="2" semiSpecific="0" minDistance="4">
<enzyme name="Trypsin">
<P1cleave>KR</P1cleave>
<P1Pnoncleave>P</P1Pnoncleave>
</enzyme>
<enzyme name="Asp-N">
<P1Pcleave>DB</P1Pcleave>
</enzyme>
</enzymes>
then this could also be used for the (dubious) Caspase case.
<enzymes independent="0" missedCleavages="1" semiSpecific="0" minDistance="4">
<enzyme name="Caspase1">
<P4cleave>D</P4cleave>
<P3cleave>E</P3cleave>
<P2cleave>V</P2cleave>
<P1cleave>D</P1cleave>
</enzyme>
<enzyme name="Caspase2">
<P4cleave>D</P4cleave>
<P3cleave>Q</P3cleave>
<P2cleave>T</P2cleave>
<P1cleave>D</P1cleave>
</enzyme>
<enzyme name="Caspase3">
<P4cleave>E</P4cleave>
<P3cleave>L</P3cleave>
<P2cleave>P</P2cleave>
<P1cleave>D</P1cleave>
</enzyme>
</enzymes>
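As a rough Python sketch of what the 'independent' attribute could mean in practice: with independent="1" each enzyme digests its own aliquot and the peptide lists are pooled, while with independent="0" any enzyme may cut at any of its sites. The function names are hypothetical, the rules are written as regexes rather than P-notation for brevity, and missed cleavages are ignored.

```python
import re

def cleave(sequence, regex):
    """Split at every zero-width match of regex (no missed cleavages)."""
    return [p for p in re.split(regex, sequence) if p]

def digest(sequence, regexes, independent):
    if independent:
        # Separate aliquots, mixed afterwards: each enzyme acts alone,
        # so both termini of a peptide come from the same enzyme.
        peptides = set()
        for regex in regexes:
            peptides.update(cleave(sequence, regex))
        return sorted(peptides)
    # Same aliquot: the cut sites of all enzymes are combined.
    combined = "|".join("(?:%s)" % r for r in regexes)
    return sorted(cleave(sequence, combined))

TRYPSIN = "(?<=[KR])(?!P)"
ASP_N = "(?=[BD])"
seq = "AAKCCDEEKPGG"
separate = digest(seq, [TRYPSIN, ASP_N], independent=True)
together = digest(seq, [TRYPSIN, ASP_N], independent=False)
print(separate)  # ['AAK', 'AAKCC', 'CCDEEKPGG', 'DEEKPGG']
print(together)  # ['AAK', 'CC', 'DEEKPGG']
```

The peptide CC only appears in the combined digest because its two ends are produced by different enzymes.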
David
Original comment by dcre...@gmail.com
on 29 Jul 2008 at 2:22
Examples for suggestion #10
<Enzymes name="LysC+AspN" semiSpecific="0" missedCleavages="1" independent="1"
minDistance="4">
<enzyme>
<cleaveSite>K</cleaveSite>
<noCleaveSite>P</noCleaveSite>
<terminus>C</terminus>
</enzyme>
<enzyme>
<cleaveSite>[DB]</cleaveSite>
<noCleaveSite></noCleaveSite>
<terminus>N</terminus>
</enzyme>
<CTermGain>OH</CTermGain>
<NTermGain>H</NTermGain>
</Enzymes>
<Enzymes name="Caspase" semiSpecific="0" missedCleavages="1" independent="1"
minDistance="4">
<enzyme>
<cleaveSite>DEVD|DQTD|ELPD</cleaveSite>
<noCleaveSite></noCleaveSite>
<terminus>C</terminus>
</enzyme>
<CTermGain>OH</CTermGain>
<NTermGain>H</NTermGain>
</Enzymes>
Original comment by dcre...@gmail.com
on 29 Jul 2008 at 9:03
Additional possibility (just as memo from previous discussions):
Describe the search engine parameter "enzyme" using CV terms, e.g.:
<AdditionalSearchParams>
<pf:cvParam accession="PSI:0000XYZ" name="Paragon:DefaultEnzyme" cvRef="PSI"/>
...
</AdditionalSearchParams>
or
<Enzymes>
<pf:cvParam accession="PSI:0000XYZ" name="Paragon:DefaultEnzyme" cvRef="PSI"/>
...
</Enzymes>
Original comment by eisena...@googlemail.com
on 31 Jul 2008 at 9:06
Original comment by eisena...@googlemail.com
on 31 Jul 2008 at 3:02
Suggestion: create CV terms for common cases with the regular expression defined
within the term?
Original comment by delag...@gmail.com
on 31 Jul 2008 at 3:26
Angel will generate some examples for OBO terms w/ regex
Original comment by delag...@gmail.com
on 31 Jul 2008 at 3:27
Just out of interest - I tried the example extended regex in Java (from comment 9
above), it works just as easily in Java as it does in Perl:
String[] peptides = "ABCDKPEFGHIJKLMNOPQRSTUVWXYZ".split("(?<=[KR])(?=[^P])");
for (String peptide : peptides){
System.out.println("peptide = " + peptide);
}
Output:
peptide = ABCDKPEFGHIJK
peptide = LMNOPQR
peptide = STUVWXYZ
Any other language examples?
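For what it's worth, a Python equivalent might look like the sketch below (Python 3.7 or later is needed for re.split to split on a zero-width match). It also prints the match positions, since each zero-width match sits exactly on the cleaved bond.

```python
import re

protein = "ABCDKPEFGHIJKLMNOPQRSTUVWXYZ"
rule = re.compile(r"(?<=[KR])(?=[^P])")

peptides = rule.split(protein)
# Each match is zero-width, so start() is the index of the residue
# immediately after the cleaved bond.
positions = [m.start() for m in rule.finditer(protein)]

for peptide in peptides:
    print("peptide = " + peptide)
print(positions)  # [13, 20]
```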
Original comment by philip.j...@gmail.com
on 31 Jul 2008 at 5:12
for SEQUEST every possibility is okay which states
- Offset
- Sites (e.g. "KR" for Trypsin)
- No-sites (e.g. P for Elastase)
Caspase is not possible.
Original comment by eisena...@googlemail.com
on 11 Sep 2008 at 1:59
To sum up, the following was agreed in a TeleCon in August:
1) Have the possibility to state a regular expression
2) Have CV terms for the most important enzymes with a pre-defined regexp
Original comment by eisena...@googlemail.com
on 11 Sep 2008 at 3:01
possible XML:
<cleavageEnzymes>
<!-- Trypsin cutting cterm of K and R: -->
<oneCleavageEnzyme identifier="Trypsin" CTermGain="OH" NTermGain="H">
<cleavageEnzymeCV accession="PSI-PI:000456" name="Trypsin" cvRef="PSI-PI"/>
</oneCleavageEnzyme>
<!-- Cleavage C and Nterm of D, and trypsin cleavage at cterm of K and R -->
<oneCleavageEnzyme identifier="ChemDigest_and_Trypsin" CTermGain="" NTermGain="">
<siteRegexp><![CDATA[(?<=[DKR])|(?=[D])]]></siteRegexp>
<cleavageEnzymeCV accession="PSI-PI:000456" name="ChemDigest_and_Trypsin"
cvRef="PSI-PI"/>
</oneCleavageEnzyme>
<!-- Caspase (3 sequence patterns) -->
<oneCleavageEnzyme identifier="Caspase">
<siteRegexp><![CDATA[(?<=(?:DEVD|DQTD|ELPD))]]></siteRegexp>
<cleavageEnzymeCV accession="PSI-PI:000567" name="Caspase" cvRef="PSI-PI"/>
</oneCleavageEnzyme>
</cleavageEnzymes>
Agree?
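A possible way for a reader of such a file to resolve the rule, sketched in Python: use <siteRegexp> when it is present, otherwise fall back to a regex pre-defined for the CV term. The CV_REGEX table and the function name are hypothetical, and the accessions are the placeholder values from the XML above, not real CV entries.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of pre-defined rules, keyed by CV term name.
CV_REGEX = {"Trypsin": "(?<=[KR])(?!P)"}

XML = """<cleavageEnzymes>
  <oneCleavageEnzyme identifier="Trypsin" CTermGain="OH" NTermGain="H">
    <cleavageEnzymeCV accession="PSI-PI:000456" name="Trypsin" cvRef="PSI-PI"/>
  </oneCleavageEnzyme>
  <oneCleavageEnzyme identifier="Caspase">
    <siteRegexp><![CDATA[(?<=(?:DEVD|DQTD|ELPD))]]></siteRegexp>
    <cleavageEnzymeCV accession="PSI-PI:000567" name="Caspase" cvRef="PSI-PI"/>
  </oneCleavageEnzyme>
</cleavageEnzymes>"""

def enzyme_rules(xml_text):
    """Map each enzyme identifier to its regex (None if nothing is known)."""
    rules = {}
    for enzyme in ET.fromstring(xml_text).iter("oneCleavageEnzyme"):
        explicit = enzyme.findtext("siteRegexp")
        cv = enzyme.find("cleavageEnzymeCV")
        if explicit is not None:
            # An explicit regex in the file takes precedence.
            rules[enzyme.get("identifier")] = explicit.strip()
        elif cv is not None:
            rules[enzyme.get("identifier")] = CV_REGEX.get(cv.get("name"))
    return rules

rules = enzyme_rules(XML)
print(rules)
```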
Original comment by eisena...@googlemail.com
on 11 Sep 2008 at 3:02
[deleted comment]
Added a proposal to the schema in the svn.
More CV terms to be added!
Original comment by eisena...@googlemail.com
on 11 Sep 2008 at 6:04
Martin, looks good, but I think we agreed to use the names <Enzymes> and <Enzyme>,
following the format in, for example, #15:
<Enzymes independent="1">
<Enzyme missedCleavages="2" semiSpecific="0" minDistance="4" identifier="XXX"
CTermGain="SH" NTermGain="H6">
. . .
</Enzyme>
<Enzyme>
</Enzyme>
</Enzymes>
The 'independent' attribute needs to be in <Enzymes>, but all the other attributes
are probably better off in the individual <Enzyme> element as you have done.
For multiple enzymes, if 'independent' is 0, then I suspect that missedCleavages,
semiSpecific, and minDistance would generally be the same for each enzyme, but they
wouldn't need to be. If independent is 1, then all the attributes could be different.
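To make the missedCleavages attribute concrete, here is a hedged Python sketch of how a consumer might expand a cleavage regex into peptides with up to N missed cleavages. The function name is hypothetical, and semiSpecific and minDistance are deliberately left out because their exact semantics are not defined in this thread.

```python
import re

def digest_with_missed(sequence, regex, missed_cleavages=0):
    """Return peptides allowing up to N missed cleavages (hypothetical helper)."""
    cuts = [m.start() for m in re.finditer(regex, sequence)]
    # The sequence ends always bound a peptide; a set drops duplicates.
    bounds = sorted({0, len(sequence), *cuts})
    peptides = []
    for i in range(len(bounds) - 1):
        # Skipping k internal cut sites joins k + 1 adjacent fragments.
        last = min(i + 1 + missed_cleavages, len(bounds) - 1)
        for j in range(i + 1, last + 1):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides

result = digest_with_missed("AAKCCRDDKPEE", "(?<=[KR])(?!P)", missed_cleavages=1)
print(result)  # ['AAK', 'AAKCCR', 'CCR', 'CCRDDKPEE', 'DDKPEE']
```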
Original comment by dcre...@gmail.com
on 16 Sep 2008 at 2:36
For the OBO terms:
[Term]
id: PI:00232
name: peptide cleavage enzyme
def: "A general term to represent peptide cleavage enzymes. Cleavage rules are
specified using PCRE version 7.4 compliant regular expressions." [ref:ref]
is_a: PI:00000 ! protein informatics cv
[Term]
id: PI:00232
name: Trypsin
def: "Trypsin enzyme. Cleaves at Lysine (Lys [K]) and Arginine (Arg [R]) but not when
either is followed by Proline (Pro [P]) at the C terminus."
is_a: PI:00232 ! peptide cleavage enzyme
regex: "(?<=[KR])(?!P)"
etc, etc. ...
Here are the regexes I came up with (note the notes for some questions I had).
Name          Cleave_at
Trypsin       (?<=[KR])(?!P)
Arg-C         R(?!P)
Asp-N         (?<=[:alpha:])(?=[BD])   # N-terminus cleavages require prefix AA?
Asp-N_ambic   (?<=[:alpha:])(?=[DE])   # see above
Chymotrypsin  (?<=[FYWL])(?!P)
CNBr          (?<=M)
Formic_acid   ((?<=D))|((?=D))         # note: is this either/or, or does it excise the Asp (D) completely from the sequence?
Lys-C         (?<=K)(?!P)
Lys-C/P       (?<=K)
PepsinA       (?<=[FL])
Tryp-CNBr     (?<=[KRM])(?!P)
TrypChymo     (?<=[FYWLKR])(?!P)
Trypsin/P     (?<=[KR])
V8-DE         (?<=[BDEZ])(?!P)
V8-E          (?<=[EZ])(?!P)
CNBr+Trypsin  (?<=M)|(?<=[KR])(?!P)
KR            (?<=P)
Original comment by delag...@gmail.com
on 2 Oct 2008 at 6:50
Looks good.
I presume that (?!P) is the same as (?=[^P]), but maybe it is clearer to use the same
syntax for one residue as for multiple residues?
I think I've corrected these properly:
Asp-N (?=[BD])
Asp-N_ambic (?=[DE])
And I'm pretty sure that your Formic_acid is correct. (I tested by using a Perl
script for the regex and comparing with examples in the Mascot configuration
editor).
The Arg-C doesn't seem to work (removes the 'R'), so should be
(?<=R)(?!P)
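The difference is easy to reproduce; a small Python sketch (the test sequence is made up): a pattern that consumes the R removes it from the peptide, whereas the look-behind version is zero-width and keeps it.

```python
import re

seq = "AARCCRPDD"

# 'R' is part of the match, so split() throws it away.
consuming = re.split(r"R(?!P)", seq)
# Look-behind matches the empty position after 'R', so nothing is lost.
zero_width = re.split(r"(?<=R)(?!P)", seq)

print(consuming)   # ['AA', 'CCRPDD']
print(zero_width)  # ['AAR', 'CCRPDD']
```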
I'm not so sure about having multiple enzymes such as CNBr+Trypsin in the CV. You've
specified two options:
CNBr+Trypsin (?<=M)|(?<=[KR])(?!P)
Tryp-CNBr (?<=[KRM])(?!P)
But I'm not convinced that either works 100% properly for the two cases:
- both applied to the same aliquot.
- applied to separate aliquots and these are then mixed (i.e. both termini will be
tryptic or both CNBr).
Since we have a mechanism for mixed enzymes in the schema, we should probably use
that and remove the mixed ones?
I've added an enzyme section for a mixed enzyme (CNBr+Trypsin) to the
Mascot_MSMS_example.axml file in the examples directory.
At the moment, for a mixed enzyme there is no place for the name chosen from the
search form drop down list for the enzyme. Likewise, if in any search engine someone
chose to call Trypsin, say "Bovine Trypsin", there's no place for this name as we
should just give the accession for Trypsin?
In the schema, we could restrict CTermGain and NTermGain to what we would expect in
chemical formulae? [A-Z][a-z][0..9][ ] to stop someone entering a decimal number?
I've put the regex plus the CV, don't know if that is what is intended.
Original comment by dcre...@gmail.com
on 3 Oct 2008 at 9:02
>I presume that (?!P) is the same as (?=[^P]), but maybe is clearer to use the same
>syntax for one residue as for multiple residues?
This is a matter of style. I have a preference to use the PCRE specification for
negating look-ahead and look-behind assertions, which are (?!...) and (?<!...)
respectively. Also I tend to steer towards the most succinct regex, since this is
clearer and easier to understand for me. The character class negation seems like you
are putting the negation in the wrong place and has the potential for double
negatives (?![^P]).
Also for character classes, I tend toward only having a single character when
this is the case, as in (?!P) instead of (?![P]). The compiled regex parse tree is
different for these, even though the result should be the same.
So I propose two notes on style:
1) use the PCRE supplied negation syntax for look-ahead and look-behind assertions
2) use the most compact representation possible for a regex.
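One behavioural difference between the two spellings is worth noting, shown here as a Python sketch with a made-up sequence: (?=[^P]) needs an actual residue to follow, while (?!P) is also satisfied at the very end of the sequence.

```python
import re

seq = "AAKCCK"

# Requires a following residue that is not P.
lookahead_class = [m.start() for m in re.finditer(r"(?<=[KR])(?=[^P])", seq)]
# Only requires that P does not follow, so the end of the sequence matches too.
negative_lookahead = [m.start() for m in re.finditer(r"(?<=[KR])(?!P)", seq)]

print(lookahead_class)     # [3]
print(negative_lookahead)  # [3, 6]
```

The extra match after a C-terminal K or R does not change the peptides, but depending on the language a split may return an empty trailing string there, so consumers may want to filter empty fragments.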
Original comment by delag...@gmail.com
on 3 Oct 2008 at 12:28
Thanks for the clarification and I'll happily agree to the style notes.
If you agree to not including multiple enzymes in the list, then can you confirm that
we have:
Name          Cleave_at
Trypsin       (?<=[KR])(?!P)
Arg-C         (?<=R)(?!P)
Asp-N         (?=[BD])
Asp-N_ambic   (?=[DE])
Chymotrypsin  (?<=[FYWL])(?!P)
CNBr          (?<=M)
Formic_acid   ((?<=D))|((?=D))
Lys-C         (?<=K)(?!P)
Lys-C/P       (?<=K)
PepsinA       (?<=[FL])
TrypChymo     (?<=[FYWLKR])(?!P)
Trypsin/P     (?<=[KR])
V8-DE         (?<=[BDEZ])(?!P)
V8-E          (?<=[EZ])(?!P)
The only other one that we are lacking is a way to describe 'No enzyme'. A regex:
None (?<=[A-Z])
is only meaningful if we allow a very large number of missed cleavages. In the current
schema, the Enzymes element is optional, but if you have it, then the Enzyme
element(s) within it are required. So, no enzyme could be specified by just omitting
the Enzymes section, but I'd rather have something explicitly say that there was no
enzyme specificity. Any ideas?
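A quick Python sketch of why the 'None' regex only makes sense with a large missed-cleavage allowance (the sequence is made up): the rule marks a cut after every residue, so a strict digest gives single residues, and only by skipping cuts do you reach the n(n+1)/2 candidate peptides of a true no-enzyme search.

```python
import re

seq = "PEPTIDE"
n = len(seq)

# The rule matches after every residue, including the last one.
cuts = [m.start() for m in re.finditer(r"(?<=[A-Z])", seq)]
print(cuts)  # [1, 2, 3, 4, 5, 6, 7]

# With no limit on missed cleavages, every substring is a candidate peptide.
candidates = [seq[i:j] for i in range(n) for j in range(i + 1, n + 1)]
print(len(candidates), n * (n + 1) // 2)  # 28 28
```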
Also, any comments on:
At the moment, for a mixed enzyme there is no place for the name chosen from the
search form drop down list for the enzyme. Likewise, if in any search engine someone
chose to call Trypsin, say "Bovine Trypsin", there's no place for this name as we
should just give the accession for Trypsin?
In the schema, we could restrict CTermGain and NTermGain to what we would expect in
chemical formulae? [A-Z][a-z][0..9][ ] to stop someone entering a decimal number?
Original comment by dcre...@gmail.com
on 3 Oct 2008 at 3:44
The "no enzyme" is a bit of a conundrum, I admit. If we must have it, then that
regex is as good as any. I think that omission of Enzyme is more true to the
semantics of the experimental protocol, but I can see how it could complicate
matters. If we choose to use "No enzyme" then I vote we make Enzyme a mandatory
element and default the value to "No enzyme". Perhaps this is a case where an
attribute of Enzyme can suffice, as opposed to a CV term.... thoughts anyone?
For the multiple enzymes, I thought I put in my last reply that I agreed we do not
combine enzymes in the CV, but leave it up to the schema to define the combinations.
I guess I didn't. My bad.
For the search engine parameter issue, I think this is userParam territory.
Last, for C/NTermGain I think your suggestion is a good one.
Original comment by delag...@gmail.com
on 8 Oct 2008 at 6:13
as agreed in TeleCon 9th of October:
1) changed cardinality of <cvParam> child of <Enzyme> to: one to many (to allow
synonyms)
2) restricted CTermGain and NTermGain to "[A-Za-z0-9 ]" (basic letters of a chemical
formula) (can be refined later)
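A small Python sketch of what that restriction accepts and rejects. Applying the character class to the whole value and allowing the empty string (as used for CTermGain="" in an earlier example) are assumptions here; the schema itself is not quoted in this thread.

```python
import re

# Assumed reading: the whole attribute value must consist of these characters.
GAIN_PATTERN = re.compile(r"[A-Za-z0-9 ]*")

def is_valid_gain(value):
    """True if value only uses letters, digits and spaces (hypothetical check)."""
    return GAIN_PATTERN.fullmatch(value) is not None

checks = [is_valid_gain(v) for v in ("OH", "H", "H6", "", "1.5")]
print(checks)  # [True, True, True, True, False]
```

Note that this still lets through a bare integer such as "15", which is presumably one of the things that "can be refined later".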
Original comment by eisena...@googlemail.com
on 9 Oct 2008 at 4:29
Continuing with the CV, here are legal OBO definitions that for the most part do not
show up in OBO-edit.
[Term]
id: PI:00242
name: peptide cleavage enzyme
def: "A general term to represent peptide cleavage enzymes. Cleavage rules are
specified using PCRE version 7.4 compliant regular expressions" [ref:ref]
is_a: PI:00000 ! protein informatics cv
[Term]
id: PI:00243
name: Trypsin
def: "Trypsin enzyme. Cleaves at Lysine (Lys [K]) and Arginine (Arg [R]) but not when
either is followed by Proline (Pro [P]) at the C terminus" [ref:ref]
is_a: PI:00242 ! peptide cleavage enzyme
property_value: cleavage_rule "(?<=[KR])(?\\\!P)" xsd:string
[Instance]
id: PI:00244
name: C-terminal
comment: C Terminal
instance_of: PI:00047 ! cleavage: sense
[Instance]
id: PI:00245
name: N-terminal
comment: N Terminal
instance_of: PI:00047 ! cleavage: sense
[Typedef]
id: cleavage_rule
name: cleavage_rule
domain: OBO:TERM
range: xsd:string ! xsd:string
definition: "Cleavage rule."
# end OBO file
Specifically, the instances and the property_value of the "cleavage_rule" Typedef. I
am at a loss as to how to continue. Do we restrict our CV to OBO-edit's capabilities?
Or just define the CV using "best-practices"?
On that note, it seems that terms PI:00046 to PI:00050 richly specify enzymes and
make the use of regular expressions moot (e.g. we can choose to go uber-verbose and
not put in terms for the major enzymes, thus avoiding regular expressions altogether
and forcing the definition of an enzyme to use all of the terms from PI:00046-50 when
outputting an experiment).
Original comment by delag...@gmail.com
on 15 Oct 2008 at 6:25
Note: I am sure that the above OBO examples have a few syntax mistakes, since I could
not test them out in OBO-edit.
Original comment by delag...@gmail.com
on 15 Oct 2008 at 7:36
OBO-Edit does not yet (...) support the property_value on terms/classes; I did
ask for it.
Meanwhile, a temporary solution is to use the following syntax (editable and
visible in OBO-Edit and other OBO viewers):
xref: value-type:{string,int,xsd} "regular expression"
Original comment by joecoppo...@gmail.com
on 17 Oct 2008 at 3:53
What does the triple backslash do in: "(?<=[KR])(?\\\!P)"
Also, it's probably a good idea to spell out "Perl-compatible regular expressions
(PCRE)" because I'm a computer programmer and I didn't know what PCRE meant even
though I know how to use Perl regex. :)
Original comment by matthew....@vanderbilt.edu
on 23 Oct 2008 at 4:20
Two possible variants for encoding regular expressions
for the default enzymes into the OBO file:
1) "xref" and 2) "has_a" relationship.
[Term]
id: PI:00251
name: Trypsin
xref_analog: regexp:(?<=[KR\])(?\!P)
is_a: PI:00045 ! cleavage agent name
relationship: has_a PI:00176 ! (?<=[KR])(?!P)
For the 1st variant, the regular expression is only a string.
For the 2nd variant, the regular expression is itself a
term (PI:00176) and child of a "regular expression" term.
Both methods have disadvantages:
In OBOEdit the xref gives a warning, because it contains non-URI characters.
In OBOEdit the has_a relationship is not shown in the tree view, but only in
the Parent Plugin (see screenshot attached).
Which do we prefer?
Original comment by eisena...@googlemail.com
on 7 Nov 2008 at 2:03
TeleCon 12th Nov:
We decided to use the has_a relationship, because it's more formal.
Martin: change has_a to has_regular_expression and delete the xrefs.
Original comment by eisena...@googlemail.com
on 12 Nov 2008 at 4:11
[deleted comment]
Original comment by dcre...@gmail.com
on 7 Dec 2008 at 4:37
Original issue reported on code.google.com by
dcre...@gmail.com
on 17 Jun 2008 at 4:40