jastram opened this issue 10 years ago
There were some ideas here https://github.com/openETCS/toolchain/issues/191
Ok, I will first give it a try right here in the issue since this is still work in progress. Hence, I invite you to comment on anything that seems weird to you. Sorry for so much text...
Update, Oct 17, 2014: Added formal specification of the tracestring and amended example section to reflect the new spec. Update, Oct 18, 2014: Added Open Issues
Note 1: In the following I will use the term "requirement" for anything that will become traceable. You may substitute "element" if that name is more convenient for you.
Note 2: Everything in [square brackets] is not yet implemented.
Generate a hierarchical tree of all traceworthy artifacts in each chapter of subset-026. Each artifact shall be uniquely addressable via a tracestring.
I hope not to have forgotten anything essential...
`[*]` = bulleted item, ... This identifier is followed by the running number of the current artifact in square brackets. If there is no number then `*` is inserted.

Note: This covers the current implementation, i.e. everything described above which was not wrapped in square brackets.
Below is a possible version of a lexer + parser written in ANTLRv4.
```antlr
lexer grammar tracestringLexer;
// generic tokens
Delimiter : '.';
EOL : [\r\n]+ -> skip;
// List tokens in DEFAULT_MODE
fragment NoNumber : '*';
fragment Number : [1-9] | NumberGe10 ;
fragment NumberNot1 : [2-9] | NumberGe10 ;
fragment NumberGe10 : [1-9][0-9]+;
fragment LowerCaseCharacter : [a-z];
fragment Character : [A-Z] | LowerCaseCharacter;
fragment AlphaNumCharacter : (Character | Number);
fragment BracketedNumber : '[' Number ']';
String : AlphaNumCharacter+;
BulletedListID : '[*]' BracketedNumber;
ParagraphID : '[' NumberNot1 ']';
Table : '[t]' TableTraceString;
Figure : '[f]' FigureTraceString;
fragment FloatingEntityNumber : Number LowerCaseCharacter?;
fragment Caption : 'C';
mode TableMode;
TableTraceString : TableID InnerTable?;
fragment TableID : (FloatingEntityNumber | NoNumber);
fragment InnerTable : Delimiter (
RowID (Delimiter (ColumnID | String))?
| ConditionID
| Caption
);
fragment RowID : '[r]' BracketedNumber;
fragment ColumnID : '[c]' BracketedNumber;
fragment ConditionID : '[C]' BracketedNumber;
mode FigureMode;
FigureTraceString : FigureID InnerFigure?;
fragment FigureID : FloatingEntityNumber;
fragment InnerFigure : Delimiter Caption;
```
```antlr
parser grammar tracestringParser;
options { tokenVocab=tracestringLexer; }
entireString : baseList (Delimiter subList)* (Delimiter floatingEntity)?;
baseList : baseListID (Delimiter baseListID)* paragraphID?;
subList : subListID (Delimiter subListID)* paragraphID?;
baseListID : String;
subListID : String | BulletedListID;
paragraphID : ParagraphID;
floatingEntity : table | figure;
table : Table;
figure : Figure;
```
Input: `1.2.3.4[5].6.[*][7][8].9.[*][1].[t]*.[r][10].Data`
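Purely as a sketch, driving the generated lexer and parser from plain Java could look roughly like this. The class names `tracestringLexer`/`tracestringParser` and the start rule `entireString` come from the grammars above; everything else (the demo class, the hard-coded input) is made up for illustration:

```java
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

public class TracestringParseDemo {
    public static void main(String[] args) {
        // the example tracestring from above
        String input = "1.2.3.4[5].6.[*][7][8].9.[*][1].[t]*.[r][10].Data";

        // standard ANTLR 4 pipeline: lexer -> token stream -> parser
        tracestringLexer lexer = new tracestringLexer(new ANTLRInputStream(input));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tracestringParser parser = new tracestringParser(tokens);

        // entireString is the start rule of the parser grammar above
        ParseTree tree = parser.entireString();
        System.out.println(tree.toStringTree(parser));
    }
}
```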
Note 1: The numberTexts in the following screenshots do not correspond to the original numberTexts in subset-026. The correct reference is given in the headings.
Note 2: Prefixes present in the numberTexts may sometimes be omitted because the corresponding levels are missing in the shown excerpts (see Rule 7 of the section on how to compute a tracestring for why this is).
will become:

```
1
1.[*][1]
1.[*][2]
1.[f][21a]
1.[f][21a].C
```
will become:
```
1
2
2.1
2.2
2.2.1
2.3
```
will become:
```
A3
A3.1
A3.1[2]
A3.1[2].[t]*
A3.1[2].[t]*.[r][2]
A3.1[2].[t]*.[r][2].Data
A3.1[2].[t]*.[r][2].Value
A3.1[2].[t]*.[r][2].Name
A3.1[2].[t]*.[r][3]
A3.1[2].[t]*.[r][3].Data
A3.1[2].[t]*.[r][3].Value
A3.1[2].[t]*.[r][3].Name
A3.1[2].[t]*.[r][4]
A3.1[2].[t]*.[r][4].Data
A3.1[2].[t]*.[r][4].Value
A3.1[2].[t]*.[r][4].Name
A3.1[2].[t]*.[r][5]
A3.1[2].[t]*.[r][5].Data
A3.1[2].[t]*.[r][5].Value
A3.1[2].[t]*.[r][5].Name
A3.1[2].[t]*.[r][6]
A3.1[2].[t]*.[r][6].Data
A3.1[2].[t]*.[r][6].Value
A3.1[2].[t]*.[r][6].Name
```
@morido - thanks for taking the time for this. Looks great. I have just a few comments:
@jastram
> Tables - As I mentioned, the ReqIF standard does support tables. [...] If you plan to use it after all, let me know [...].
I do not think I need this.
Currently I do the following: create one "fake parent" requirement (the "table requirement") which has all rows (again "fake requirements") as its children. Those rows then give shelter to all columns (the actual requirements). The rows are only placeholders (no metadata or other meaningful content besides backwards tracing information), whereas the table holds a rich text version of the entire table as a kind of "visual helper".
The "DOORS-Table" approach, on the other hand, just shows you a table and all the actual requirements are right in there.
From my perspective my current implementation better suits our special needs. Mainly because:
See the example below (which corresponds to the same table shown in example 3 of my original posting). It shows non-traced cells (the header-row; item 3 above), and the blue boxes (item 4).
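Just to illustrate the data model sketched above (this is not the actual implementation; all class and field names are made up):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the "fake parent" table hierarchy described above:
// the table node carries a rich text rendering of the whole table as a "visual helper",
// the row nodes are mere placeholders, and only the cells are actual requirements.
class TableRequirement {
    final String traceString;        // e.g. "A3.1[2].[t]*"
    final String richTextRendering;  // XHTML dump of the entire table
    final List<RowPlaceholder> rows = new ArrayList<>();

    TableRequirement(String traceString, String richTextRendering) {
        this.traceString = traceString;
        this.richTextRendering = richTextRendering;
    }
}

class RowPlaceholder {
    final String traceString;        // e.g. "A3.1[2].[t]*.[r][2]"; no further content
    final List<CellRequirement> cells = new ArrayList<>();

    RowPlaceholder(String traceString) {
        this.traceString = traceString;
    }
}

class CellRequirement {
    final String traceString;        // e.g. "A3.1[2].[t]*.[r][2].Data"
    final String plainText;          // the actual, traceable cell content

    CellRequirement(String traceString, String plainText) {
        this.traceString = traceString;
        this.plainText = plainText;
    }
}
```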
> Media - ReqIF has information on how embedded content should be represented. This is a little hidden; you find it on page 57, entitled "2. Inclusion of objects that are external to the exchange XML document in the requirements authoring tool". Ignoring this, the only valid content is .png images. This could be a useful approach to representing formulas and similar stuff, but would create additional work for you.
As I have already told you orally, I currently only export the preview information of all the OLE data. For the subset-026 that usually means we end up with either `image/x-emf` or `image/x-wmf` data. Theoretically I could also dump the raw OLE BLOBs somewhere, but as you said that involves some additional work since I have to traverse the internal Word filesystem to find the correct offset within the original `.doc` where this data is stored.
Unless someone desperately needs this, I would postpone any such attempts (effectively we would end up with a plethora of proprietary file formats which are only readable if the tools used to create them are available).
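For the record, locating those BLOBs would mean walking the compound file structure of the `.doc`. A hypothetical sketch using Apache POI (the library choice, the file name and the class are assumptions, not a description of what my tool actually does):

```java
import java.io.FileInputStream;
import java.util.Iterator;

import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.Entry;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Lists the embedded OLE objects of a .doc file by walking its internal filesystem.
public class OleObjectLister {
    public static void main(String[] args) throws Exception {
        POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("subset026.doc"));
        DirectoryEntry root = fs.getRoot();
        if (root.hasEntry("ObjectPool")) {
            // "ObjectPool" is the storage inside the Word file that holds the raw OLE BLOBs
            DirectoryEntry objectPool = (DirectoryEntry) root.getEntry("ObjectPool");
            Iterator<Entry> embeddedObjects = objectPool.getEntries();
            while (embeddedObjects.hasNext()) {
                System.out.println("Embedded OLE object: " + embeddedObjects.next().getName());
            }
        }
    }
}
```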
> Collisions - I don't see this as a big issue: I would simply perform a check after parsing the document if there are duplicates. If there are, aborting is probably the best course of action.
Since I create the requirement tree on the fly while parsing the input document, the current way of handling this is simply to throw an exception (see example below) if a requirement with a non-unique identifier is about to be added to the tree (basically there is a simple class backed by a HashMap which does a lookup on existing entries every time a requirement is inserted). Aborting is of course a failsafe action here. However, it means my tool will not produce any meaningful output (i.e. it renders itself pretty much useless). So if I come across a situation which triggers this, I will see if there is anything smart (other than rewriting the input document) I can do about it.
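Roughly the kind of lookup described above, as a minimal sketch (class and method names are made up, not the tool's actual code):

```java
import java.util.HashMap;
import java.util.Map;

// Every requirement is registered under its tracestring; a second insertion
// with the same tracestring means a collision and aborts the run.
class RequirementRegistry {
    private final Map<String, Object> knownTracestrings = new HashMap<>();

    void register(String traceString, Object requirement) {
        Object previous = knownTracestrings.putIfAbsent(traceString, requirement);
        if (previous != null) {
            throw new IllegalStateException(
                    "Duplicate tracestring '" + traceString + "' - aborting.");
        }
    }
}
```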
Quick example which would currently make the tool throw the exception:
| Raw numberText | Generated tracestring |
| --- | --- |
| 1. | 1 |
| 1.1.1 | 1.1 |
| 1.1.2 | 1.2 |
| 1.2 | 1.2 |
One more thing regarding "2. Inclusion of objects that are external to the exchange XML document in the requirements authoring tool" in the ReqIF standard:
How am I supposed to store captions in there? I guess not in the "alternative text"?
Currently I simply export an image / table and attach a `caption`-tag (resp. `figcaption`-tag) to it, as one would do for an ordinary website. This "compound object" then becomes the rich text of a table- / figure-requirement, and the caption itself is also stored in a separate child (in case anyone ever wants to trace that or the plain text is needed for any subsequent [NLP]-analysis).
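A minimal sketch of what that compound object looks like (method and file names are made up; the real exporter obviously works on the extracted data, not on hard-coded strings):

```java
// Wraps an exported image and its caption into one XHTML "compound object",
// the same way one would do it for an ordinary website.
public class FigureExport {
    static String wrapFigureWithCaption(String imageFileName, String captionText) {
        return "<figure>"
                + "<img src=\"" + imageFileName + "\" alt=\"\"/>"
                + "<figcaption>" + captionText + "</figcaption>"
                + "</figure>";
    }

    public static void main(String[] args) {
        // The result becomes the rich text of the figure requirement; the caption text
        // itself is additionally stored in a separate child requirement.
        System.out.println(wrapFigureWithCaption("figure_21a.emf", "Example caption"));
    }
}
```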
@morido
Regarding tables and objects - I am perfectly fine with the way you're handling this, I just wanted to document that this is in the standard.
Regarding collisions: aborting is in my opinion the right choice.
With regard to captions: the standard does not provide any recommendation for this. As embedded objects/images are just XHTML, you can surround them with further XHTML to add the caption - of course this will result in a human-readable, not a machine-readable caption. Your approach makes sense, from my point of view.
@morido A few remarks from the user's perspective:
@UweSteinkeFromSiemens
Hi Uwe,
> The requirement identifiers must have delimiters at the beginning and at the end (pre- and postfix); that eases automatic detection as requirement IDs without ambiguity.
Would you be OK with adding those delimiters somewhere further downstream in the toolchain? Rationale (from my side): to me this looks like adding a `\0` to a C-style char-array when you always know the string's length - superfluous...

> The identifier generation algorithm should be robust against document modifications: identifiers already assigned in a previous document version should be preserved if the document was extended or shortened.
I believe that is infeasible. The current implementation of the tracestring generation relies only on that single document you throw at it. Hence, it does not have any historical data. And I would like to keep it that way, since it allows a 1:1 mapping (even manually) between tracestrings and the original document.
Revisioning should rather be handled further downstream inside ProR (i.e. by using a diff between two reqif files). Also I would urge the ERA to keep their numbering consistent at least across minor-releases of their documents.
> Requirement identifiers are often used in discussions and conversations as references. The proposed identifiers seem somewhat unpronounceable. And they are a challenge for the human eye if you're working with them in documents. Therefore, a more human-friendly representation would be useful.
Do you have any proposals? Otherwise I would weigh absence of ambiguity over clarity.
I suppose in average conversations you rarely talk about the second paragraph of the third bullet item inside "1.2.3", do you? At most you might mention "1.2.3" - and that's a rather simple identifier which will also be known by my tool (it's just way broader than a specific paragraph).
> The requirements text - that the requirement ID identifies - should be made available for automated grabbing too.
Of course it will be. I just do not plan to expose any API to directly communicate with my tool, but would rather point you at the resulting ReqIF file, from which you may then grab whatever you need.
@morido Please provide short documentation of the requirement ID definition in the documentation wiki here: https://github.com/openETCS/toolchain/wiki/User-Documentation#TODO_Requirements_naming
@jastram @cecilebraun Do you want me to duplicate information? Each of you proposed a different document where my requirement ID definition should be included.
For the time being I just propose it here (feel free to copy it to the final destination if there are no further comments):
Note: This only covers the case of paragraphs which are part of a list. If you need to trace other artifacts (which are always children of such list paragraphs) please look into the detailed explanation in #437.
Take the following example:
Suppose we want to trace the fifth paragraph in the above example, i.e.
• End of mission is performed
1. The nearest numbered paragraph above it is `3.5.3.7`. Set traceString to this number.
2. Untitled paragraphs below that number receive a running number per level. In the first iteration there is only one such paragraph (`If the establishment...`); hence, we do not append anything. In the second iteration there are two such paragraphs (`The on-board shall...` + `If this request is not...`); hence, the second one will receive an `[2]` appendix.
3. Bullet items are identified by `[*][n]` (with n being the running number of that bullet, starting at 1). Prefix this new level with a dot (`.`) and append it to the traceString.
4. `a)` is the identifier of one such sublist item. The trailing brace will be removed. The bullet points form another (less significant) sublist.

This will result in the following requirement ID:

`3.5.3.7.a[2].[*][2]`
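Just to make the composition explicit, a tiny illustration with the values of this example hard-coded (the real tool derives them while parsing, of course):

```java
// Assembles the requirement ID from the pieces described in the steps above.
public class TracestringExample {
    public static void main(String[] args) {
        StringBuilder traceString = new StringBuilder("3.5.3.7"); // nearest numbered paragraph
        traceString.append(".a");      // sublist item "a)", trailing brace removed
        traceString.append("[2]");     // second untitled paragraph on that level
        traceString.append(".[*][2]"); // second bullet of the (less significant) sublist
        System.out.println(traceString); // prints 3.5.3.7.a[2].[*][2]
    }
}
```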
@morido - definitely no duplicates! Please use @cecilebraun's location.
@morido: Referring to your remarks on my input above: as a tool vendor it's essential to understand the user perspective. From that perspective it is irrelevant how a tool works internally and what it is able to achieve. So if the tool is not able to solve the problem automatically, a human-assisted combination of tooling and manual intervention should be planned for and provided in the chain. Therefore, it would be great to elaborate the terms "should be handled further downstream inside ProR" into concrete processing steps.
@UweSteinkeFromSiemens
Hi Uwe,
> Therefore, it would be great to elaborate the terms "should be handled further downstream inside ProR" into concrete processing steps.
I suppose you are referring to the revisioning issue.
Short answer: this is outside the scope of my tool, therefore I do not care. All I do is convert one `*.doc` file into one `*.reqif` file. Hence, I can only process one revision at a time.
Long answer: ProR Essentials includes ReqIF Diff. This could be the way to go if you want to compute the delta between different baselines of the subset-026. DOORS (or other `*.reqif`-capable RM tools) might also be of help here, as this is one of their core strengths.
Generally it should not be required to have requirement IDs stay consistent across baselines (at least as long as they are not consistent in the input `*.doc` files). IMHO this only leads to (massive) confusion. Instead, the requirements of different baselines which share common properties (i.e. which are "equal to a certain degree") should be linked, and that link should then be subject to manual checking. -- But this is only my personal opinion. You may disagree.
@morido - Please document what the format will be here:
https://github.com/openETCS/toolchain/wiki/Process-Documentation
This is needed for @UweSteinkeFromSiemens