Closed ronaldtse closed 2 years ago
The model in LutaML UML:
(TODO: Add this to relaton-models)
// "Specs_GSM+3G" table
class Spec {
type: SpecificationType
specNumber: String
published: Boolean
title: String
workingGroupformer: String
workingGroupPrime: String
workingGroupOther: String
rapporteurId: Number
remarks: String
radioTechnology: RadioTechnology
isCommonImsSpec: Boolean
isInternal: Boolean
withdrawn: Boolean
creationDate: DateTime
updateDate: DateTime
titleVerifiedDate: DateTime
url: URL
}
// "Releases" table
class Release {
code: String
description: String
shortDescription: String
version2G: String
version3G: String
isDefunct: Boolean
remarks: String
wpmCode2G: String
wpmCode3G: String
freezeMeeting: String
freezeStage1Meeting: String
freezeStage2Meeting: String
freezeStage3Meeting: String
closeMeeting: String
projectStart: DateTime
projectEnd: DateTime
previousRelease: Release
}
// "Specs_GSM+3G_release-info" table
class SpecRelease {
specId: Spec
releaseId: Release
specReleaseId: Number
remarks: String
withdrawn: Boolean
creationDate: DateTime
updateDate: DateTime
freezeMeeting: String
stoppedAtMeeting: String
}
enum SpecificationType {
TR {
definition {
Technical Report
}
}
TS {
definition {
Technical Specification
}
}
}
enum RadioTechnology {
2G
3G
LTE
5G
}
@ronaldtse
Data instance sample: http://xml2rfc.tools.ietf.org/public/rfc/bibxml-3gpp-new/reference.3GPP.55.236.xml
The sample is:
<reference anchor="3GPP.55.236">
<front>
<title>
Specification of A8_V MILENAGE Algorithm: An example algorithm for the key generation function A8_V
</title>
<author>
<organization>3GPP</organization>
</author>
<date year="2012" month="September" day="27"/>
</front>
<seriesInfo name="3GPP TS" value="55.236 11.0.0"/>
<format type="HTML" target="http://www.3gpp.org/ftp/Specs/html-info/55236.htm"/>
</reference>
<seriesInfo> is deprecated in the <reference> context https://datatracker.ietf.org/doc/html/rfc7991#section-2.40 Should we move it into element <front>?
<format> is deprecated https://datatracker.ietf.org/doc/html/rfc7991#section-3.3 Should we handle it?
<seriesInfo>: move to <front>, yes
<format>: https://datatracker.ietf.org/doc/html/rfc7991#section-2.40.3 We should move <format> into the <reference> element's target attribute.
The model in LutaML UML:
(TODO: Add this to relaton-models)
@ronaldtse I'll implement these classes in the relaton-3gpp gem but we need a grammar to generate RelatonXML files. Ping @opoudjis
UPD I think we don't need the SpecRelease class. The "Specs_GSM+3G_release-info" table is needed for many-to-many relations. We only need to know which Releases are related to a Spec. So the SpecRelease's attributes could be moved to the Release class. I suppose the Spec class should inherit from RelatonBib::BibliographicItem, so we can reuse its attributes:
- type: SpecificationType: we have a doctype for this
- specNumber: there is a docnumber
- published, withdrawn: could we use status for it?
- title: it is in RelatonBib::BibliographicItem already
- creation_date and update_date seem to be AccessDB's default fields. I don't think they are related to the document's created and updated dates.
- titleVerifiedDate: I believe it should be some type of date type
- url: there is url in RelatonBib::BibliographicItem already
UPD I think we don't need SpecRelease class
Good insight, this may indeed be a joining table. I agree that the Spec and the Release classes should both be citable RelatonBib::BibliographicItem items.
The only attribute I can't find is stoppedAtMeeting. Where does this go?
The only attribute I can't find is stoppedAtMeeting. Where does this go?
@ronaldtse it seems we need to move remarks, withdrawn, freezeMeeting, and stoppedAtMeeting to the Release, and add release: Release[0..*] to the Spec.
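A minimal Ruby sketch of the shape proposed above, where the per-release attributes live on the Release and a Spec holds zero or more Releases. The struct names, the snake_case attribute names, and the sample values are assumptions for illustration, not the relaton-3gpp API.

```ruby
# Hypothetical Ruby shapes for the proposal above; not the relaton-3gpp API.
ThreeGppRelease = Struct.new(
  :code, :remarks, :withdrawn, :freeze_meeting, :stopped_at_meeting,
  keyword_init: true
)

ThreeGppSpec = Struct.new(:spec_number, :releases, keyword_init: true) do
  # Codes of all releases attached to this spec (release: Release[0..*]).
  def release_codes
    releases.map(&:code)
  end
end

# Made-up sample: the withdrawn flags are illustrative only.
SAMPLE_SPEC = ThreeGppSpec.new(
  spec_number: '45.902',
  releases: [
    ThreeGppRelease.new(code: 'Rel-6', withdrawn: false),
    ThreeGppRelease.new(code: 'Rel-7', withdrawn: false)
  ]
)

p SAMPLE_SPEC.release_codes
```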
The overlap between Spec and BibItem is of course unacceptable, and Andrej was right to point it out. If the Spec and the Release are both citable BibItems, they are related through a relation, of type derivedFrom (I think; if not, complements).
That means that we need to indicate whether a bibitem is a spec or a release; I'm introducing those as docsubtypes.
Andrej, I'm trying to follow in this what you've already modelled.
... I continue to be annoyed at how poor a match these Lutaml classes are for Bibitem. This is a bunch of randomness.
class Spec {
type: SpecificationType => bibitem/item/doctype
specNumber: String => bibitem/docnumber
published: Boolean => bibitem/status/stage = 'published'
title: String => bibitem/title
workingGroupformer: String => bibitem/ext/editorialgroup/technical-committee[@type = 'former']
workingGroupPrime: String => bibitem/ext/editorialgroup/technical-committee[@type = 'prime']
workingGroupOther: String => bibitem/ext/editorialgroup/technical-committee[@type = 'other']
rapporteurId: Number => bibitem/docidentifier[@type = 'rapporteurId']
remarks: String => bibitem/note
withdrawn: Boolean => bibitem/status/stage = 'withdrawn'
creationDate: DateTime => bibitem/date[@type = 'created'] ; if the date is unrelated to the document, as Andrej believes, it should not be recorded at all
updateDate: DateTime => bibitem/date[@type = 'updated'] ; if the date is unrelated to the document, as Andrej believes, it should not be recorded at all
titleVerifiedDate: DateTime => bibitem/date[@type = 'confirmed'] ; that is my best guess anyway
url: URL => bibitem/uri
radioTechnology: RadioTechnology : goes to ext
isCommonImsSpec: Boolean : goes to ext
isInternal: Boolean : goes to ext
}
The following are contained within bibitem/relation[@type = 'derivedFrom']/bibitem/ext/release
// "Releases" table
class Release {
code: String => bibitem/relation[@type = 'derivedFrom']/bibitem/docidentifier
remarks: String => bibitem/relation[@type = 'derivedFrom']/bibitem/note
description: String => bibitem/relation[@type = 'derivedFrom']/bibitem/abstract
shortDescription: String => bibitem/relation[@type = 'derivedFrom']/bibitem/note[@type = 'shortDescription']
previousRelease: Release => bibitem/relation[@type = 'derivedFrom']/bibitem/relation[@type = 'successorOf']/bibitem
version2G: String
version3G: String
isDefunct: Boolean
wpmCode2G: String
wpmCode3G: String
freezeMeeting: String
freezeStage1Meeting: String
freezeStage2Meeting: String
freezeStage3Meeting: String
closeMeeting: String
projectStart: DateTime
projectEnd: DateTime
}
and the grammar covering this in extensions:
include "isodoc.rnc" {
start = iso-standard
DocumentType = "TR" | "TS"
DocumentSubtype = "spec" | "release"
BibDataExtensionType =
doctype, docsubtype?, editorialgroup, ics*, radiotechnology?, common-ims-spec?, internal?, release?
}
RadioTechnologyType = "2G" | "3G" | "LTE" | "5G"
radiotechnology = element radiotechnology { RadioTechnologyType }
common-ims-spec = element common-ims-spec { xsd:boolean }
internal = element internal { xsd:boolean }
release = element release {
element version2G { text },
element version3G { text },
element defunct { xsd:boolean },
element wpm-code-2G { text },
element wpm-code-3G { text },
element freeze-meeting { text },
element freeze-stage1-meeting { text },
element freeze-stage2-meeting { text },
element freeze-stage3-meeting { text },
element close-meeting { text },
element project-start { xsd:date },
element project-end { xsd:date }
}
Have already attached to metanorma-model-iso
I did some detailed investigation in the database.
- Spec can be included in multiple Releases
- Release will contain multiple Specs.
The remarks column in SpecReleases contains very specific information to a Spec within a Release. i.e. SpecRelease is not a joining table, it contains unique information that applies to a Spec in a Release.
e.g.
This also means that a Spec at a different Release is actually a different document. e.g. the Spec 45.902 in Rel-6 and Rel-7 are differently versioned documents.
This means the model works like this instead:
Thus, SpecRelease actually represents a document! Let's call it a SpecDocument.
A "Spec" is pretty much like a SpecProject that produces multiple SpecDocuments.
I would suggest we make these changes:
I'm still not fully sure where the "Version" concept can be extracted from. You can see that on this page you can have multiple document versions within one Release: https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1080
The remarks column in SpecReleases contains very specific information to a Spec within a Release. i.e. SpecRelease is not a joining table, it contains unique information that applies to a Spec in a Release.
SpecReleases contains information but it is also a joining table, i.e. one Spec can have multiple Releases and one Release can have multiple Specs, and in practice they do. 2331 of 3918 Specs have more than 1 Release relation and 35 of 37 Releases have more than 1 Spec relation:
database['Specs_GSM+3G'].select { |s| database['Specs_GSM+3G_release-info'].select { |sr| s[:Number] == sr[:Spec] }.size > 1 }.size
=> 2331
database['Releases'].select { |r| database['Specs_GSM+3G_release-info'].select { |sr| sr[:Release] == r[:Release_code] }.size > 1 }.size
=> 35
The data is compact as a normalized relational DB, but if we denormalize the data we will get a tremendous number of Spec -> Release combinations. In the DB there are 3918 Specs. If we create a document from each Spec -> Release relation we will have 3918 + 17217 = 21135 documents:
database['Specs_GSM+3G'].map { |s| database['Specs_GSM+3G_release-info'].select { |r| r[:Spec] == s[:Number] }.size }.sum
=> 17217
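The nested select calls above rescan the release-info table once per Spec. As a sketch only, the same two counts can be taken from a single tally pass, assuming rows are hashes keyed as in the snippets above. The constants, method names, and sample rows here are made up for illustration and need Ruby 2.7 or later for tally.

```ruby
# Hypothetical sample rows; the real tables are far larger.
SPECS = [{ Number: '45.902' }, { Number: '55.236' }, { Number: '00.01' }].freeze
SPEC_RELEASES = [
  { Spec: '45.902', Release: 'Rel-6' },
  { Spec: '45.902', Release: 'Rel-7' },
  { Spec: '55.236', Release: 'Rel-11' }
].freeze

# Count release-info rows per spec number in one pass.
def releases_per_spec(spec_releases)
  spec_releases.map { |sr| sr[:Spec] }.tally
end

# Number of specs that have more than one release relation.
def multi_release_spec_count(specs, spec_releases)
  counts = releases_per_spec(spec_releases)
  specs.count { |s| counts.fetch(s[:Number], 0) > 1 }
end

# Total number of Spec -> Release combinations for the known specs.
def spec_release_total(specs, spec_releases)
  counts = releases_per_spec(spec_releases)
  specs.sum { |s| counts.fetch(s[:Number], 0) }
end

puts multi_release_spec_count(SPECS, SPEC_RELEASES)
puts spec_release_total(SPECS, SPEC_RELEASES)
```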
I would suggest we make these changes:
- A SpecDocument links to a Release (e.g. "Rel-9") and a Spec (e.g. "TS 45.902"). We can cite it as "3GPP TS 45.902:Rel-9"
- A Spec (e.g. "TS 45.902") can contain multiple SpecDocuments (e.g. "TS 45.902:Rel-7", "TS 45.902:Rel-8", "TS 45.902:Rel-9")
@opoudjis proposes to use bibitem/relation for Releases. One Spec (bibitem) can have multiple Releases (bibitem/relations). Each relation has a description that can be used for remarks.
So in this case we can use a reference like "TS 45.902" to get Spec with all releases. If we need to get Spec with one Release cited as "TS 45.902:Rel-7" we can get "TS 45.902" and remove all Releases except "Rel-7".
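The lookup described above could be sketched roughly as follows: fetch the whole Spec, then keep only the requested Release. The hash shape, the constant, and the method names are hypothetical and only stand in for whatever the gem ends up using.

```ruby
# Hypothetical in-memory shape of a Spec with its Releases as relations.
SPEC_45_902 = {
  docid: 'TS 45.902',
  relations: [
    { type: 'derivedFrom', release: 'Rel-6' },
    { type: 'derivedFrom', release: 'Rel-7' }
  ]
}.freeze

# Split "TS 45.902:Rel-7" into the Spec id and an optional Release code.
def parse_reference(ref)
  docid, release = ref.split(':', 2)
  [docid, release]
end

# Return a copy of the spec keeping only the requested release, if any.
def select_release(spec, release)
  return spec if release.nil?

  spec.merge(relations: spec[:relations].select { |r| r[:release] == release })
end

_docid, release = parse_reference('TS 45.902:Rel-7')
p select_release(SPEC_45_902, release)[:relations].map { |r| r[:release] }
```

A reference without a release suffix leaves the Spec untouched, which matches the "get Spec with all releases" case.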
What do you say Ronald?
I'm still not fully sure where the "Version" concept can be extracted from. You can see that on this page you can have multiple document versions within one Release:
There are tables with the word "version" in their names:
"2003-03-04_work-plan_web-export-version",
"2001-10-04_version-value-to-character-map",
"2003-01-22_latest-version-ETSI-published_step5_table",
"2003-01-22_latest-version-ETSI-published_step5-1-R99-table",
"2003-01-22_latest-version-ETSI-published_step5-1-Rel4_table",
"2003-01-22_latest-version-ETSI-published_step5-1-Rel5-table",
"2003-01-22_latest-version-ETSI-published_step5-1-Rel6-table",
"2003-02-06_table03-versions_table",
"2003-04-10_webexp04_release-and-version-details_table",
"2003-04-10_webexp12_latest-versions-all-releases_table"
I'll investigate them tomorrow.
In the DB there are 3918 Specs. If we create a document from each Spec -> Release relation we will have 3918 + 17217 = 21135 documents
This is actually correct. You can see this being shown here: https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1080
One Spec (bibitem) can have multiple Releases (bibitem/relations).
This is an inaccurate simplification.
Each Spec-Release document is a separate document. Each document has a version number. At every new release, there is a new version number even though the document content is identical:
I was able to find the versions, and the way to generate the URL, will post later.
I finally found it!
SpecDocument is actually defined in the table called 2001-04-25_schedule. (I know, funny name.)
These are the attributes we want from this table:
- spec => Spec name
- release => Release name
- MAJOR_VERSION_NB, TECHNICAL_VERSION_NB, EDITORIAL_VERSION_NB => SpecDocument version
- location => SpecDocument URL
- meeting => SpecDocument meeting
- ACHIEVED_DATE => SpecDocument upload date (published date?)
- comment => SpecDocument comment
Here's a direct comparison of database rows against the web content:
The Specs_GSM+3G_release-info table is what provides the "remarks" and "withdrawn" information in the web display:
The table 2008-03-08_Specs-vs-WIs is a joining table between Specs and Work Items (WIs).
The table committees_local-names is a list of all committees:
- A Spec (e.g. "TS 45.902") can contain multiple SpecDocuments (e.g. "TS 45.902:Rel-7", "TS 45.902:Rel-8", "TS 45.902:Rel-9")
@ronaldtse The relaton-* flavor gems don't handle collections of documents. The relaton-cli does, but not in this way. We can use bibitem/relation the same way as for an all-parts document. Is that OK?
@andrew2net yes, that's what I meant. Thanks for the clarification.
So we have three types of bibliographic items (citable items) in 3GPP:
@ronaldtse do we need these attributes in the relaton-3gpp-model?
// "Releases" table
class Release {
code: String
description: String
shortDescription: String
version2G: String
version3G: String
isDefunct: Boolean
remarks: String
wpmCode2G: String
wpmCode3G: String
freezeMeeting: String
freezeStage1Meeting: String
freezeStage2Meeting: String
freezeStage3Meeting: String
closeMeeting: String
projectStart: DateTime
projectEnd: DateTime
previousRelease: Release
}
I didn't manage to find how the "Releases" table is related to Specs. The table has these fields:
database['Releases'][0]
=> {:Release_code=>"Ph1",
:Release_description=>"Phase 1",
:Release_short_description=>"Ph1",
:version_2g=>"3",
:version_3g=>"-",
:"sort-order"=>"100",
:defunct=>"1",
:remarks=>"Release closed - no CRs permitted.",
:wpm_code_2g=>"GSM_PH1",
:wpm_code_3g=>nil,
:"freeze meeting"=>"gsm-25b",
:PROJECT_ID=>"705",
:"rel-proj-start"=>nil,
:"rel-proj-end"=>nil,
:ITUR_code=>nil,
:version_2g_dec=>"3",
:version_3g_dec=>"-",
:previousRelease=>nil,
:Stage1_freeze=>"GSM-25b",
:Stage2_freeze=>"GSM-25b",
:Stage3_freeze=>"GSM-25b",
:Protocols_freeze=>"GSM-25b",
:Closed=>"SMG-17",
:Field1=>nil}
The field :release=>"Rel-15" from the "2001-04-25_schedule" table doesn't match any :Release_code or :Release_description in the "Releases" table:
database['Releases'].detect { |r| [:Release_code] == "Rel-15" }
=> nil
database['Releases'].detect { |r| [:Release_description] == "Rel-15" }
=> nil
The field :release=>"Rel-15" from "2001-04-25_schedule" table doesn't match with any :Release_code or :Release_description in the "Releases" table:
It is available though, in Release_code and also Release_short_description:
It is available though, in Release_code and also Release_short_description:
@ronaldtse you are right, it's my mistake
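For readers following along: the two detect calls quoted earlier compare the literal one-element array [:Release_code] with a string, which is never equal in Ruby, so they return nil regardless of the data. A minimal sketch of the buggy and corrected forms follows. The sample rows and method names are assumptions, and the 'Release 15' description value is made up.

```ruby
# Hypothetical subset of the "Releases" table; description values are made up.
RELEASES = [
  { Release_code: 'Ph1', Release_description: 'Phase 1',
    Release_short_description: 'Ph1' },
  { Release_code: 'Rel-15', Release_description: 'Release 15',
    Release_short_description: 'Rel-15' }
].freeze

# Buggy form: [:Release_code] is an Array literal, never equal to a String.
def find_release_buggy(releases, code)
  releases.detect { |_r| [:Release_code] == code }
end

# Corrected form: index the row hash with the column key.
def find_release(releases, code)
  releases.detect { |r| r[:Release_code] == code }
end

p find_release_buggy(RELEASES, 'Rel-15')
p find_release(RELEASES, 'Rel-15')
```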
@ronaldtse not every row in the "2001-04-25_schedule" table has related rows in the "Specs_GSM+3G" and "Specs_GSM+3G_release-info" tables. Here are missing spec numbers:
schedule.select { |sc| spec.detect { |s| s[:Number] == sc[:spec] }.nil? }.map { |sc| sc[:spec] }.uniq
=> ["00.000",
"00.001",
"02.10U",
"02.23U",
"02.24U",
"02.25U",
"02.30U",
"02.40U",
"02.41U",
"02.50U",
"02.51U",
"02.52U",
"02.53U",
"02.54U",
"02.55U",
"03.20U",
"03.21U",
"03.22U",
"03.23U",
"04.01U",
"04.02U",
"04.03U",
"04.04U",
"22.10U",
"22.20U",
"25.25U",
"31.01U",
"31.02U"]
Without these relations we are missing some critical data; for example, title is in the spec table only. Shouldn't we skip these documents?
@andrew2net yes, let's skip this list of spec numbers. From a Google search, they don't seem to exist.
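One way to skip those rows during import, sketched under the assumption that both tables are arrays of hashes keyed as in the snippets above. The constants, method names, Title key, and sample release values are hypothetical. A Set lookup avoids rescanning the spec table for every schedule row.

```ruby
require 'set'

# Hypothetical sample rows; the Title key and the release values are made up.
SPEC_ROWS = [{ Number: '45.902', Title: '(title elided)' }].freeze
SCHEDULE_ROWS = [
  { spec: '45.902', release: 'Rel-7' },
  { spec: '02.10U', release: 'Ph1' },
  { spec: '00.000', release: 'Ph1' }
].freeze

# Set of spec numbers present in the spec table.
def known_spec_numbers(specs)
  specs.map { |s| s[:Number] }.to_set
end

# Keep only schedule rows whose spec number exists in the spec table.
def with_known_spec(schedule, specs)
  known = known_spec_numbers(specs)
  schedule.select { |sc| known.include?(sc[:spec]) }
end

# Spec numbers that appear in the schedule but not in the spec table.
def missing_spec_numbers(schedule, specs)
  known = known_spec_numbers(specs)
  schedule.map { |sc| sc[:spec] }.uniq.reject { |n| known.include?(n) }
end

p with_known_spec(SCHEDULE_ROWS, SPEC_ROWS)
p missing_spec_numbers(SCHEDULE_ROWS, SPEC_ROWS)
```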
@ronaldtse
schedule.select { |sc| sc[:spec] == "03.20ext" && sc[:release] == "Ph1-EXT"}.map { |sc| [sc[:MAJOR_VERSION_NB], sc[:TECHNICAL_VERSION_NB], sc[:EDITORIAL_VERSION_NB]] }
=> [["3", "0", "0"], ["3", "0", "0"], ["3", "0", "0"]]
Documents without URLs are fine.
For the duplicates, let me investigate the data and get back. For now just take the newest record.
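A sketch of the interim "take the newest record" rule: group rows by spec, release, and version, then keep the row with the latest ACHIEVED_DATE. It assumes ACHIEVED_DATE is an ISO 8601 string or nil, which the thread does not confirm. The constants, method names, and sample dates are illustrative only.

```ruby
require 'date'

VERSION_KEYS = %i[MAJOR_VERSION_NB TECHNICAL_VERSION_NB EDITORIAL_VERSION_NB].freeze

# Hypothetical duplicate rows; the date format is an assumption.
DUPLICATE_ROWS = [
  { spec: '03.20ext', release: 'Ph1-EXT', MAJOR_VERSION_NB: '3',
    TECHNICAL_VERSION_NB: '0', EDITORIAL_VERSION_NB: '0', ACHIEVED_DATE: '1995-01-01' },
  { spec: '03.20ext', release: 'Ph1-EXT', MAJOR_VERSION_NB: '3',
    TECHNICAL_VERSION_NB: '0', EDITORIAL_VERSION_NB: '0', ACHIEVED_DATE: '2003-09-01' },
  { spec: '03.20ext', release: 'Ph1-EXT', MAJOR_VERSION_NB: '3',
    TECHNICAL_VERSION_NB: '0', EDITORIAL_VERSION_NB: '0', ACHIEVED_DATE: nil }
].freeze

# Build "major.technical.editorial" from the three version columns.
def version_of(row)
  row.values_at(*VERSION_KEYS).join('.')
end

# Rows without a date sort before every dated row.
def achieved_on(row)
  row[:ACHIEVED_DATE] ? Date.iso8601(row[:ACHIEVED_DATE]) : Date.new(1)
end

# Keep one row per spec/release/version: the one with the latest date.
def newest_per_version(rows)
  rows.group_by { |r| [r[:spec], r[:release], version_of(r)] }
      .map { |_key, dups| dups.max_by { |r| achieved_on(r) } }
end

p newest_per_version(DUPLICATE_ROWS).map { |r| [version_of(r), r[:ACHIEVED_DATE]] }
```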
@ronaldtse the "Specs_GSM+3G_release-info" table doesn't have some spec+release combination:
specrel.detect { |sr| sr[:Spec] == '30.531' && sr[:Release] == 'Rel-5' }
=> nil
- There are duplicates in the "2001-04-25_schedule" table. For example, filtering by spec: "03.20ext" release: "Ph1-EXT" returns 3 documents with the same version "3.0.0".
In the database:
Here's the corresponding 3GPP page:
This means that we should treat every "Upload date" as unique.
Moreover when I click on "version details":
Upload date 2003-09-01:
Upload date 1995-01-01:
Some explanation:
- location fields are identical in these 3 records
- WKI_ID which is the "ETSI work item" number. That's why in the GUI you see the newest entry having the "ETSI" button.
- ACHIEVED_DATE differs at the meeting attribute, where one is smg-07 and another smg-12.
Originally I thought we should set "upload date per version" as the uniqueness criterion. However, I found this entry of spec "00.02u": it contains no dates, no location.
Then I realized that the missing rows seem to have a missing attribute 3guId: it is set to NULL.
This is validated by:
- 50.099 v0.1.6, release "Ph1", containing two entries, but only the entry with 3guId not NULL is displayed.
- 48.018 v9.6.1, release "Rel-9", containing two entries, only the one with 3guId not NULL is displayed.
Conclusion:
- In 2001-04-25_schedule: Drop all entries with 3guId == NULL.
@ronaldtse the "Specs_GSM+3G_release-info" table doesn't have some spec+release combination:
Because this combination does not exist:
Look at the 3GPP page:
There is no Rel-5 (Release 5) for this spec.
Conclusion:
- In 2001-04-25_schedule: Drop all entries with 3guId == NULL.
- All other entries are valid
@ronaldtse not all. For example, for TR 02.08:Ph1/0.2.1 there are rows with 3guId 639 and 5227; for TS 03.20ext:Ph1-EXT/3.0.0 there are 3 rows with 3guId 2885, 16103, and nil. Maybe we need to add the 3guId to the reference?
@andrew2net can you check the 3GPP portal to see what the results are?
I suspect these entries have different upload dates and are all valid (except for the NULL entry in 3guId column).
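To see which duplicates survive the NULL rule, one could drop rows with a nil 3guId and then group what remains by spec, release, and version. The sketch below uses the 3guId values mentioned above. The single :version key is a simplification of the three version columns, and the string-typed ids, constant, and method names are assumptions.

```ruby
# Rows built from the ids quoted above; :version is a hypothetical shorthand
# for the three *_VERSION_NB columns, and ids are assumed to be strings.
SCHEDULE_SAMPLE = [
  { spec: '02.08', release: 'Ph1', version: '0.2.1', '3guId': '639' },
  { spec: '02.08', release: 'Ph1', version: '0.2.1', '3guId': '5227' },
  { spec: '03.20ext', release: 'Ph1-EXT', version: '3.0.0', '3guId': '2885' },
  { spec: '03.20ext', release: 'Ph1-EXT', version: '3.0.0', '3guId': '16103' },
  { spec: '03.20ext', release: 'Ph1-EXT', version: '3.0.0', '3guId': nil }
].freeze

# Drop every row whose 3guId is NULL (nil).
def drop_null_3gu_id(rows)
  rows.reject { |r| r[:'3guId'].nil? }
end

# Groups that still hold more than one row after the NULL rows are dropped.
def remaining_duplicates(rows)
  drop_null_3gu_id(rows)
    .group_by { |r| r.values_at(:spec, :release, :version) }
    .select { |_key, group| group.size > 1 }
end

remaining_duplicates(SCHEDULE_SAMPLE).each do |key, group|
  puts "#{key.join(' ')}: #{group.map { |r| r[:'3guId'] }.join(', ')}"
end
```

With this sample both groups still contain two rows, which is the case raised above: the NULL rule alone does not make spec, release, and version unique.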
@ronaldtse it's very strange. In the DB there are two "0.2.1" versions with a non-nil 3guId for spec 02.08 and release Ph1 (Phase 1). For spec 02.08 and release Ph2 (Phase 2) there isn't any "0.2.1" version:
schedule.select { |sc| sc[:spec] == "02.08" && sc[:release] == "Ph1" && sc[:MAJOR_VERSION_NB] == '0' && sc[:TECHNICAL_VERSION_NB] == '2' && sc[:EDITORIAL_VERSION_NB] == '1' }.size
=> 2
schedule.select { |sc| sc[:spec] == "02.08" && sc[:release] == "Ph2" && sc[:MAJOR_VERSION_NB] == '0' && sc[:TECHNICAL_VERSION_NB] == '2' && sc[:EDITORIAL_VERSION_NB] == '1' }.size
=> 0
on the 3GPP portal one "0.2.1" version is displayed in "Phase 1" and one in "Phase 2" https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=35
Some further insights from the screenshot:
As @andrew2net mentioned, the magical thing is both records have a "release" value of Ph1. However in the screenshot these two entries split into "Phase 1" and "Phase 2"! How is that even possible?
I found two places that have "02.08" listed as Phase 1 and Phase 2:
Now downloading the latest 2021-12-05 DB to see if there are any changes...
I downloaded the latest status_smg_3GPP_2021-12-06_15h15-CET but the data for 02.08 is unchanged, i.e. both records still have the value Ph1.
Notice that the heading for each release is directly from the "Specs_GSM+3G_release-info" table.
So they probably first loaded the Releases, and then load the specs per release.
@andrew2net I am not sure if 3GPP has a different database, but in the 2001-04-25_schedule table we can see data from 2021-12-03, which is super recent. This would indicate that at least this table is likely authoritative. So I don't understand how they placed the same record in "Phase 1" vs "Phase 2" that way.
However, I just found this table: 2003-04-10_webexp04_release-and-version-details_table.
This data is correct! Yet the newest data for this table is only until 2013.
So it is very possible that there is another table (e.g. release-and-version-details_table) that contains the correct information, but it is not published.
@andrew2net can we close this now? Thanks.
@ronaldtse we have an unresolved issue with releases and versions relation.
I'm going to close this and leave the unresolved problem in a new issue.
Data: http://xml2rfc.tools.ietf.org/public/rfc/bibxml-3gpp-new (there is a bulk download)
Data instance sample: http://xml2rfc.tools.ietf.org/public/rfc/bibxml-3gpp-new/reference.3GPP.55.236.xml
We need to develop a Relaton model, and then import the data.