watfordjc / Data-Catalogue

Non-functional storage media cataloguing utility.
MIT License

Assign OID arcs for data media OIDs #1

Open watfordjc opened 1 year ago

watfordjc commented 1 year ago

Feature Branch

Current feature branch for this issue: not created yet.

Progress


Background

To identify storage media within a catalogue, there needs to be a way to uniquely identify a physical drive/tape, but existing identifiers have issues:

Ideally, dev URNs as defined in RFC 9039 would be used, and everything could be uniquely identified by URN. Unfortunately, that would only work if every storage device manufacturer defined how to convert their identifiers (e.g. serial number) into a dev URN, possibly per Section 4.4 (Organization Serial Numbers - "urn:dev:os:") or Section 4.5 (Organization Product and Serial Numbers - "urn:dev:ops:") of RFC 9039.

Requesting a Private Enterprise Number (PEN) from the IANA so I can create device URNs using my own PEN is an option, but it is not an ideal method.

As I don't want to deal with different URN namespaces for the moment, I have decided to go with deterministic OID arc generation (similar to OIDs created from UUIDs) for data media identifiers. Unfortunately, the IEEE-RA were supposed to be creating a method for turning WWNs/WWIDs into OIDs, but they don't appear to have done so yet.
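For comparison, the UUID-based approach mentioned above is the one where a UUID is written as a single decimal arc under the joint-iso-itu-t(2) uuid(25) arc. A minimal sketch of that existing scheme follows; the function name `uuid_to_oid` and the sample UUID are illustrative only, and this is not the generation method for the data media arcs discussed in this issue.

```python
import uuid


def uuid_to_oid(value: uuid.UUID) -> str:
    """Return the OID for a UUID under the joint-iso-itu-t(2) uuid(25) arc."""
    # The whole UUID is read as one unsigned 128-bit integer and written in
    # decimal as a single arc, so the same UUID always yields the same OID.
    return f"2.25.{value.int}"


if __name__ == "__main__":
    # Hypothetical UUID chosen only to keep the output short.
    print(uuid_to_oid(uuid.UUID("00000000-0000-0000-0000-000000000001")))
```

The useful property is determinism: no registry lookup is needed to go from the identifier to the OID.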

I could use one of the organisation identifiers from Wikidata to identify a manufacturer, but that would require (a) a Wikidata entry existing for all manufacturers, (b) an organisation identifier of the same type being available for every manufacturer on Wikidata, and (c) a lookup method to convert a hardware identifier I can access locally into a manufacturer identifier via the Web.

I have thus decided to go with OIDs, with different storage media types/interfaces/organisations using different OID arcs that define how to generate that type of OID.

Placeholder OID Arc Names

I have decided on the following placeholder names for the OID arcs:

watfordjc commented 1 year ago

OID arc storage-media

Description: Information:

watfordjc commented 1 year ago

OID arc storage-media.ieee-wwn

Description: World Wide Name (WWN) identifiers

Information: The Institute of Electrical and Electronics Engineers (IEEE) define the creation of World Wide Name (WWN) identifiers by storage technology manufacturers for objects such as hard disk drives. The IEEE Registration Authority (IEEE-RA) have not yet published a method for converting WWNs into OIDs.

A WWN is an 8-byte (64-bit) or 16-byte (128-bit) number, with the most significant 4 bits of the most significant byte indicating the Network Address Authority (NAA). The NAA defines how the remaining bits of the number are created.

Note: Some interface adapters may mask the real WWN of a storage device.

To generate an OID for an 8-byte WWN (NAA types 0x1, 0x2, 0x5, 0xC, 0xD, 0xE, 0xF), see child OID {iso(1) member-body(2) gb(826) national(0) eng-ltd(1) john-cook(11484356) john-cook(1) infosec(0) storage-media(a) ieee-wwn(b) 8-byte(8)}.

To generate an OID for a 16-byte WWN (NAA type 0x6), see child OID {iso(1) member-body(2) gb(826) national(0) eng-ltd(1) john-cook(11484356) john-cook(1) infosec(0) storage-media(a) ieee-wwn(b) 16-byte(16)}.

No OID arc has been allocated for generating OIDs for WWNs with the following NAA types, as such WWN/NAA identifiers have not been defined as globally unique: 0x0, 0x3, 0x4, 0x7, 0x8, 0x9, 0xA, 0xB.
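The selection rules above can be sketched as follows. This is only an illustration: the function name `classify_wwn` and the sample WWN values are hypothetical, and the returned child arc numbers (8 and 16) are the placeholder arcs named in this comment.

```python
# NAA types that map to the 8-byte(8) and 16-byte(16) placeholder child arcs.
EIGHT_BYTE_NAA = {0x1, 0x2, 0x5, 0xC, 0xD, 0xE, 0xF}
SIXTEEN_BYTE_NAA = {0x6}


def classify_wwn(wwn_hex: str):
    """Return (naa, child_arc), where child_arc is 8, 16, or None."""
    raw = bytes.fromhex(wwn_hex.replace("-", "").replace(":", ""))
    if len(raw) not in (8, 16):
        raise ValueError("a WWN must be 8 or 16 bytes long")
    # The NAA is the most significant 4 bits of the most significant byte.
    naa = raw[0] >> 4
    if len(raw) == 8 and naa in EIGHT_BYTE_NAA:
        return naa, 8
    if len(raw) == 16 and naa in SIXTEEN_BYTE_NAA:
        return naa, 16
    # No arc allocated: the NAA type is not defined as globally unique,
    # or the length does not match the NAA type.
    return naa, None


if __name__ == "__main__":
    # Hypothetical WWN values, for illustration only.
    for sample in ("5000c500a1b2c3d4", "3000000000000001"):
        print(sample, classify_wwn(sample))
```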


Information and Research

WWN/WWID Format

NAA Values

The following NAA values (most significant 4 bits of the most significant byte of an IEEE WWN) are known:

The following NAA values are also known:

Identifiers in Linux

Using Linux /dev/disk/by-id/ symlinks and disk wwid values, these are some prefixes for storage media names that I have seen, each of which is probably of a different format due to the presence of a prefix:

Some interface adapters may mask the real WWN. For example, the USB to M.2 NVMe adapter in my Raspberry Pi reports that /dev/sda has a serial number of 0123456789ABCDEF and a WWID of t10.JMicron Generic 0123456789ABCDEF. Using a USB adapter (and operating system) that supports UAS-3/UASP might fix the issue.

Identifiers in Windows

In Windows WMI MSFT_Disk objects, the value of the UniqueId property is formatted based on the value of the UniqueIdFormat property. The UniqueIdFormat property uses similar definitions as those used in SCSI VPD page 0x83, with Windows prioritising what the UniqueId should be given the available identifiers for a disk in the following order (highest priority first): 0x8 (SCSI name string), 0x3 (FCPH - see below), 0x2 (EUI-64 based), 0x1 (T10 vendor ID based), 0x0 (Vendor specific).
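The priority order described above can be modelled as a simple first-match lookup. This is a sketch of the ordering only, not of how Windows implements it; the function name `pick_unique_id` and the dictionary shape (UniqueIdFormat value mapped to identifier string) are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# UniqueIdFormat values in the priority order described above, highest first:
# SCSI name string, FCPH, EUI-64 based, T10 vendor ID based, vendor specific.
WINDOWS_PRIORITY = [0x8, 0x3, 0x2, 0x1, 0x0]


def pick_unique_id(available: Dict[int, str]) -> Optional[Tuple[int, str]]:
    """Return the (format, identifier) pair with the highest priority."""
    for fmt in WINDOWS_PRIORITY:
        if fmt in available:
            return fmt, available[fmt]
    return None


if __name__ == "__main__":
    # Hypothetical identifiers for one disk, for illustration only.
    disk = {0x1: "EXAMPLE VENDOR ID STRING", 0x3: "5000C500A1B2C3D4"}
    print(pick_unique_id(disk))
```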

I believe some of the Linux WWID prefixes are based on the same thing: 0x3 (naa), 0x2 (eui), 0x1 (t10).
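If that belief holds, a WWID string can be split into a designator type and a value. The sketch below assumes the prefix is everything before the first full stop; the mapping table only contains the three prefixes listed above, and `split_wwid` is a hypothetical helper name.

```python
# Assumed mapping from Linux WWID prefix to SCSI VPD page 0x83 designator type.
PREFIX_TO_DESIGNATOR = {"naa": 0x3, "eui": 0x2, "t10": 0x1}


def split_wwid(wwid: str):
    """Return (designator_type, value); the type is None for unknown prefixes."""
    prefix, separator, rest = wwid.partition(".")
    if not separator or prefix not in PREFIX_TO_DESIGNATOR:
        # No recognised prefix, so hand the string back unchanged.
        return None, wwid
    return PREFIX_TO_DESIGNATOR[prefix], rest


if __name__ == "__main__":
    # The masked WWID reported by the USB adapter mentioned earlier.
    print(split_wwid("t10.JMicron Generic 0123456789ABCDEF"))
```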

The following observations are made on my computer:

As the process Windows uses to create GUID-style UniqueId values (UniqueIdFormat 0x0) and NAA-style UniqueId values (UniqueIdFormat 0x3) is not known, UniqueIdFormat types 0x0 and 0x3 should not be supported.

UniqueIdFormat 0x0 (Vendor specific) also has a problem producing unique identifiers. I am not sure how Windows creates the GUID-like strings, but I have a mix of 32 GB and 64 GB Kingston USB drives. A (Manufacturer, Model, Serial Number) tuple of (Kingston, DataTraveler 3.0, 0000000005) produces the same 0x0 UniqueId on my computer: {9c80dcde-bafb-65cc-bb92-3e095c56a6dd}.

For these reasons, I suggest only relying on drive identifiers in Windows where the UniqueIdFormat is 0x8 (SCSI name string).

FC-PH Format

I am not yet sure what FCPH format is. FC-PH presumably means "Fibre Channel Physical and Signaling Interface" (ANSI INCITS 230), so it might be defined in FC-PH-2 and/or FC-PH-3. As I don't use Fibre Channel, and WMI says my portable hard disk's UniqueId uses a UniqueIdFormat of 3, I am going to assume that "FCPH" 0x3 might be equivalent to SCSI VPD page 0x83 type 0x3: NAA.

RFC 3980 (T11 Network Address Authority (NAA) Naming Format for iSCSI Node Names) says that the NAA format was created by INCITS/T11, and that NAA is defined in "INCITS T11 Framing and Signaling Specification [FC-FS]". INCITS/T11 is now known as INCITS/Fibre Channel, in the same way INCITS/T10 is now known as INCITS/SCSI. So, while SCSI standards may use NAA identifiers, NAA identifiers are not defined by those that write the SCSI standards.

Having found a copy of FC-FS, I have updated the NAA values in the section above.

Looking at the values of a couple of Seagate Portable USB drives, however, I do not believe FC-PH values are globally unique - just Google 'windows disk id 5000000000000001' and see they might not even be locally unique.

SCSI Name String

SPC-4 says the SCSI NAME STRING is a null-padded UTF-8 string whose length is a multiple of 4 bytes and no more than 256 bytes.
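Those constraints can be checked before the field is decoded. The sketch below follows only the description given here (null padding, length a multiple of 4, at most 256 bytes); the function name `decode_scsi_name_string` and the sample field are hypothetical.

```python
def decode_scsi_name_string(field: bytes) -> str:
    """Validate a raw SCSI NAME STRING field and return it without padding."""
    if not field or len(field) > 256:
        raise ValueError("field must be between 1 and 256 bytes long")
    if len(field) % 4 != 0:
        raise ValueError("field length must be a multiple of 4 bytes")
    # Padding is only expected at the end, so strip trailing nulls and
    # reject any null byte that remains inside the string itself.
    text = field.rstrip(b"\x00")
    if b"\x00" in text:
        raise ValueError("unexpected null byte inside the name string")
    return text.decode("utf-8")


if __name__ == "__main__":
    # Hypothetical 24-byte field: 20 characters plus 4 bytes of null padding.
    print(decode_scsi_name_string(b"eui.0123456789ABCDEF\x00\x00\x00\x00"))
```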

The field starts with the following 4 characters, which define the rest of the field:

NAA EUI-64 Mapped

An EUI-64 identifier is supposed to be globally unique. The most significant 3 bytes (24 bits) are the OUI, which is assigned by the IEEE-RA. These 3 bytes are, for example, the first 6 hexadecimal digits in a MAC address. Similar to how CIDR was created for IPv4 allocations, the IEEE created MA-M and MA-S blocks for MAC address assignments (with 24-bit OUIs being MA-L blocks).

As such, an MA-S assignment includes an OUI-36 (36-bit OUI), whereas an MA-M assignment does not include an OUI assignment. In both cases, the 24-bit OUI is assigned to the IEEE-RA itself, which means the 3-byte OUI might not identify who assigned a particular EUI-64. In any case, the definition of an OUI is the same: the two least significant bits of the most significant byte (bit 0, the OUI M bit or MAC I/G bit; and bit 1, the OUI X bit or MAC U/L bit) are 0 (i.e. 0bnnnnnnXM = 0bnnnnnn00).

The IEEE also assign company IDs (CIDs), and these have the OUI X bit and the CID Z bit set to 1 (i.e. 0bnnnnZYXM = 0bnnnn1010). If the X bit is 1 it means it is a local address, so is not globally unique. The local addressing equivalent of an EUI-64 is an ELI-64.

So, to sanity check an EUI-64 we just need to check it is 8 bytes long and the two least significant bits of the most significant byte are 0. If it is being called an EUI and it passes those checks, we can assume it really is an EUI-64 and is therefore globally unique.
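That sanity check is small enough to sketch directly. The function name `looks_like_eui64` is hypothetical, and passing the check only shows that the value is consistent with an EUI-64, not that it was really assigned as one.

```python
def looks_like_eui64(value: bytes) -> bool:
    """Return True if value is 8 bytes long with the X and M bits both 0."""
    # 0b11 masks the two least significant bits of the most significant byte.
    return len(value) == 8 and (value[0] & 0b11) == 0


if __name__ == "__main__":
    print(looks_like_eui64(bytes.fromhex("ACDE48234567019F")))
```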

Because those two bits of an OUI have a specific defined meaning by the IEEE-RA (and any related standards using OUIs, such as Wi-Fi and Ethernet), it is possible to create an 8-byte WWN from an 8-byte EUI-64 because the Fibre Channel and/or SCSI people came up with a solution: remove those two bits (i.e. right shift the most significant byte by two bits), and make the two most significant bits 1 (i.e. XOR with 0xC0 or 0b11000000).

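For example, an EUI-64 of AC-DE-48-23-45-67-01-9F has a most significant byte of 0xAC (0b10101100). Right shifting it by two bits gives 0x2B (0b00101011), and setting the two most significant bits gives 0xEB (0b11101011), so the resulting 8-byte WWN is EB-DE-48-23-45-67-01-9F, which has an NAA of 0xE.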
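The mapping described above can be sketched as a short function. The name `eui64_to_mapped_wwn` is hypothetical, and the code follows only the shift-then-XOR description given in this comment rather than the text of any standard.

```python
def eui64_to_mapped_wwn(eui64: bytes) -> bytes:
    """Map an 8-byte EUI-64 onto an 8-byte NAA EUI-64 Mapped WWN."""
    if len(eui64) != 8:
        raise ValueError("an EUI-64 must be exactly 8 bytes long")
    if eui64[0] & 0b11:
        raise ValueError("the X and M bits must both be 0 in an EUI-64")
    # Drop the two (zero) least significant bits of the first byte, then set
    # the two most significant bits. After the shift those bits are 0, so
    # XOR with 0xC0 has the same effect as OR with 0xC0.
    first = (eui64[0] >> 2) ^ 0xC0
    return bytes([first]) + eui64[1:]


if __name__ == "__main__":
    eui = bytes.fromhex("ACDE48234567019F")
    print(eui64_to_mapped_wwn(eui).hex("-").upper())
```

Because only the two removed bits are lost, and they are always 0 in a valid EUI-64, the mapping can be reversed by clearing the top two bits and shifting the first byte left by two.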

An EUI-64 could contain a smaller identifier like an EUI-48 (48-bit MAC address), and those can be identified with a couple of the EUI-64 bytes having the value 0xFF 0xFE, but that method is deprecated so I am not going to consider it here.

Similarly, SCSI Name Strings may contain an eui. identifier that is based on an EUI-64 but is a 96-bit (24 hexadecimal digits) or 128-bit (32 hexadecimal digits) number. I am not covering these here because they may not identify a specific physical piece of hardware with a static globally unique identifier (i.e. the EUI-64 the identifier is based on may change if a piece of hardware is moved or software is reconfigured, causing the 96-/128-bit identifier to also change).

watfordjc commented 1 year ago

OID arc storage-media.jedec-mmc.cid

Description: Information:

watfordjc commented 1 year ago

OID arc storage-media.usb-vid-pid.serno

Description: Unique USB identifiers

Information: Reserved for future use.

USB is tricky because USB devices should have a unique combination of Vendor ID (VID), Product ID (PID), and Serial Number (SERNO), but in a lot of cases every single product for a given VID and PID has an identical serial number.

Kingston, for example, etch micro-lettering on the USB jack of their flash drives to somehow prove they are genuine. I have two 32 GB and two 64 GB DataTraveler drives where the only things that differ are: (a) the PnP Device ID, (b) the MBR drive signature, and (c) the MBR checksum.

The micro-lettering is the same on the two 32 GB drives, and is also the same on the two 64 GB drives, so I see no way at all to uniquely identify a flash drive either physically or electronically. The MBR signature or the GPT partition table UUID only identify a drive until it is next repartitioned/flashed, assuming writing an ISO to a drive (e.g. a Windows/Ubuntu/etc. EFI installation ISO) doesn't create drives with identical UUIDs.

For portable Seagate drives (possibly other manufacturers depending on how they turn a 2.5" disk into a USB device), it might be possible to get a WWN from the drive in Linux if you unmount and disconnect the drives, disable UAS/UASP, and then run smartctl. First get the USB VID:PID for the drive:

$ lsusb | grep -i seagate
Bus 004 Device 012: ID 0bc2:2037 Seagate RSS LLC Expansion HDD

After getting the VID:PID (0bc2:2037 in this example), unmount the drive partitions, unplug the USB cable, and then add the VID:PID to the USB quirks list with :u (u = IGNORE_UAS (don’t bind to the uas driver)):

$ echo "0x0bc2:0x2037:u" | sudo tee /sys/module/usb_storage/parameters/quirks

Plug the USB cable back in and wait for the kernel log to show the device has been added. Now you might be able to run smartctl to get the WWN, such as the following if the reconnected drive was reported in the kernel log as /dev/sdb:

$ sudo smartctl -a /dev/sdb
...
LU WWN Device Id: x xxxxxx xxxxxxxxx
...
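The LU WWN line can then be turned into a 16-digit hexadecimal WWN by joining its three groups. This sketch assumes the line always has the 1, 6 and 9 digit grouping shown above; the function name `wwn_from_smartctl` and the sample value are hypothetical.

```python
import re
from typing import Optional

# Matches the grouping shown above: 1 hex digit, then 6, then 9.
_LU_WWN = re.compile(
    r"^LU WWN Device Id:\s*([0-9a-fA-F])\s+([0-9a-fA-F]{6})\s+([0-9a-fA-F]{9})\s*$",
    re.MULTILINE,
)


def wwn_from_smartctl(output: str) -> Optional[str]:
    """Return the WWN as 16 lowercase hex digits, or None if it is absent."""
    match = _LU_WWN.search(output)
    if match is None:
        return None
    return "".join(match.groups()).lower()


if __name__ == "__main__":
    # Hypothetical smartctl output fragment, for illustration only.
    sample = "Serial Number:    EXAMPLE\nLU WWN Device Id: 5 000c50 0a1b2c3d4\n"
    print(wwn_from_smartctl(sample))
```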

watfordjc commented 1 year ago

OID arc storage-media.lto-cm-uci

Description: Information: