hughsie / colord

Making color management just work
GNU General Public License v2.0
72 stars 50 forks source link

colormgr get-devices is empty when two identical monitor models are plugged in #164

Open voidptr127 opened 7 months ago

voidptr127 commented 7 months ago

System: Fedora 39 (Workstation), GNOME: 45.3, Kernel 6.6.12-200

I can't assign ICC profiles to my monitors because colormgr get-devices returns nothing when two identical monitor models are plugged in. This also affects the gnome-color-manager where neither monitor is shown when both are plugged in.

First, I checked system logs and got the following:

$ sudo journalctl -xe | grep colord
[...] gnome-shell[2099]: Failed to create colord device for 'xrandr-Acer Technologies-XF270HU-T78EE0048521': device id 'xrandr-Acer Technologies-XF270HU-T78EE0048521' already exists

Next, I unplugged both displayport cables, then only plugged one monitor back in. In this case, everything works as expected:

$ sudo colormgr get-devices 
Object Path:   /org/freedesktop/ColorManager/devices/xrandr_Acer_Technologies_XF270HU_T78EE0048521_xxxxx_1234
Owner:         xxxxx
Created:       January 24 2024, 03:08:54 PM
Modified:      January 24 2024, 03:08:54 PM
Type:          display
Enabled:       Yes
Embedded:      No
Model:         XF270HU
Vendor:        Acer
Serial:        T78EE0048521
Scope:         temp
Colorspace:    rgb
Device ID:     xrandr-Acer Technologies-XF270HU-T78EE0048521
Profile 1:     icc-f02b98744a0e3b3abfdd767bf9bbc8ab
               /var/lib/colord/icc/Acer XF270HUA-center.icm
Metadata:      OutputEdidMd5=1a7e20b4e5f0fbc8d83026b0fc75d8a4
Metadata:      OutputPriority=primary
Metadata:      XRANDR_name=DP-2
Metadata:      OwnerCmdline=/usr/bin/gnome-shell

Note that at this stage, I could also assign the ICC profile. However, it only works for the monitor that was plugged in. As soon as I plug in the second monitor, colormgr get-devices returns nothing and the default ICC profile for the other monitor is used. I also compared both outputs of colormgr get-devices. Both return exactly the same model and serial number.

Here is two times the output from colord with the --verbose flag set:

The first output shows what colord prints when I did stop colord and unplugged both monitors. Then I hit enter (blindly) to start colord, and plugged both monitors in, one after another. The second output is printed when I killed the previous process and started it afterwards while both monitors were already plugged in.

Particularly interesting might be these lines from the second file:

sudo -u colord /usr/libexec/colord --verbose
[...]
16:26:01    display card1-DP-2 has ID 'xrandr-Acer-XF270HU-T78EE0048521' from MD5 1a7e20b4e5f0fbc8d83026b0fc75d8a4
16:26:01    display card1-DP-3 has ID 'xrandr-Acer-XF270HU-T78EE0048521' from MD5 5516e0495119e86738e01422cfb835aa
[...]

I am not entirely sure about the expected behavior, but I suspect that the names should differ to avoid the confusion.

hughsie commented 7 months ago

The monitors are supposed to have different serial numbers. I'm afraid you might have to dig into the code to find a workaround we can apply upstream.

voidptr127 commented 7 months ago

I checked the serial numbers on the back of my monitors. It turns out that the serial number is truncated. It uses the first eight letters of the serial number: T78EE004 ... and the last 4 digits "8521" which apparently match. The other 8 hex digits in-between are entirely different.

Do you know why there are different md5 hashes then? When I cat /sys/class/drm/card1-DP-2/device/card1-DP-2/edid I see the truncated serial number with non-defined binary output before and after. So I guess the reading comes from there.

voidptr127 commented 7 months ago

Found my missing serial number digits. They are not encoded in hexadecimal but in decimal. However, they match the numbers on the back of my monitors:

# cat /sys/class/drm/card1-DP-2/edid | edid-decode | grep -E 'Display Product|Serial'
    Serial Number: 1422285011
    Display Product Name: 'XF270HU'
    Display Product Serial Number: 'T78EE0048521'
# cat /sys/class/drm/card1-DP-3/edid | edid-decode | grep -E 'Display Product|Serial'
    Serial Number: 1422285039
    Display Product Name: 'XF270HU'
    Display Product Serial Number: 'T78EE0048521'

It appears to be the only difference between both edid files when I do a diff (the first part is a hex dump displayed by edid-decode.)

# diff <(cat /sys/class/drm/card1-DP-2/edid | edid-decode) <(cat /sys/class/drm/card1-DP-3/edid | edid-decode)
[...]
<     Serial Number: 1422285011
---
>     Serial Number: 1422285039
[...]

I don't know about the internals of the edid file format. It seems that both serial numbers need to be concatenated when creating the device id. Any other monitor examples where this is similar? I guess one could fallback to the current scheme if the "Serial Number" field does not exist.

voidptr127 commented 7 months ago

I checked the EDID file format with the information on Wikipedia and it turns out that the missing numbers are supposed to be the "real" serial numbers. They can be directly read from the edid file at bytes 12-15 (32 bits, little endian, index starts at 0). The other "Display Product Serial Number" seems to be an "display/monitor descriptor" (max 18 bytes), which is not necessarily unique.

I am not sure which source for the serial number colord uses, as I could not directly find a way to get this information with xrandr. However, I am pretty sure that this is a bug how the name of the device-id is constructed. I.e., the "wrong" serial number is used, or at least it is not complete like in my case.

The check implemented in file src/cd-main.c beginning from line 2214 in cd_main_check_duplicate_edids is somewhat superfluous then. Because edid files encode the serial number and serial numbers are always different, the edid files are also always different. Thus, the hashes should never be the same. I.e., this check will always return false (if there is nothing wrong with the system or monitor). If this check is there to deal with "bogus systems / monitors" then forget about what I've just written.

What do you think?

voidptr127 commented 7 months ago

Found the problem in lib/colord/cd-edid.c

First hint at ll. 545ff:

/* maybe there isn't a ASCII serial number descriptor, so use this instead */
serial = (guint32) data[CD_EDID_OFFSET_SERIAL+0];
serial += (guint32) data[CD_EDID_OFFSET_SERIAL+1] * 0x100;
serial += (guint32) data[CD_EDID_OFFSET_SERIAL+2] * 0x10000;
serial += (guint32) data[CD_EDID_OFFSET_SERIAL+3] * 0x1000000;
if (serial > 0)
    priv->serial_number = g_strdup_printf ("%" G_GUINT32_FORMAT, serial);

This indicates that it is assumed that the serial number is the same as the display product serial number, but they are not.

After line 603, the serial number is then overwritten with the display product serial number:

[...]
} else if (data[i+3] == CD_DESCRIPTOR_DISPLAY_PRODUCT_SERIAL_NUMBER) {
    tmp = cd_edid_parse_string (&data[i+5]);
    if (tmp != NULL) {
        g_free (priv->serial_number);
        priv->serial_number = tmp;
    }
}
[...]

Which is then used in src/cd-main.c to create the device id (ll. 2199ff):

static gchar *
cd_main_get_display_id (CdEdid *edid)
{
    GString *device_id;

    device_id = g_string_new ("xrandr");
    if (cd_edid_get_vendor_name (edid) != NULL)
        g_string_append_printf (device_id, "-%s", cd_edid_get_vendor_name (edid));
    if (cd_edid_get_monitor_name (edid) != NULL)
        g_string_append_printf (device_id, "-%s", cd_edid_get_monitor_name (edid));
    if (cd_edid_get_serial_number (edid) != NULL)
        g_string_append_printf (device_id, "-%s", cd_edid_get_serial_number (edid));
    return g_string_free (device_id, FALSE);
}

Which is not correct, in my opinion.

My suggestion would be to keep both separate and change the construction of the device id.

I.e., in lib/colord/cd-edid.c I'd suggest to add another field to the CdEdidPrivate struct (ll. 45ff):

typedef struct
{
    CdColorYxy      *red;
    CdColorYxy      *green;
    CdColorYxy      *blue;
    CdColorYxy      *white;
    gchar           *checksum;
    gchar           *eisa_id;
    gchar           *monitor_name;
    gchar           *pnp_id;
    gchar           *serial_number;
    /* the line below is added */
    gchar           *display_product_serial_number;
    gchar           *vendor_name;
    gdouble          gamma;
    guint            height;
    guint            width;
} CdEdidPrivate;

Then assign the display product serial number to that separate field at ll. 603ff instead of replacing the serial number:

} else if (data[i+3] == CD_DESCRIPTOR_DISPLAY_PRODUCT_SERIAL_NUMBER) {
    priv->display_product_serial_number = cd_edid_parse_string (&data[i+5]);
    /*  
    // delete these commented lines 
    if (tmp != NULL) {
        g_free (priv->serial_number);
        priv->serial_number = tmp;
    }
    */
}

Make sure to update cd_edid_reset and cd_edid_finalize as well (ll. 396 and ll. 643)

void
cd_edid_reset (CdEdid *edid)
{
    CdEdidPrivate *priv = GET_PRIVATE (edid);

    g_return_if_fail (CD_IS_EDID (edid));

    /* free old data */
    g_free (priv->monitor_name);
    g_free (priv->vendor_name);
    g_free (priv->serial_number);
    /* add the line below */
    g_free (priv->display_product_serial_number);
    g_free (priv->eisa_id);
    g_free (priv->checksum);

    /* do not deallocate, just blank */
    priv->pnp_id[0] = '\0';

    /* set to default values */
    priv->monitor_name = NULL;
    priv->vendor_name = NULL;
    priv->serial_number = NULL;
    /* add the line below */
    priv->display_product_serial_number = NULL;
    priv->eisa_id = NULL;
    priv->checksum = NULL;
    priv->width = 0;
    priv->height = 0;
    priv->gamma = 0.0f;
}

static void
cd_edid_finalize (GObject *object)
{
    CdEdid *edid = CD_EDID (object);
    CdEdidPrivate *priv = GET_PRIVATE (edid);

    g_free (priv->monitor_name);
    g_free (priv->vendor_name);
    g_free (priv->serial_number);
    /* add this line */
    g_free (priv->display_product_serial_number);
    g_free (priv->eisa_id);
    g_free (priv->checksum);
    g_free (priv->pnp_id);
    cd_color_yxy_free (priv->white);
    cd_color_yxy_free (priv->red);
    cd_color_yxy_free (priv->green);
    cd_color_yxy_free (priv->blue);

    G_OBJECT_CLASS (cd_edid_parent_class)->finalize (object);
}

Add a new function to access it:

const gchar *
cd_edid_get_display_product_serial_number (CdEdid *edid)
{
    CdEdidPrivate *priv = GET_PRIVATE (edid);
    g_return_val_if_fail (CD_IS_EDID (edid), NULL);
    return priv->display_product_serial_number;
}

Update cd_main_get_display_id in src/cd-main.c at ll. 2199ff:

static gchar *
cd_main_get_display_id (CdEdid *edid)
{
    GString *device_id;

    device_id = g_string_new ("xrandr");
    if (cd_edid_get_vendor_name (edid) != NULL)
        g_string_append_printf (device_id, "-%s", cd_edid_get_vendor_name (edid));
    if (cd_edid_get_monitor_name (edid) != NULL)
        g_string_append_printf (device_id, "-%s", cd_edid_get_monitor_name (edid));
    if (cd_edid_get_display_product_serial_number (edid) != NULL)
        g_string_append_printf (device_id, "-%s", cd_edid_get_display_product_serial_number (edid));
    if (cd_edid_get_serial_number (edid) != NULL)
        g_string_append_printf (device_id, "-%s", cd_edid_get_serial_number (edid));

    return g_string_free (device_id, FALSE);
}

However, I am not sure how the new naming scheme would affect existing installations.

hughsie commented 7 months ago

how the new naming scheme would affect existing installations

I think it would break all the profile mappings! :)

I think what's supposed to happen is that the CD_DESCRIPTOR_DISPLAY_PRODUCT_SERIAL_NUMBER is only used when the serial "number" contains non-numeric chars. Maybe you can add a quirk for that specific EDID model so that the CD_DESCRIPTOR_DISPLAY_PRODUCT_SERIAL_NUMBER gets ignored completely?

voidptr127 commented 7 months ago

I am really struggling to get this sorted. First, I implemented the changes I proposed, but it turns out that changing this only affects the debug message and not anything else in the code. Hence, commenting out the code that overwrites the serial number with the display product serial number did not change anything either.

Next thing I did was setting priv->use_xrandr_mode always to be TRUE in src/cd-main.c. This seemed to work at first, as two monitors were shown in the gnome-color-manager. However, somehow it still created the old device id and it was using it along the xrandr device id simultaneously. Although I was seeing 2 devices in the gnome color manager, changing profiles was only possible for 1 of 2 monitors. Changing profile for the other monitor just did nothing, although it displayed that it changed it.

I am tempted to give up and accept that I can't have my color-calibrated ICC profiles with colord installed.

All I want is to somehow overwrite the default serial number. Where is the serial number data read from? It's not read from the /sys/class/drm/*/edid files from what I can tell. I can't find any xrandr or other library calls to read data regarding the display device, although it has XRANDR_name as a property set. All I see in the code are SQLite queries and DBUS functions. But how does it find the serial number property or any other device property to install it there in the first place? Is there another DBUS instance where it reads that data from?

hughsie commented 7 months ago

Where is the serial number data read from

The desktop client reads the EDID and creates the device with colord using DBus. You'll need to install the library before it'll be used by the session.

voidptr127 commented 7 months ago

Thank you for the reply. Can you specify what you mean by desktop client? I see that lib/colord/cd-client.c provides the cd_client_create_device function, but I can't find what calls the function. Please excuse the spam on that topic.

Do you mean that I would need to fully overwrite the default installation of colord with a custom one? I suppose killing colord and starting the custom compiled colord executable is not enough then?

hughsie commented 7 months ago

Can you specify what you mean by desktop client

Well, in GNOME it's gnome-settings-daemon for instance.

Do you mean that I would need to fully overwrite the default installation of colord with a custom one

At least the colord shared library.

voidptr127 commented 7 months ago

I tried an installation yesterday without success. I am not entirely sure whether meson install copied the files to the wrong location for Fedora or whether GNOME performs some static linking that I am not aware of. But I did no further investigation by compiling GNOME or searching for a SPEC file to verify the locations.

However, your comment got me thinking that maybe GNOME itself is the issue here. I installed KDE today and both displays showed up in the settings out of the box. Although I could not assign profiles via the GUI, I was able to assign them via colormgr. Good enough, I guess.

KDE uses a different naming scheme that includes the connecting monitor port at the end of the name. This seems to be similar to the XRANDR_name fallback in colord.

I will keep using KDE, because it solved my color profile problem. So for me, the case is closed.

I'd still suggest to enhance colord by exposing the "other EDID serial number" somehow. Probably best to do so via a separate function in the lib/colord/cd-edid.c file and keeping the behavior of the currently returned (display product) serial number (priv->serial) as it is for backwards compatibilty. Probably a hint to the GNOME project is also not a bad idea that they are assuming that the display names should be unique, but it might not always be the case.

Alternatively (?) or additionally, one might extend the "duplicate edid" check with a "duplicate device id" check to change usage of device ids in the same manner. I.e., switch to xrandr device id naming scheme if two device ids will be the same. I think that using the xrandr naming scheme, should have actually solved my problem, as this is what I can observe when using KDE.

Thanks for your help! You can close the issue if you want or keep it open as a reference.

gahLAHAD commented 3 months ago

I'm having the same issue on GNOME. Is there any "solution" other than changing to KDE or dig the code myself? For example, is it possible to change the device id manually before plugging in the other monitor? Just a thought. Cheers!