open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

add `origin_referrer_url`, `origin_url` and `zone_identifier` to the file attribute #1430

Open AsuNa-jp opened 1 month ago

AsuNa-jp commented 1 month ago

Changes

This PR adds the following attributes.

(Thanks @trisch-me for all the advice you gave me in creating this PR!)

Background: What are these fields for? (Updated 2024/Oct/21)

When a file is downloaded from the internet (or a network) using a web browser (such as Chrome or Edge) or another application, information about where the file came from is generally attached to the file. This behavior can occur on all operating systems. Its primary use is to enhance security: the context about the file's source allows the system to assess potential risks and enforce appropriate security measures.

The details are explained below.

Windows

On Windows, this is known as the Mark of the Web (ref1, ref2), and it is added to the file's NTFS alternate data stream.

For example, when you download an image file (image17.webp) from this webpage using a web browser, the download source URL is automatically added to the file's Alternate Data Stream (ADS) as follows.

image

This PR adds a field to store the URL of the file's origin, which is saved in the NTFS alternate data stream (ADS).

Note - On Windows, MotW can be used not only with NTFS but also with ReFS (Windows 8.1/Server 2012 R2 or later).
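
The `Zone.Identifier` stream holds INI-style text, so mapping it onto the proposed fields can be sketched as below. This is only an illustration, not part of the PR: the function name `parse_zone_identifier` and the sample URLs are hypothetical, and mapping `HostUrl` to `file.origin_url` and `ReferrerUrl` to `file.origin_referrer_url` is an assumption about how the fields would be populated.

```python
def parse_zone_identifier(text: str) -> dict[str, str]:
    """Map Zone.Identifier stream text to the attribute names proposed in this PR."""
    # Assumed mapping from stream keys to attribute names (illustrative only).
    key_to_attr = {
        "ZoneId": "file.zone_identifier",
        "ReferrerUrl": "file.origin_referrer_url",
        "HostUrl": "file.origin_url",
    }
    attrs: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the [ZoneTransfer] section header, and malformed lines.
        if not line or line.startswith("[") or "=" not in line:
            continue
        # Split on the first "=" only, since URLs may contain "=" themselves.
        key, _, value = line.partition("=")
        attr = key_to_attr.get(key.strip())
        if attr is not None:
            attrs[attr] = value.strip()
    return attrs


# Hypothetical stream content for a file downloaded from the Internet zone.
sample = (
    "[ZoneTransfer]\n"
    "ZoneId=3\n"
    "ReferrerUrl=https://example.com/page\n"
    "HostUrl=https://example.com/image17.webp?x=1\n"
)
print(parse_zone_identifier(sample))
```

On an NTFS volume the stream text can usually be obtained by opening `<path>:Zone.Identifier` like a regular file. The sketch keeps `ZoneId` as a string because the PR text here does not state the attribute's type.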

Linux

On Linux, some applications store file origin metadata in extended attributes (xattr) or in the GNOME virtual filesystem (GVfs) to track the source of a file.

For example, when you download an image file (image17.webp) from this webpage using a web browser, the download source URL is automatically added to gvfs.

Example of a file downloaded using Firefox: image

Additionally, curl and Wget can attach the referrer URL (user.xdg.referrer.url) and the origin URL (user.xdg.origin.url) to the file's extended attributes. (Google Chrome used to add user.xdg.referrer.url and user.xdg.origin.url as well, but this feature is currently turned off.)

Example of a file downloaded using curl: image

Note - As written in this web page, all major Linux file systems including Ext4, Btrfs, ZFS, and XFS support extended attributes.
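
Reading the two extended attributes mentioned above could look roughly like this. It is a minimal sketch, not part of the PR: the helper name `read_origin_xattrs` and the mapping onto `file.origin_url` and `file.origin_referrer_url` are assumptions, and the `getxattr` parameter exists only so the logic can be exercised on systems without xattr support.

```python
import os

# Assumed mapping from xattr names to the attribute names proposed in this PR.
XATTR_TO_ATTR = {
    "user.xdg.origin.url": "file.origin_url",
    "user.xdg.referrer.url": "file.origin_referrer_url",
}


def read_origin_xattrs(path, getxattr=None):
    """Return whichever origin attributes are present on the file at `path`."""
    if getxattr is None:
        getxattr = os.getxattr  # available on Linux only
    attrs = {}
    for xattr_name, attr_name in XATTR_TO_ATTR.items():
        try:
            value = getxattr(path, xattr_name)
        except OSError:
            # Attribute not set, or the filesystem does not support xattrs.
            continue
        attrs[attr_name] = value.decode("utf-8", errors="replace")
    return attrs


# Usage on Linux (hypothetical path):
# read_origin_xattrs("/home/user/Downloads/image17.webp")
```

Missing attributes are skipped rather than treated as errors, since most files will carry neither value.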

macOS

(Since I don't have a Mac device, my investigation is based on information found online.)

On macOS, some applications store file origin metadata in extended attributes to track the source of a file, as shown below. It appears that both the referrer URL and the origin URL are saved.

image

The image source is as follows: https://stackoverflow.com/questions/70444996/obtaining-metadata-where-from-of-a-file-on-mac

The same thing is mentioned on another website as well. (https://exiftool.org/forum/index.php?topic=14991.0)

Usually if we save a file from browser, the file will have 2 strings in the 'Where from' attribute:
image
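
Decoding the 'Where from' value could be sketched as follows. This rests on several assumptions that are not verified in this PR: that the value is stored in the `com.apple.metadata:kMDItemWhereFroms` extended attribute as a property list containing a list of strings, and that the first string is the download URL and the second is the referring page. The function name `parse_where_froms` and the sample URLs are hypothetical.

```python
import plistlib


def parse_where_froms(raw: bytes) -> dict[str, str]:
    """Decode a 'Where from' plist value into the attributes proposed in this PR."""
    urls = plistlib.loads(raw)  # detects binary and XML plists automatically
    attrs: dict[str, str] = {}
    # Assumed ordering: [download URL, referring page URL].
    if len(urls) > 0 and urls[0]:
        attrs["file.origin_url"] = urls[0]
    if len(urls) > 1 and urls[1]:
        attrs["file.origin_referrer_url"] = urls[1]
    return attrs


# Build a sample binary plist instead of reading a real xattr.
sample = plistlib.dumps(
    ["https://example.com/image17.webp", "https://example.com/page"],
    fmt=plistlib.FMT_BINARY,
)
print(parse_where_froms(sample))
```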

Background: the use cases. (Updated 2024/Oct/21)

Merge requirement checklist

linux-foundation-easycla[bot] commented 1 month ago

CLA Signed

The committers listed above are authorized under a signed CLA.

AsuNa-jp commented 1 month ago

Hi @trisch-me Thank you for the prompt feedback on this PR. All of your points are absolutely valid. I have updated the PR based on your suggestions. (https://github.com/open-telemetry/semantic-conventions/pull/1430/commits/160b7ee57d8b17d05604ffa7c28b0febb32f3d98, https://github.com/open-telemetry/semantic-conventions/pull/1430/commits/37c9710f956bf263b920c09db7adb515545914f3) If there is anything else, please feel free to let me know!

AsuNa-jp commented 1 month ago

Thank you all for your comments. Based on feedback from various sources, I have added file.zone_identifier to this PR.

However, since concerns have also been raised about whether the fields we plan to add are necessary at all, we are considering having @trisch-me (and @magermark ) lead a more in-depth discussion during the upcoming OTel Semantic Conventions meeting.

AsuNa-jp commented 3 weeks ago

Hi @trisch-me @lmolkova @joaopgrassi Based on last week's discussion at the Semantic Conventions meeting, I have added additional explanations to this PR. If you need further explanations before approving this PR, please don't hesitate to let me know.

trisch-me commented 3 weeks ago

@AsuNa-jp could you please fix conflicts? thanks

jsuereth commented 1 week ago

My only remaining concern with this PR (and based on @lmolkova's comments) is whether the things you're defining should be event fields vs. attributes.

The two use cases you mention both involve an event. Are these attributes you're defining things we'd want to include in Spans and Metrics?

I think it'd be reasonable to define a file.open Event that has these fields within it, but I'm not positive how you'd possibly have that "turn into a metric" or otherwise interact with spans.

AsuNa-jp commented 1 week ago

Hi @jsuereth Thank you for taking the time to provide feedback on this PR! I understand your concern that, based on the use cases I provided, origin_referrer_url, origin_url, and zone_identifier may not seem necessary to define as attributes.

However, I personally believe that this information should be included in the Attributes, as they may need to be referenced later. For example, these fields could be invaluable in determining whether a previously downloaded file was actually sourced from a newly identified malicious website.

This is just an example, but let's say it was discovered today that https://outlook.office.com/ is a malicious website. These fields would be helpful in answering the question, 'How many files were actually downloaded from this site?'

image
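
As a rough illustration of that lookup, the query could be sketched as follows, assuming the stored telemetry is available as a list of attribute dictionaries. The function name `files_from_host` and the sample records are hypothetical, and a real backend would run this as a query rather than in application code.

```python
from urllib.parse import urlsplit


def files_from_host(records, host):
    """Return the records whose file.origin_url points at the given host."""
    wanted = host.lower()
    matches = []
    for record in records:
        # urlsplit(...).hostname is lowercased, or None when no host is present.
        hostname = urlsplit(record.get("file.origin_url", "")).hostname
        if hostname == wanted:
            matches.append(record)
    return matches


# Hypothetical stored records.
records = [
    {"file.path": "C:/Users/a/Downloads/report.docx",
     "file.origin_url": "https://outlook.office.com/attachments/report.docx"},
    {"file.path": "C:/Users/a/Downloads/image17.webp",
     "file.origin_url": "https://example.com/image17.webp"},
    {"file.path": "C:/Users/a/notes.txt"},  # no origin information recorded
]
hits = files_from_host(records, "outlook.office.com")
print(len(hits), [r["file.path"] for r in hits])
```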

I hope this helps address your concerns. However, if you have any further concerns, please don’t hesitate to let me know.