w3c / tdm-reservation-protocol

Repository of the Text and Data Mining Reservation Protocol Community Group
https://www.w3.org/community/tdmrep/

Decide on the order in which TDM Agents must check the 3 possible techniques #18

Closed: llemeurfr closed this issue 2 years ago

llemeurfr commented 3 years ago

For the sake of interoperability, it is important for TDM Agents to know where they should look for information first, second, and third. A fixed order avoids ambiguity when publishers have left the information in several places, with different values for tdm-reservation and tdm-policy.

My proposal is:
1- TDM File on the Origin Server -> the TDM Agent can check this before it starts scraping a website, cache the information if any, and check the rules against any content it fetches.
2- TDM Header Field for HTTP Requests -> this is the first thing the TDM Agent will see when it fetches content, and it is easy to process. Info found here supersedes previous info.
3- TDM Metadata in HTML Content -> this can only be read after the HTML content has been parsed, so it comes later and requires more processing. Info found here supersedes previous info.
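The order proposed above can be sketched in Python as follows. This is only an illustration, not text from the specification: the `TdmInfo` type, the `resolve_tdm_info` function, and the sample values are hypothetical, and the sketch assumes that "supersedes" means the whole record from a later source replaces the earlier one rather than being merged field by field.

```python
from typing import Optional, TypedDict


class TdmInfo(TypedDict, total=False):
    """Hypothetical container for the two values named in this issue."""
    tdm_reservation: int  # assumed to be a numeric flag; illustrative only
    tdm_policy: str       # assumed to be a URL; illustrative only


def resolve_tdm_info(
    origin_file: Optional[TdmInfo],
    http_header: Optional[TdmInfo],
    html_metadata: Optional[TdmInfo],
) -> Optional[TdmInfo]:
    """Return the information that applies to one fetched resource.

    Sources are checked in the proposed order (file, then header, then
    HTML metadata). Each source that is present supersedes the previous
    one, so the last source found wins. Returns None if nothing was found.
    """
    resolved: Optional[TdmInfo] = None
    for source in (origin_file, http_header, html_metadata):
        if source is not None:
            resolved = source  # assumption: whole-record override, no merge
    return resolved


# Example: the site-wide file reserves rights, but the header seen on this
# particular resource says otherwise, so the header value is used.
site_file: TdmInfo = {
    "tdm_reservation": 1,
    "tdm_policy": "https://example.com/policy.json",
}
header: TdmInfo = {"tdm_reservation": 0}
result = resolve_tdm_info(site_file, header, None)
```

A field-by-field merge would be an equally plausible reading; in that case the `tdm_policy` from the file would survive when the header only sets `tdm_reservation`.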

llemeurfr commented 3 years ago

This has been integrated into the draft specification.

llemeurfr commented 2 years ago

On October 5th, 2021, the group decided to validate section 6.4 as-is.