relaton / relaton-iana

Relaton for IANA references
MIT License

RelatonIana::DataFetcher doesn't fetch media-types #18

Closed: CAMOBAP closed this issue 1 year ago

CAMOBAP commented 1 year ago

Problem

https://github.com/ietf-tools/relaton-data-iana/issues/9

It looks like the problem happens simply because the https://api.github.com/search/code API behaves this way out of the box.

It fetches only the first XML file from media-types:

https://raw.githubusercontent.com/ietf-ribose/iana-registries/main/media-types/application/vnd.paos.xml

The official docs (https://docs.github.com/en/rest/search?apiVersion=2022-11-28) don't explain this explicitly.

Possible solutions

  1. Make the media-types structure “flat”
  2. Instead of calling the https://api.github.com/search/code API, do a git clone without history
  3. Research whether the q filter can query nested directories
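
A minimal shell sketch of option 2, assuming the data lives in the GitHub repository ietf-ribose/iana-registries (inferred from the raw.githubusercontent.com URL above) with files under media-types/<type>/<subtype>.xml. The function names fetch_registries and list_media_types are hypothetical and this is not the RelatonIana::DataFetcher code. `git clone --depth 1` fetches only the latest commit, and `find` recurses into every subdirectory, so nested files such as media-types/application/vnd.paos.xml are all listed, not just the first one.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of option 2: shallow clone plus a local directory walk,
# instead of querying the GitHub code search API.
set -euo pipefail

# Clone only the latest commit (no history) into the destination directory.
# Usage: fetch_registries <repo-url> <dest-dir>
fetch_registries() {
  local repo_url="$1" dest="$2"
  git clone --depth 1 --quiet "$repo_url" "$dest"
}

# Print every *.xml file under <root>/media-types, recursing into nested
# directories such as application/ and text/; sorted for stable output.
# Usage: list_media_types <root>
list_media_types() {
  local root="$1"
  find "$root/media-types" -type f -name '*.xml' | sort
}
```

For example, `fetch_registries https://github.com/ietf-ribose/iana-registries.git /tmp/iana-registries` followed by `list_media_types /tmp/iana-registries` should print one path per media-type file. Since the Relaton extraction runs as a batch job, re-cloning on each run keeps the sketch simple.
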
ronaldtse commented 1 year ago

One problem with the iana-registries data set is that these media-types files are not actually XML or structured data files. Maybe we can rename them to .txt, since they are all plain text.

We might need some post-processing to make the repository usable, e.g. structuring the files?

@andrew2net it might be easiest to just do option 2 here. Extracting the iana-registries information into Relaton is a batch process anyway, isn't it?

andrew2net commented 1 year ago

The media-types documents were there until May 1. I'll check what happened.

andrew2net commented 1 year ago

Implemented option 2.

$ relaton fetch 'IANA media-types'
[relaton-iana] ("IANA media-types") fetching...
[relaton-iana] ("IANA media-types") found IANA media-types
<bibdata type="standard" schema-version="v1.2.3">
  <fetched>2023-05-09</fetched>
  <title format="text/plain">Media Types</title>
  <uri type="src">http://www.iana.org/assignments/media-types</uri>
  <docidentifier type="IANA" primary="true">IANA media-types</docidentifier>
  <docnumber>media-types</docnumber>
  <date type="updated">
    <on>2023-05-02</on>
  </date>
  <contributor>
    <role type="publisher"/>
    <organization>
      <name>Internet Assigned Numbers Authority</name>
      <abbreviation>IANA</abbreviation>
    </organization>
  </contributor>
  <language>en</language>
  <script>Latn</script>
</bibdata>