relaton / relaton-bipm


(URGENT) Inconsistent file naming and resolution naming #22

Closed ronaldtse closed 2 years ago

ronaldtse commented 2 years ago

In relaton-data-bipm/data/cgpm/resolution, some files are not named according to the pattern yyyy-ii, where yyyy is the year and ii is the identifier.

e.g.

As we can see from the second issue, this is purely a data parsing issue in Relaton-BIPM.

@andrew2net can you please help fix this ASAP? Thanks.

andrew2net commented 2 years ago

@ronaldtse

  1. FYI: cgpm/resolution/21.yaml is now cgpm/meetings/21.yaml, and cgpm/resolution/3.yaml is now cgpm/meetings/3.yaml.

  2. I don't understand why 21.yaml should be named using the pattern yyyy-ii but 3.yaml using the pattern yyyy.

  3. References for these documents are CGPM Meetings 21 and CGPM Meetings 3. Should the references be CGPM Meetings 1999-21 and CGPM Meetings 1901?

ronaldtse commented 2 years ago

  1. FYI: cgpm/resolution/21.yaml is now cgpm/meetings/21.yaml, and cgpm/resolution/3.yaml is now cgpm/meetings/3.yaml.

Maybe there is a misunderstanding -- we have two classes here, "Meeting" and "Resolution".

The files I meant were:

  2. I didn't get why 21.yaml should be named using pattern yyyy-ii but 3.yaml using pattern yyyy?

For Resolutions, we name them "yyyy-ii.yaml". I think this is correct because it allows us to identify the year and the ID of the resolution.

For Meetings, we name them "yyyy.yaml" (yyyy is year of meeting) or just "xx.yaml" (xx is number of meeting)? I'm not sure which one is better.
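To make the two naming patterns concrete, here is a minimal sketch of a file-name check. It is only an illustration: the function name `classify` and the regular expressions are assumptions, including the guess that `ii` is exactly two digits and that a meeting file is a plain number (either a year or a meeting number).

```python
import re

# Assumed shapes, derived from the patterns described in this thread.
# Resolution: "yyyy-ii.yaml" (four-digit year, two-digit identifier).
RESOLUTION_NAME = re.compile(r"^(?P<year>\d{4})-(?P<id>\d{2})\.yaml$")
# Meeting: "yyyy.yaml" or "xx.yaml" (a bare number, year or meeting number).
MEETING_NAME = re.compile(r"^(?P<number>\d{1,4})\.yaml$")


def classify(filename):
    """Return 'resolution', 'meeting', or 'unknown' for a data file name."""
    if RESOLUTION_NAME.match(filename):
        return "resolution"
    if MEETING_NAME.match(filename):
        return "meeting"
    return "unknown"
```

A check like this could run over the data directory to flag files that match neither pattern.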

  3. References for these documents are CGPM Meetings 21 and CGPM Meetings 3. Should the references be CGPM Meetings 1999-21 and CGPM Meetings 1901?

We have two kinds of objects here: Meeting and Resolution.

Meetings:

Resolutions (syntax 1):

Resolutions (syntax 2):

andrew2net commented 2 years ago

Resolutions (syntax 2):

@ronaldtse do we really need syntax 2? We would have to either create duplicate files with names matching that syntax or create a mapping file.

ronaldtse commented 2 years ago

Why not use an index? We really should not tie the file name pattern to the software.

andrew2net commented 2 years ago

@ronaldtse using an index slows down fetching documents because it needs two HTTP requests for each document: one for the index and one for the document itself. Maybe we need to consider caching indexes. I suggest using a singleton to keep the index in memory. What do you think?
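A rough sketch of the in-memory singleton idea, assuming the index is a plain mapping from reference to file path. The names `get_index`, `resolve`, and the `fetch_index` callable are hypothetical and not part of relaton-bipm; `fetch_index` stands in for whatever performs the HTTP request for the index.

```python
_index = None  # module-level singleton: shared by all lookups in this process


def get_index(fetch_index):
    """Return the cached index, calling fetch_index() only on first use."""
    global _index
    if _index is None:
        _index = fetch_index()
    return _index


def resolve(reference, fetch_index):
    """Map a document reference to its file path, or None if not indexed."""
    return get_index(fetch_index).get(reference)
```

With this shape, only the first lookup in a process pays for the extra index request; later lookups need just the document request.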

ronaldtse commented 2 years ago

@andrew2net I think having the index cached on disk is a reasonable compromise.

andrew2net commented 2 years ago

@ronaldtse if Relaton is run in AWS Lambda, it's impossible to use the container's file system to save the index. Should we use S3 in that case?

ronaldtse commented 2 years ago

@andrew2net Ah I was thinking about local. Now the challenge about 2 requests using the index makes more sense. I thought Lambda would still support caching, but I guess it would go away at the next run.

For Relaton API, maybe we should have an S3 that mirrors all the Git repos offline...?

andrew2net commented 2 years ago

@ronaldtse we can detect whether the gem is running in Lambda and use S3 if so, otherwise the local file system.
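One possible way to do the detection, sketched under the assumption that checking the `AWS_LAMBDA_FUNCTION_NAME` environment variable (which the Lambda runtime sets) is good enough. The function name `index_backend` and the return values `"s3"` and `"local"` are made up for illustration.

```python
import os


def index_backend(environ=None):
    """Pick where to cache the index: 's3' inside Lambda, else 'local'."""
    env = os.environ if environ is None else environ
    # The Lambda runtime defines AWS_LAMBDA_FUNCTION_NAME; treating its
    # presence as "running in Lambda" is an assumption of this sketch.
    return "s3" if env.get("AWS_LAMBDA_FUNCTION_NAME") else "local"
```

Passing the environment as a parameter keeps the choice easy to test without actually running in Lambda.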

andrew2net commented 2 years ago

Resolutions (syntax 2):

@ronaldtse some meetings have parts, for example CIPM Meeting 101-1. Should syntax 2 look like CIPM Decision 101-1-01 or just CIPM Decision 101-01? The numbering runs continuously across the parts, so the second version will still produce unique references.

ronaldtse commented 2 years ago

Good find. I think the resolutions should be cited as "CIPM Meeting 101-01". The sub-meetings should not be used in resolutions/decision/etc numbering.

Can you help document these decisions in the README? Thanks.
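The numbering rule above (drop the sub-meeting part, keep the meeting number and the decision number) could be sketched as follows. The helper name `decision_reference` is hypothetical, the prefix is left to the caller, and the two-digit zero padding is an assumption based on the "101-01" example.

```python
def decision_reference(prefix, meeting, number):
    """Build a reference such as '<prefix> 101-01' from meeting '101-1'."""
    # Keep only the meeting number; any '-part' suffix is dropped.
    meeting_number = str(meeting).split("-")[0]
    # Zero-pad the decision number to two digits (assumed from '101-01').
    return f"{prefix} {meeting_number}-{int(number):02d}"
```

For example, `decision_reference("CIPM Decision", "101-1", 1)` and `decision_reference("CIPM Decision", "101", 1)` both yield the same reference, which relies on the numbering running continuously across the parts.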

andrew2net commented 2 years ago

@ronaldtse I think we need to add the documents from the bipm-si-brochure dataset to the index.

  1. Do we need to convert all the documents from the dataset's site/documents folder?
  2. Some of the documents have English and French files. Do we need to compose them into one Relaton item?
  3. What are the *.presentation.xml files in the site/documents folder?
  4. The site/documents/sib-a4-en.xml has the document identifier BIPM. Shouldn't it be something like BIPM SIB-A4?

ronaldtse commented 2 years ago

@andrew2net yes indeed. We need to add the index from those documents. But let's do that in a new issue?

  1. Do we need to convert all the documents from the dataset's site/documents folder?

Yes.

  2. Some of the documents have English and French files. Do we need to compose them into one Relaton item?

Yes.

  3. What are the *.presentation.xml files in the site/documents folder?

Those documents are the Metanorma Presentation XML files. But aren't those not supposed to be in Git main? They are only supposed to be in the gh-pages branch.

  4. The site/documents/sib-a4-en.xml has the document identifier BIPM. Shouldn't it be something like BIPM SIB-A4?

I will have to check and get back to you...