Closed andrew2net closed 1 month ago
The DocumentIdentifier#id
returned String
before and it returns Hash
now. The GHA fails for this reason. We shouldn't change existed methods` behavior. @mico we need to rollback the changes made here as soon as possible.
That you need to do here is:
index-v2.yaml
id
is used as a key in index
https://github.com/relaton/relaton-ccsds/blob/0015be2ba88e6fdb170873962b618048ecada84d/lib/relaton_ccsds/data_fetcher.rb#L120
For the new index you need to transform the id
String -> Hash using pubid-ccsd, then use it as a key.index.search text
just compare text
with string id
in the index. We need to replace it with something like:
pubid_ref = PubidCcsds::Identifier.parse(text)
rows = index.search do |row|
pubid = PubidCcsds::Identifier.new(row[:id])
pubid == pubid_ref
end
The goal is to get best result with incomplete references. For example there are two editions CCSDS 103.0-B-1
and CCSDS 103.0-B-2
. If we use reference without edition CCSDS 103.0-B
we should get two rows form the index and then pick up the last one.
@andrew2net @ronaldtse
pubid_ref = PubidCcsds::Identifier.parse(text) rows = index.search do |row| pubid = PubidCcsds::Identifier.new(row[:id]) pubid == pubid_ref end
I believe search should be implemented in Relaton::Index::Type
One case might be an adding functionality to use custom search function, for example implementing Relaton::Index::Type#set_custom_search_function(custom_search_function)
method, so it will make it possible to assign a custom search method to index assigned to provider:
index = Relaton::Index.find_or_create :ccsds, url: "#{GHURL}index-v1.zip", file: INDEX_FILE
index.set_custom_search_function(
-> (row) { PubidCcsds::Identifier.new(row[:id]) == PubidCcsds::Identifier.parse(text) }
)
Another case is implementation Relaton::Index::Type
type for each identifiers provider (CCSDS in our case) where we implement separate search function.
Another idea is to implement pubid
usage in Relaton::Index::Type by default, because we are planning use pubid everywhere in the future.
@mico what are the benefits of using set_custom_search_function
over search
with block?
Using a block with search achieves exactly the same effect, right?
index = Relaton::Index.find_or_create :ccsds, url: "#{GHURL}index-v1.zip", file: INDEX_FILE
index.search do |row|
PubidCcsds::Identifier.new(row[:id]) == PubidCcsds::Identifier.parse(text)
end
(This really is just the Enumerable#detect/select
methods?)
@ronaldtse We need to use block because we need to create Pubid object inside the block. The relaton-index
doesn't know what type of Pubid we need to create, and I don't think we should implement this logic in it.
Also in other Relaton flafor gem we need to exclude some ID elements from comparation like edition, part etc. For example when we need to get all parts doc we should ignore parts in the block.
Using a block with search achieves exactly the same effect, right?
@ronaldtse Yes, but do you think it's ok to use block every time when I just need to search by identifier's string (most common case):
index.search { |row| PubidCcsds::Identifier.new(row[:id]) == PubidCcsds::Identifier.parse(text) }
Instead of just:
index.search(text) }
I'm fine with that, just trying to cover obvious scenarios without code duplication when possible.
@ronaldtse Yes, but do you think it's ok to use block every time when I just need to search by identifier's string (most common case):
@mico Usually, we use search once within a gem, in cases where it is used more than once we need to exclude some parts of ID form comparison. So we need to use a block anyway.
Actually I believe that having a simpler method for people to use the index is a good thing. Let’s allow a text search and if people want to do complex comparisons they can use a block?
BTW, a search by default means the results returned is in an array.
Actually I believe that having a simpler method for people to use the index is a good thing. Let’s allow a text search and if people want to do complex comparisons they can use a block?
I agree, but it's not related to this issue. We need to use block in relaton-ccsds because we need to disable edition if a given reference doesn't have an edition, so we need to find all editions of the document.
@mico why can't we just use search with block here?
The class parses all the CCSDS docs https://github.com/relaton/relaton-ccsds/blob/main/lib/relaton_ccsds/data_parser.rb and save the in the https://github.com/relaton/relaton-ccsds/blob/main/lib/relaton_ccsds/data_fetcher.rb all IDs are in the index file https://github.com/relaton/relaton-data-ccsds/blob/main/index-v1.yaml
We need to parse all the IDs in the document parser and generate a new index with IDs as Hash instead of String. For example, the current index contains:
The new index should be like:
These ID parts are an example. I don't know what output the
Pubid::Ccsds::Identifier.parse(id).to_h
produces. So whatever Hash object it produces, we need to use it as an id value.When the new index is created we need to update searching documents to use the new index.