sul-dlss / dor-services-app

A Rails application exposing Digital Object Registry functions as a RESTful HTTP API
https://sul-dlss.github.io/dor-services-app/

given a list of ckeys, write code to pull marc from Symphony and marc from Folio and compare them. #4348

Closed ndushay closed 1 year ago

ndushay commented 1 year ago

We want to be able to test that the MARC bib data we get from Folio is the same as what we get from Symphony, to learn whether our MARC -> cocina mappings (and cocina -> MARC) will work as is for Folio.

Initially, we will compare a list of ckeys provided by @arcadiafalcone; we might end up comparing random ckeys culled from SDR data, and it is very vaguely possible we'll be scaling this up further. But for now, having the facility to compare Folio and Symphony marc from a provided list of ckeys is sufficient.

The work for this ticket:

  1. code that can pull a marc record from Folio (it can be marc.json and Laura has figured out one way to do this). Possibly being able to pull marc records for a batch of HRIDs would be good, but that's just a performance consideration. Note that we think the instance HRID can be assumed to be the ckey with a prefix of "a" but that might need to be adjusted.
  2. the Folio provided marc should be loaded into a ruby-marc object
  3. using SymphonyReader or code copied from it, pull the corresponding record from Symphony and load it into a ruby-marc object
  4. compare the two ruby-marc objects. This could be comparing json output from ruby-marc, or marcxml, or marc21. Have a way to indicate "same" or "different" and if different, indicate the specific differences.
  5. be able to do the above for a list of ckeys.
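
A rough sketch of steps 4 and 5, not a definitive implementation. It assumes both records have already been loaded into ruby-marc objects and turned into hashes with `MARC::Record#to_hash` (the MARC-in-JSON shape: a `leader` string plus a `fields` array of single-key hashes). The names `hrid_for`, `fields_by_tag`, `marc_differences` and `compare_ckeys` are made up for illustration, and the two fetchers are passed in as callables because the actual Symphony and Folio calls are out of scope here.

```ruby
# Sketch only: compares two MARC-in-JSON hashes (the shape produced by
# ruby-marc's MARC::Record#to_hash). All method names are hypothetical.

# Assumption from the ticket: the Folio instance HRID is the ckey prefixed with "a".
def hrid_for(ckey)
  "a#{ckey}"
end

# Group field values by tag, so that moving a whole tag to a different
# position in the record is not reported as a content difference.
def fields_by_tag(marc_hash)
  grouped = Hash.new { |hash, tag| hash[tag] = [] }
  marc_hash.fetch('fields', []).each do |field|
    field.each { |tag, value| grouped[tag] << value }
  end
  grouped
end

# Returns an array of differences; an empty array means "same".
def marc_differences(symphony_hash, folio_hash)
  diffs = []
  if symphony_hash['leader'] != folio_hash['leader']
    diffs << { tag: 'leader', symphony: symphony_hash['leader'], folio: folio_hash['leader'] }
  end

  symphony_fields = fields_by_tag(symphony_hash)
  folio_fields = fields_by_tag(folio_hash)
  (symphony_fields.keys | folio_fields.keys).sort.each do |tag|
    from_symphony = symphony_fields.fetch(tag, [])
    from_folio = folio_fields.fetch(tag, [])
    diffs << { tag: tag, symphony: from_symphony, folio: from_folio } unless from_symphony == from_folio
  end
  diffs
end

# Step 5: run the comparison for a list of ckeys.
# symphony_fetcher takes a ckey, folio_fetcher takes an HRID; both return a MARC-in-JSON hash.
def compare_ckeys(ckeys, symphony_fetcher:, folio_fetcher:)
  ckeys.to_h do |ckey|
    [ckey, marc_differences(symphony_fetcher.call(ckey), folio_fetcher.call(hrid_for(ckey)))]
  end
end
```

Each ckey maps to an empty array when the records match, or to a list of `{ tag:, symphony:, folio: }` entries naming the specific fields that differ. Repeated fields within one tag are still compared in order.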

At time of ticket writing, folio-test is NOT upgraded to nolana and has Symphony data (although it may be slightly stale). folio-dev is already upgraded to nolana, but it's unclear whether the data load produced clean instance records from Symphony data. Thus, it may make sense to have settings.yml select which folio endpoint to use. Do NOT put the Folio account password into git.
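
One possible shape for the endpoint selection, as a sketch under assumptions: the YAML keys, the `example.edu` URLs, the `FOLIO_ENV` / `FOLIO_PASSWORD` environment variable names, and the `folio_config` helper are all placeholders, not what DSA actually uses. The point is only that the endpoint choice lives in settings while the password comes from the environment and never lands in git.

```ruby
require 'yaml'

# Stand-in for the relevant part of settings.yml. Keys and URLs are placeholders.
SETTINGS_YAML = <<~YAML
  folio:
    environment: test
    endpoints:
      test: https://folio-test.example.edu
      dev: https://folio-dev.example.edu
    username: app_user
YAML

FolioConfig = Struct.new(:url, :username, :password, keyword_init: true)

# Builds the Folio connection config.
# The endpoint defaults to the one named in the YAML and can be overridden
# with FOLIO_ENV; the password is read only from the environment.
def folio_config(yaml: SETTINGS_YAML, env: ENV)
  folio = YAML.safe_load(yaml).fetch('folio')
  environment = env.fetch('FOLIO_ENV', folio.fetch('environment'))
  FolioConfig.new(
    url: folio.fetch('endpoints').fetch(environment),
    username: folio.fetch('username'),
    password: env.fetch('FOLIO_PASSWORD') # raises KeyError if unset
  )
end
```

Failing loudly with a `KeyError` when the password is missing seems preferable to falling back to a default that could end up committed.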

At time of writing this ticket, folio gem isn't ready yet; this ticket should not wait for folio gem to be ready.

It may make sense for DSA to have a folio-migration folder, like it had a folder for cocina related migrations.

arcadiafalcone commented 1 year ago

Sample catkeys for testing: 34844 218299 351083 361714 364436 367268 373820 463103 471379 488345 494270 497429 593373 783855 1202286 1387594 1828934 2180515 2466816 3010070 3293068 3815475 4180487 4336012 4605212 4653912 4821648 5570415 5634120 5992486 6538442 6671606 6712419 6766105 8730782 8810055 9066991 9272824 9608397 9803970 10113876 10134735 10143918 10244929 10366569 10505240 10532136 10561302 10648654 10652597 10742934 11351729 11872120 12065003 12117489 12133998 12260619 12372101 12393790 12943972 13162356 13257654 13285529 13355277 13432978 13461627 13478155 13538744 13677103 13757919 13810428 14260109 14266128 14283538 14292572 14306240 14317201 14366611

jmartin-sul commented 1 year ago

above catkey list as a ruby string array, for easy copypasta 🍝:

%w[34844 218299 351083 361714 364436 367268 373820 463103 471379 488345 494270 497429 593373 783855 1202286 1387594 1828934 2180515 2466816 3010070 3293068 3815475 4180487 4336012 4605212 4653912 4821648 5570415 5634120 5992486 6538442 6671606 6712419 6766105 8730782 8810055 9066991 9272824 9608397 9803970 10113876 10134735 10143918 10244929 10366569 10505240 10532136 10561302 10648654 10652597 10742934 11351729 11872120 12065003 12117489 12133998 12260619 12372101 12393790 12943972 13162356 13257654 13285529 13355277 13432978 13461627 13478155 13538744 13677103 13757919 13810428 14260109 14266128 14283538 14292572 14306240 14317201 14366611]
jmartin-sul commented 1 year ago

see also #4344

jmartin-sul commented 1 year ago

recording some post-standup discussion for future reference: @peetucket discovered that the raw JSON structure that Symphony and Folio give back from a REST call for a given catkey/hrid's marc differs quite a bit between the two systems. however, the ruby-marc gem successfully parses both JSON structures into MARC::Record objects, and when those objects are turned back into hashes for comparison, the only differences detected are semantically meaningful ones, i.e. field order and field content. in other words, despite the differing structure of the JSON results, they seem to represent info that's mostly semantically equivalent, and the ruby-marc gem seems to successfully normalize the structures for us 🎉
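
To tell the two kinds of difference mentioned above apart (field order vs. field content), something like the following might work. It is a sketch that assumes both records have already been normalized through ruby-marc and converted with `MARC::Record#to_hash`; `classify_marc_difference` and its return symbols are invented for illustration.

```ruby
# Sketch only: classifies how two normalized MARC-in-JSON hashes differ.
# Returns :same, :field_order_only, or :different_content.
def classify_marc_difference(symphony_hash, folio_hash)
  return :different_content unless symphony_hash['leader'] == folio_hash['leader']

  symphony_fields = symphony_hash.fetch('fields', [])
  folio_fields = folio_hash.fetch('fields', [])
  return :same if symphony_fields == folio_fields

  # Equal tallies mean the same fields occur the same number of times,
  # just in a different sequence.
  return :field_order_only if symphony_fields.tally == folio_fields.tally

  :different_content
end
```

Records that come back as `:field_order_only` could then be triaged separately from ones with real content differences.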