rism-digital / muscat

🗂️ A Rails application for the inventory of handwritten and printed music scores
http://muscat-project.org

Source import tracking #350

Closed xhero closed 8 years ago

xhero commented 8 years ago

For the very long 240 $m: if it is needed as-is, we can widen std_title to a TEXT field

lpugin commented 8 years ago

I would pretty truncate std_title. It does not make sense to have it that long.

xhero commented 8 years ago

OK for me. Maybe we can pretty truncate the individual pieces that make it up? For example: Masses (4v); 2vl 3vc etc...; other stuff etc... instead of Masses (4v); impressively long list of instruments...
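
A minimal sketch of truncating each piece on its own rather than the whole string. This is an illustration only, not the code in Muscat: the helper name `pretty_truncate`, the `; ` separator, and the limit of 30 characters per piece are all assumptions.

```ruby
# Hypothetical helper: shorten every ";"-separated piece of a title separately,
# so that a single long piece cannot push the others out of the field.
# max_piece (30) is an arbitrary assumption, not a value taken from Muscat.
def pretty_truncate(title, max_piece: 30, ellipsis: "...")
  pieces = title.split(";").map do |piece|
    piece = piece.strip
    next piece if piece.length <= max_piece

    # Leave room for the ellipsis, then back up to the previous space
    # so that a word is never cut in the middle.
    cut = piece[0, max_piece - ellipsis.length]
    cut = cut[0, cut.rindex(" ")] if cut.include?(" ")
    cut + ellipsis
  end
  pieces.join("; ")
end

puts pretty_truncate("Masses (4v); 2vl 3vc and an impressively long list of instruments; other stuff")
```

Short pieces such as `Masses (4v)` pass through unchanged; only the overlong piece is shortened.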

lpugin commented 8 years ago

Yes, this would be better

HirschSt commented 8 years ago

Truncating is OK; these records should be corrected by hand later

xhero commented 8 years ago

It seems @HirschSt's comment was deleted when I clicked the checkbox, sorry!

xhero commented 8 years ago

@HirschSt Is the new data for June 01 already available? I would like to run an import

HirschSt commented 8 years ago

@xhero I am working on it, but the latest fixes from today will be available tomorrow at the earliest (dataset > 20160601)

xhero commented 8 years ago

Could you also drop the offending 700s without $0? They still create problems when loading/saving/reindexing

HirschSt commented 8 years ago

OK, I will drop 700 and 710 from sources if $0 is missing
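
A rough sketch of that rule, assuming a record is just an array of plain hashes. This is not Muscat's MARC API: the method name `drop_unlinked`, the hash layout, and the sample IDs are made up for illustration.

```ruby
# Assumption: each field is a Hash with a :tag String and a :subfields Hash.
# Muscat's own MARC classes are deliberately not used in this sketch.
def drop_unlinked(fields, tags: %w[700 710])
  fields.reject do |field|
    link = field[:subfields]["0"]
    # Drop only the linked tags, and only when $0 is absent or blank.
    tags.include?(field[:tag]) && (link.nil? || link.strip.empty?)
  end
end

# Sample data with invented IDs.
record = [
  { tag: "100", subfields: { "a" => "Composer", "0" => "30000001" } },
  { tag: "700", subfields: { "a" => "Name without link" } },
  { tag: "710", subfields: { "a" => "Institution", "0" => "40000002" } }
]

kept = drop_unlinked(record)
puts kept.map { |f| f[:tag] }.join(", ")
```

Fields with other tags are left alone even when they have no $0.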

xhero commented 8 years ago

@HirschSt could you also add the titles with "non-publishing"? Thanks!

HirschSt commented 8 years ago

@xhero "non-publishing" records are included in dataset >= 20160601

xhero commented 8 years ago

There are still records with the DE-588a IDs in 700, which still create problems. I added a hard-coded fix in to_internal, also for 852 $x

HirschSt commented 8 years ago

852$x will be fixed in data and in the export

xhero commented 8 years ago

Data import does not show any errors anymore - most of these seem to be resolved

lpugin commented 8 years ago

Well done. There are a few remaining checkboxes...

HirschSt commented 8 years ago

I am not sure about dropping 856 ... will it be restored in the export, and where are the images hosted? The remaining three sources still need more investigation and should probably be fixed in the application

xhero commented 8 years ago

Indexing is now almost perfect! Only two records resist:

They both have 710 $0 = 0

@xhero SH: will be fixed with the export >=20160608

xhero commented 8 years ago

Oops, two more:

Not bad, out of 1532188 total reindexed items!

xhero commented 8 years ago

Last check: Import signals these tags as missing:

Subfield 031 $2 missing in the marc configuration
Subfield 594 $a missing in the marc configuration
Subfield 240 $n missing in the marc configuration
Subfield 710 $g missing in the marc configuration
Subfield 772 $t missing in the marc configuration

I checked all of them and they are either dropped or are removed in to_internal, so for me it is OK. I just need this to be double-checked so we have no surprises.
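
For the double check, a small sketch that turns the messages above into tag/subfield pairs and lists the ones not yet reviewed. The method name `parse_missing` and the `reviewed` list are hypothetical; the message wording is copied from the import output above.

```ruby
# Parse "Subfield TAG $CODE missing in the marc configuration" lines
# into [tag, code] pairs. Raises on any line that does not match.
def parse_missing(log)
  pattern = /\ASubfield (\d{3}) \$(\w) missing in the marc configuration\z/
  log.each_line.map(&:strip).reject(&:empty?).map do |line|
    match = pattern.match(line)
    raise ArgumentError, "unexpected line: #{line}" if match.nil?
    [match[1], match[2]]
  end
end

log = <<~'LOG'
  Subfield 031 $2 missing in the marc configuration
  Subfield 594 $a missing in the marc configuration
  Subfield 240 $n missing in the marc configuration
  Subfield 710 $g missing in the marc configuration
  Subfield 772 $t missing in the marc configuration
LOG

# Hypothetical list of pairs a second person has already confirmed; edit by hand.
reviewed = [["031", "2"], ["594", "a"]]

(parse_missing(log) - reviewed).each do |tag, code|
  puts "still to check: #{tag} $#{code}"
end
```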

lpugin commented 8 years ago

They will all need to be added back in to_external (together with control fields), so keep the list in the appropriate ticket.

HirschSt commented 8 years ago

I was pretty sure I had added 710$g to sources-conf, so I will add it back (with 54f960a). Other missing tags ok with me too.

HirschSt commented 8 years ago

130 comes from moving 240 to 130 because of the collection record type, so there is no $0; 950000002 needs updating by hand

HirschSt commented 8 years ago

950000002 is corrected in dataset >= 20160609