Open tiborsimko opened 9 years ago
Here are some new use cases where we have a similar problem:
_Records are only partially indexed for new uploads._ A search in the specific index does not find the title, authors or imprint. A search in the global index finds the author, imprint and 035.
The indexes are structured in the following way (default settings): title: 245%; author: 100%, 700%; imprint: 264%; miscellaneous: 035%, 26%, 100a, 700a.
echo "SELECT job_date,affected_fields FROM hstRECORD WHERE id_bibrec=862175" | /opt/invenio/bin/dbexec
job_date affected_fields
2015-11-23 13:27:55 003__%,005__%,008__%,010__%,020__%,035__%,040__%,049__%,05000%,08200%,1001_%,24510%,250__%,264_1%,300__%,336__%,337__%,338__%,504__%,650_0%,7001_%
2015-11-23 13:29:05 005__%,948__%
2015-11-23 13:29:35 005__%,980__%
2015-11-23 13:30:10 005__%,998__%
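To make the expectation explicit, here is a minimal Python sketch that checks which of the indexes listed above overlap with a given `affected_fields` value. It is not Invenio's own matching code. It assumes that a field is six positions long (three-character tag, two indicators, one subfield code), that a single `%` fills the remaining positions, and that `_` is a literal blank indicator. The names `expand`, `overlaps`, `touched_indexes` and `INDEX_TAGS` are made up for this illustration, and the miscellaneous index is left out because `100a` and `700a` do not fit the six-position assumption.

```python
FIELD_LEN = 6      # assumed layout: 3-char tag + ind1 + ind2 + subfield code
WILDCARD = None    # marks a position that may hold any character

# Index definitions as quoted in this comment (miscellaneous omitted).
INDEX_TAGS = {
    "title": ["245%"],
    "author": ["100%", "700%"],
    "imprint": ["264%"],
}


def expand(pattern):
    """Expand a pattern with at most one '%' into FIELD_LEN positions."""
    if "%" not in pattern:
        if len(pattern) != FIELD_LEN:
            raise ValueError("cannot expand %r" % pattern)
        return tuple(pattern)
    prefix, _, suffix = pattern.partition("%")
    fill = FIELD_LEN - len(prefix) - len(suffix)
    if "%" in suffix or fill < 0:
        raise ValueError("cannot expand %r" % pattern)
    return tuple(prefix) + (WILDCARD,) * fill + tuple(suffix)


def overlaps(a, b):
    """True if some concrete field could match both patterns."""
    return all(
        x is WILDCARD or y is WILDCARD or x == y
        for x, y in zip(expand(a), expand(b))
    )


def touched_indexes(affected_fields, index_tags):
    """Names of the indexes whose tag patterns overlap any affected field."""
    return sorted(
        name
        for name, patterns in index_tags.items()
        if any(overlaps(p, f) for p in patterns for f in affected_fields)
    )


first_revision = (
    "003__%,005__%,008__%,010__%,020__%,035__%,040__%,049__%,05000%,"
    "08200%,1001_%,24510%,250__%,264_1%,300__%,336__%,337__%,338__%,"
    "504__%,650_0%,7001_%"
).split(",")

print(touched_indexes(first_revision, INDEX_TAGS))
# ['author', 'imprint', 'title']
print(touched_indexes("005__%,948__%".split(","), INDEX_TAGS))
# []
```

Under these assumptions the first revision touches the title, author and imprint indexes, so all three would be expected to be updated by incremental indexing.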
Relevant metadata:
<datafield tag="100" ind1="1" ind2=" ">
<subfield code="a">Cox, Michael M.</subfield>
</datafield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="a">(OCoLC)905380069</subfield>
</datafield>
<datafield tag="245" ind1="1" ind2="0">
<subfield code="a">Molecular biology :</subfield>
<subfield code="b">principles and practice /</subfield>
<subfield code="c">Michael M. Cox., University of Wisconsin-Madison, Jennifer A. Doudna, University of California, Berkeley, Michael O"Donnell, The Rockefeller University.
</subfield>
</datafield>
<datafield tag="264" ind1=" " ind2="1">
<subfield code="a">New York :</subfield>
<subfield code="b">W.H. Freeman & Company, a Macmillan Education Imprint,
</subfield>
<subfield code="c">[2015]</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="a">Doudna, Jennifer A.</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="a">O"Donnell, Michael</subfield>
<subfield code="c">(Biochemist)</subfield>
</datafield>
If we re-index everything, it becomes searchable.
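For reference, the tag notation used in `affected_fields` (for example `24510%`) can be related to the MARCXML above with a small Python sketch. It assumes that blank indicators are written as `_` and that each field is identified by tag, both indicators and subfield code. The helper `field_keys`, the `<record>` wrapper element and the shortened sample are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the metadata above, wrapped in an assumed
# <record> root element so that it parses as a single XML document.
SAMPLE = """<record>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Cox, Michael M.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Molecular biology :</subfield>
    <subfield code="b">principles and practice /</subfield>
  </datafield>
</record>"""


def field_keys(marcxml):
    """Return tag + ind1 + ind2 + subfield code for every subfield."""
    root = ET.fromstring(marcxml)
    keys = []
    for datafield in root.iter("datafield"):
        # Blank or missing indicators are written as '_'.
        ind1 = (datafield.get("ind1") or " ").replace(" ", "_")
        ind2 = (datafield.get("ind2") or " ").replace(" ", "_")
        for subfield in datafield.iter("subfield"):
            keys.append(datafield.get("tag") + ind1 + ind2 + subfield.get("code"))
    return keys


print(field_keys(SAMPLE))
# ['1001_a', '24510a', '24510b']
```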
_Records are not re-indexed after modifications._ A meeting name has been updated from the first form below to the second:
1112_ $$aInternational Telecommunication Conference$$d(1947 :$$cAtlantic City, United States of America)
1112_ $$aInternational Telecommunications Conference$$d(1947 :$$cAtlantic City, United States of America)
Only the word "Telecommunications" was modified.
The new name is not searchable in the specific meeting name index (111%), but it is found by the global search, which covers the miscellaneous index (11%).
echo "SELECT job_date,affected_fields FROM hstRECORD WHERE id_bibrec=12079" | /opt/invenio/bin/dbexec
job_date affected_fields
2015-08-18 18:44:42 000__%,005__%,008__%,1112_%,24510%,24602%,260__%,300__%,500__%,5050_%,518__%,650_4%,7102_%,7112_%,7670_%,8528_%,902__%,980__%
2015-10-08 18:19:33 000__%,005__%,008__%,1112_%,24510%,24602%,260__%,300__%,500__%,50500%,518__%,650_4%,7102_%,7112_%,7670_%,8528_%,902__%,980__%
2015-10-14 16:01:33 005__%,8528_%
2015-10-14 16:07:24 005__%,8528_%
2015-10-14 16:09:09 005__%,8528_%
2015-10-16 10:17:00 005__%,8528_%
2015-11-18 14:56:37 005__%,1112_%
There seems to be an incremental indexing problem when a record is cloned (hence uploaded via `bibupload -r`) and some of the tags to index are defined with a wildcard on indicator positions (e.g. `245%a`). (When a record is inserted, things seem to work, due to different treatment of `affected_fields`. But see also #2693.)

Here is how to reproduce the problem.

Let us recreate a fresh demo site and index any pending OAI repository jobs:
Let us simulate cloning of a record with default setup first:
and let us see if incremental indexing works:
Looks good: not all field indexes were updated, only the ones that the record itself actually contained, which corresponds to the list of `affected_fields` above.

However, let's see what happens when one MARC tag is defined via wildcard:
and let's simulate cloning another record and see whether incremental indexing works:
Oops! Only some indexes are updated; notably, `title` (id=8) and `exacttitle` (id=19) were not updated, now that they work on `245%a` rather than `245__%`.
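To illustrate how the two tag definitions differ, here is a small Python sketch that tests concrete fields against a tag pattern. It assumes SQL-LIKE-style semantics where `%` matches any run of characters, and treats `_` as a literal blank indicator. That is an assumption for the sake of the example, not a statement about how the indexer itself evaluates tags. The helper names are hypothetical.

```python
import re


def tag_pattern_to_regex(pattern):
    """Compile a tag pattern: '%' matches any run, everything else is literal."""
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + ".*".join(literal_parts) + "$")


def matches(pattern, field):
    """True if the concrete field (tag + indicators + subfield code) matches."""
    return tag_pattern_to_regex(pattern).match(field) is not None


for pattern in ("245__%", "245%a"):
    for field in ("245__a", "24510a", "24510b"):
        print(pattern, field, matches(pattern, field))
```

Under these assumptions `245%a` accepts any indicators but only subfield `a`, whereas `245__%` accepts any subfield but only blank indicators. A record whose `affected_fields` contains `245__%` or `24510%` therefore still overlaps with `245%a`, so one would expect `title` and `exacttitle` to be updated in both setups.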