inspirehep / invenio

Invenio digital library software, INSPIRE OPS version
http://invenio-software.org/
GNU General Public License v2.0

bibxxx tables out of sync on bibupload --delete #388

Closed: tsgit closed this issue 7 years ago

tsgit commented 7 years ago

When a subfield is deleted from one occurrence of a repeatable tag (which in effect deletes that whole occurrence), affected_tags is determined incorrectly if other occurrences of the same tag remain in the record:

https://github.com/inspirehep/invenio/blob/prod/modules/bibupload/lib/bibupload.py#L395-L417
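To make the failure mode concrete, here is a minimal Python 3 sketch. It is not the bibupload code linked above. It assumes the record layout visible in the session further down this thread (tag mapped to a list of `(subfields, ind1, ind2, controlfield_value)` tuples). The author names and both helper functions are made up for illustration. It shows one way such a miss can arise: a comparison that only looks at which tags are present cannot see that one of two 700 occurrences has gone, whereas comparing the per-tag field lists can.

```python
# Hypothetical records: the second 700 occurrence is deleted in new_record.
old_record = {
    '100': [([('a', 'Example, A.')], ' ', ' ', '')],
    '700': [
        ([('a', 'Example, B.')], ' ', ' ', ''),
        ([('a', 'Example, C.')], ' ', ' ', ''),
    ],
}
new_record = {
    '100': [([('a', 'Example, A.')], ' ', ' ', '')],
    '700': [
        ([('a', 'Example, B.')], ' ', ' ', ''),
    ],
}


def tags_by_presence(old, new):
    """Tags that exist in only one of the two records (hypothetical helper)."""
    return set(old) ^ set(new)


def tags_by_content(old, new):
    """Tags whose list of field occurrences differs (hypothetical helper)."""
    return {
        tag
        for tag in set(old) | set(new)
        if old.get(tag, []) != new.get(tag, [])
    }


print(tags_by_presence(old_record, new_record))  # set()
print(tags_by_content(old_record, new_record))   # {'700'}
```

Because 700 is still present after the delete, the presence-based check reports nothing, while the content-based check flags 700.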

tsgit commented 7 years ago

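I suggest using dictdiffer to figure out what changed. In the Python 2 session below, y is the old record and x is the new one; defaultdict comes from collections and diff from dictdiffer.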

In [33]: oldtags = defaultdict(list)  

In [34]: for k,v in y.iteritems():
    ...:     for f in v:
    ...:         oldtags[k].append(f[:4])

In [35]: newtags = defaultdict(list)  

In [36]: for k,v in x.iteritems():
    ...:     for f in v:
    ...:         newtags[k].append(f[:4])

In [37]: list(diff(oldtags,newtags))
Out[37]: 
[('add',
  '700',
  [(181, ([('q', 'de Vries-Uiterweerd, Garmt')], ' ', ' ', ''))]),
 ('change',
  ['005', 0],
  (([], ' ', ' ', '20170314135428.0'), ([], ' ', ' ', '20160422145751.0')))]
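As a rough sketch of how diff output like the above could be turned into a set of affected tags, the Python 3 snippet below walks the `(action, path, value)` entries and collects the leading tag of each path. `affected_tags` here is a hypothetical helper, not the function in bibupload. The handling of an empty path for tags added or removed at the top level is an assumption about dictdiffer's output shape. The input list is copied from `Out[37]`, so the snippet runs without dictdiffer installed.

```python
def affected_tags(diff_entries):
    """Collect the MARC tags touched by dictdiffer-style diff entries."""
    tags = set()
    for action, path, value in diff_entries:
        # The path is a dotted string ('700') or, when it contains
        # non-string keys such as list indexes, a list (['005', 0]).
        parts = path.split('.') if isinstance(path, str) else list(path)
        if parts and parts[0] != '':
            tags.add(parts[0])
        elif action in ('add', 'remove'):
            # Assumption: a tag added or removed at the top level is
            # reported with an empty path and (key, value) pairs.
            tags.update(key for key, _ in value)
    return tags


# Entries copied from Out[37] above.
changes = [
    ('add',
     '700',
     [(181, ([('q', 'de Vries-Uiterweerd, Garmt')], ' ', ' ', ''))]),
    ('change',
     ['005', 0],
     (([], ' ', ' ', '20170314135428.0'), ([], ' ', ' ', '20160422145751.0'))),
]

print(sorted(affected_tags(changes)))  # ['005', '700']
```

Working from the diff rather than from tag presence means a change to a single occurrence of a repeatable tag still marks that tag as affected.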

tsgit commented 7 years ago

https://github.com/inspirehep/invenio/pull/391

tsgit commented 7 years ago

fixed