k-int / gokb-phase1

Original GOKb repo - Moving to https://github.com/openlibraryenvironment/gokb
http://www.gokb.org

TIPPs are not recognized on ingest #482

Closed jhsolomon closed 8 years ago

jhsolomon commented 8 years ago

I am still getting many review tasks for missing TIPPs in this package: Brill: Brill Online Journals: 20160310

Possibly related to BOM discussion?

ianibo commented 8 years ago

Heya - can you point me at a copy of the file please, and just confirm this is via refine or data loader?

jhsolomon commented 8 years ago

@ianibo Here's the file: https://drive.google.com/open?id=0B9YT8kJnUu_tRkd2WDIwdEtocEU

This was uploaded through Refine by one of our data loaders. Thanks!

ianibo commented 8 years ago

Something is going horribly wrong with this one -- there are two title records, http://test-gokb.kuali.org/gokb/resource/show/org.gokb.cred.TitleInstance%3A320648#identifiers and http://test-gokb.kuali.org/gokb/resource/show/org.gokb.cred.TitleInstance%3A291605#identifiers, for the same title (the first one in the file). Both are in the Brill journal collection. The only difference is that one has a Brill title_id and the other does not.

Although the displayed identifier values look the same, I suspect that the underlying stored values differ: one with a hyphen and one without. We added code to the display layer that inserts a hyphen when it is missing from an ISXN.

We now have normalised identifier matching, so something has clearly gone wrong in the refine stream. I'll discuss with @sosguthorpe and update shortly.
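To illustrate the suspected mismatch, here is a minimal sketch in Python. It is not the GOKb code: `normalise_isxn` and `display_issn` are hypothetical helpers, and the normalisation rule (keep only digits and `X`) is an assumption made for the example. It shows how two stored values can render identically on screen yet fail an exact comparison, and how comparing normalised forms avoids that.

```python
import re


def normalise_isxn(value: str) -> str:
    """Hypothetical normaliser: upper-case, then keep only digits and 'X'."""
    return re.sub(r"[^0-9X]", "", value.upper())


def display_issn(value: str) -> str:
    """Hypothetical display helper: show an 8-character ISSN with a hyphen."""
    norm = normalise_isxn(value)
    if len(norm) == 8:
        return norm[:4] + "-" + norm[4:]
    return value  # leave anything that is not ISSN-shaped untouched


# Two stored values for the same ISSN, one with and one without the hyphen.
stored = ["2352-3077", "23523077"]

# Both look the same once the display helper has run ...
rendered = [display_issn(v) for v in stored]

# ... but a raw comparison fails, while a normalised comparison succeeds.
raw_match = stored[0] == stored[1]
normalised_match = normalise_isxn(stored[0]) == normalise_isxn(stored[1])
```

If matching on ingest compares raw or inconsistently normalised values, the second record would be treated as a new title, which is consistent with the duplicate seen here.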

ianibo commented 8 years ago

More info :: this is taken from test -- there is an issue with the generated normalised identifiers. Going to need a hefty data clean, but I think clearing down the norm identifier column on live as part of the next update will clear this. Going to test by resetting the test db to a copy of live and updating -- if that's OK with the GOKb team.

mysql> select kbc_id, id_namespace_fk, id_value, kbc_normname from kbcomponent where id_value = '2352-3077';
+--------+-----------------+-----------+----------------+
| kbc_id | id_namespace_fk | id_value  | kbc_normname   |
+--------+-----------------+-----------+----------------+
| 291603 |               3 | 2352-3077 | issn:2352-3077 |
| 320645 |               3 | 2352-3077 | 23523077       |
+--------+-----------------+-----------+----------------+
2 rows in set (0.24 sec)

mysql> select count(*) from kbcomponent where kbc_normname like 'issn%';
+----------+
| count(*) |
+----------+
|     6479 |
+----------+
1 row in set (0.00 sec)
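A rough sketch of how such conflicts could be found in bulk, and of the "clear the column and regenerate" approach mentioned above. It uses Python's built-in `sqlite3` with an in-memory table rather than the real MySQL database, the table holds only the two rows shown in the query output, and the cleanup step is an assumption about what "clearing down" means, not the actual migration.

```python
import sqlite3

# In-memory stand-in for the kbcomponent table (only the columns queried above).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kbcomponent ("
    "kbc_id INTEGER PRIMARY KEY, id_namespace_fk INTEGER, "
    "id_value TEXT, kbc_normname TEXT)"
)
conn.executemany(
    "INSERT INTO kbcomponent VALUES (?, ?, ?, ?)",
    [
        (291603, 3, "2352-3077", "issn:2352-3077"),
        (320645, 3, "2352-3077", "23523077"),
    ],
)

# Find identifiers whose rows disagree on the normalised form.
conflicts = conn.execute(
    "SELECT id_namespace_fk, id_value, COUNT(DISTINCT kbc_normname) "
    "FROM kbcomponent WHERE id_value IS NOT NULL "
    "GROUP BY id_namespace_fk, id_value "
    "HAVING COUNT(DISTINCT kbc_normname) > 1"
).fetchall()

# Assumed cleanup: blank the normalised column for identifier rows so that a
# later update can regenerate every value with a single, consistent rule.
conn.execute("UPDATE kbcomponent SET kbc_normname = NULL WHERE id_value IS NOT NULL")
remaining = conn.execute(
    "SELECT COUNT(*) FROM kbcomponent WHERE kbc_normname IS NOT NULL"
).fetchone()[0]
```

Grouping by namespace and value avoids having to decide up front which of the two normalised formats is the correct one.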

jhsolomon commented 8 years ago

thanks for the update. is there anything i can do to help?

ianibbo commented 8 years ago

Actually, yeah.. Still waiting on the kuali team to allow me permission to update the security group, or for someone else to update the security group on my behalf for access to Live. If you could give them a gentle nudge that would be awesome :)

ty, e

jhsolomon commented 8 years ago

will do!

jhsolomon commented 8 years ago

The TIPP issue seems to be fixed, but for some reason this Refine project only ingested 93%. Brill_online_journals: 20160322

I've attached the package file.

brill_online_journals.txt

ianibo commented 8 years ago

TY for attaching file... working...

ianibo commented 8 years ago

Ok -- One issue from refine that I need assistance from Steve on

016-03-22 18:45:57,791 [ajp-bio-8009-exec-8] ERROR errors.GrailsExceptionResolver - NegativeArraySizeException occurred when processing request: [POST] /gokb/api/projectCheckin
Stacktrace follows:
java.lang.NegativeArraySizeException

Also

java.lang.NullPointerException: Cannot get property 'id' on null object
    at org.gokb.IngestService$_handleNonePresentTipps_closure6.doCall(IngestService.groovy:336)
    at org.grails.datastore.gorm.GormStaticApi.withTransaction(GormStaticApi.groovy:815)
    at org.grails.datastore.gorm.GormStaticApi.withTransaction(GormStaticApi.groovy:715)

Added some defensive code for this.
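The actual change is not shown in this thread. As a hedged illustration of the kind of guard that avoids a "Cannot get property 'id' on null object" failure, here is a small Python sketch. The `Tipp` and `Title` classes and the `title_ids_for_review` function are hypothetical stand-ins, not names from `IngestService.groovy`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Title:
    id: int


@dataclass
class Tipp:
    id: int
    title: Optional[Title]  # may be missing if the lookup found nothing


def title_ids_for_review(tipps: List[Optional[Tipp]]) -> List[int]:
    """Collect title ids, skipping entries that would otherwise be dereferenced as null."""
    ids: List[int] = []
    for tipp in tipps:
        # Guard: neither the TIPP nor its title may be None before reading .id
        if tipp is None or tipp.title is None:
            continue
        ids.append(tipp.title.id)
    return ids


# One good TIPP, one missing TIPP, and one TIPP with no title attached.
sample = [Tipp(1, Title(10)), None, Tipp(2, None)]
safe_ids = title_ids_for_review(sample)
```

Skipping silently is only one option; logging the skipped entries or raising a review task for them may be more appropriate in a real ingest.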

ianibo commented 8 years ago

Fix committed - will deploy update to test today

jhsolomon commented 8 years ago

Confirmed fix. The TIPP review tasks are now working properly.