tarekmed / gbif-ecat

Automatically exported from code.google.com/p/gbif-ecat

Citation strings too long to be indexed #68

GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Citation strings can be too long for a btree index, which causes a PostgreSQL
exception during insert (see the stack trace below).
Consider replacing the index on the citation column with an index on the MD5
hash of the string instead.
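A minimal sketch of that suggestion, not the project's actual fix. The table and column names come from the failing insert; the index name `citation_citation_md5_idx`, the `id` column, and the class `CitationHashSketch` are assumptions made for illustration. The SQL strings show an expression index on `md5(citation)` and a lookup that can use it, and `md5Hex` shows why the indexed value stays small: the digest is always 32 hex characters, however long the citation is. The Java digest only matches PostgreSQL's `md5()` if the database encoding is UTF-8.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class CitationHashSketch {

    // Hypothetical index name; "citation" table and column are from the failing insert.
    static final String CREATE_INDEX_SQL =
        "CREATE INDEX citation_citation_md5_idx ON citation (md5(citation))";

    // The "id" column is an assumption. The extra equality check on the full
    // string guards against two different citations sharing one hash.
    static final String LOOKUP_SQL =
        "SELECT id FROM citation WHERE md5(citation) = md5(?) AND citation = ?";

    // Returns the MD5 digest of the UTF-8 bytes of text as 32 lowercase hex characters.
    static String md5Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            // Zero-pad so that leading zero bytes are not lost.
            return String.format("%032x", new BigInteger(1, hash));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Build a citation longer than the 2712-byte limit from the error message.
        StringBuilder longCitation = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            longCitation.append('x');
        }
        System.out.println(md5Hex(longCitation.toString()).length()); // prints 32
        System.out.println(CREATE_INDEX_SQL);
        System.out.println(LOOKUP_SQL);
    }
}
```

With an index like this, the lookup has to compare against `md5(?)` rather than the raw string, otherwise the planner cannot use the expression index.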

17:40:26.120 Failed to import taxon 4466976. Name: Animalia Uetz P.. Error:insert into citation (citation) values (?)
org.gbif.ecat.jdbc.DataAccessException: insert into citation (citation) values (?)
    at org.gbif.ecat.jdbc.JdbcTemplate.executeInsert(JdbcTemplate.java:276) ~[clb.jar:na]
    at org.gbif.checklistbank.service.impl.PgSqlBaseService.executeInsert(PgSqlBaseService.java:232) ~[clb.jar:na]
    at org.gbif.checklistbank.service.impl.CitationServicePgSql.citationToId(CitationServicePgSql.java:52) ~[clb.jar:na]
    at org.gbif.checklistbank.imports.ChecklistImportPgSql.fillUsage(ChecklistImportPgSql.java:450) [clb.jar:na]
    at org.gbif.checklistbank.imports.ChecklistImportPgSql.insertStarRecords(ChecklistImportPgSql.java:1146) [clb.jar:na]
    at org.gbif.checklistbank.imports.ChecklistImportPgSql.importData(ChecklistImportPgSql.java:550) [clb.jar:na]
    at org.gbif.checklistbank.service.impl.ChecklistImportServicePgSql.importChecklist(ChecklistImportServicePgSql.java:134) [clb.jar:na]
    at org.gbif.checklistbank.cli.CommandLineInterpreter.doImportResources(CommandLineInterpreter.java:401) [clb.jar:na]
    at org.gbif.checklistbank.cli.CommandLineInterpreter.main(CommandLineInterpreter.java:161) [clb.jar:na]
Caused by: org.postgresql.util.PSQLException: ERROR: index row size 3312 exceeds maximum 2712 for index "citation_citation_idx"
  Hint: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2077) ~[clb.jar:na]
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1810) ~[clb.jar:na]
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) ~[clb.jar:na]
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:498) ~[clb.jar:na]
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:386) ~[clb.jar:na]
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:332) ~[clb.jar:na]
    at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeUpdate(NewProxyPreparedStatement.java:105) ~[clb.jar:na]
    at org.gbif.ecat.jdbc.JdbcTemplate.executeInsert(JdbcTemplate.java:268) ~[clb.jar:na]
    ... 8 common frames omitted

Original issue reported on code.google.com by wixner@gmail.com on 30 Sep 2011 at 7:58

GoogleCodeExporter commented 8 years ago
Now using the faster hashtext() function instead of md5() for the index.
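One consequence of a 32-bit hash is that distinct citations can share a hash value, so a lookup by hash still has to compare the full string. The sketch below illustrates that pattern in memory. It is an assumption-laden illustration, not the project's code: `HashedCitationIndexSketch` and `idFor` are hypothetical names, and Java's `String.hashCode()` stands in for a 32-bit hash and is not the algorithm PostgreSQL's hashtext() uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HashedCitationIndexSketch {

    // Maps a 32-bit hash to the ids of every stored citation with that hash.
    private final Map<Integer, List<Integer>> idsByHash = new HashMap<>();

    // Stored citations; a citation's id is its position in this list.
    private final List<String> citations = new ArrayList<>();

    // Stand-in 32-bit hash (assumption): NOT the same algorithm as hashtext().
    static int hash32(String citation) {
        return citation.hashCode();
    }

    // Returns the id of an existing citation, or stores it and returns a new id.
    int idFor(String citation) {
        List<Integer> candidates =
            idsByHash.computeIfAbsent(hash32(citation), k -> new ArrayList<>());
        for (int id : candidates) {
            // Recheck the full string: equal hashes do not imply equal citations.
            if (citations.get(id).equals(citation)) {
                return id;
            }
        }
        int id = citations.size();
        citations.add(citation);
        candidates.add(id);
        return id;
    }

    public static void main(String[] args) {
        HashedCitationIndexSketch index = new HashedCitationIndexSketch();
        // "Aa" and "BB" collide under String.hashCode() (both hash to 2112).
        System.out.println(index.idFor("Aa")); // prints 0
        System.out.println(index.idFor("BB")); // prints 1
        System.out.println(index.idFor("Aa")); // prints 0
    }
}
```

In SQL terms the same recheck would be an additional equality condition on the citation column next to the hash comparison.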

Original comment by wixner@gmail.com on 24 Oct 2011 at 12:45

GoogleCodeExporter commented 8 years ago

Original comment by wixner@gmail.com on 25 Oct 2011 at 11:34