kermitt2 / biblio-glutton

A high performance bibliographic information service: https://biblio-glutton.readthedocs.io

Crossref Index: [parse_exception] request body is required #40

Closed cverluise closed 5 years ago

cverluise commented 5 years ago

Hello,

Thanks for the great work!

I am trying to build the ES index on an AWS EC2 instance.

After ingesting ~29 million records, the program raised a `[parse_exception] request body is required` error.

```sh
Loaded 29485000 records in 8941.782 s (12.003649109329235 record/s)
Loaded 29486000 records in 8941.819 s (12.090436464756378 record/s)
Loaded 29487000 records in 8942.022 s (12.126944858781727 record/s)
Loaded 29488000 records in 8942.14 s (12.175967076185024 record/s)
Bulk is rejected... let's medidate 10 seconds about the illusion of time and consciousness
Waiting for 10 seconds
Bulk is rejected... let's medidate 10 seconds about the illusion of time and consciousness
Waiting for 10 seconds
bulk is finally ingested...
Loaded 29489000 records in 8987.688 s (21.52018593440647 record/s)
bulk is finally ingested...
Loaded 29490000 records in 8988.32 s (21.2630236019562 record/s)
Bulk is rejected... let's medidate 10 seconds about the illusion of time and consciousness
Waiting for 10 seconds
Bulk is rejected... let's medidate 10 seconds about the illusion of time and consciousness
Waiting for 10 seconds
[parse_exception] request body is required
/home/ubuntu/biblio-glutton/matching/main.js:357
    throw err;
    ^

Error: [parse_exception] request body is required
    at respond (/home/ubuntu/biblio-glutton/matching/node_modules/elasticsearch/src/lib/transport.js:308:15)
    at checkRespForFailure (/home/ubuntu/biblio-glutton/matching/node_modules/elasticsearch/src/lib/transport.js:267:7)
    at HttpConnector.<anonymous> (/home/ubuntu/biblio-glutton/matching/node_modules/elasticsearch/src/lib/connectors/http.js:166:7)
    at Unzip.wrapper (/home/ubuntu/biblio-glutton/matching/node_modules/lodash/lodash.js:4929:19)
    at emitNone (events.js:111:20)
    at Unzip.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1064:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
```

After that, I see that the index has been partially built.

Do you have any idea how I can fix this? If so, is it possible to avoid restarting from scratch (e.g., by indexing only the remaining records)?
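For illustration only, here is a minimal sketch of two defensive measures that relate to the questions above. Elasticsearch returns `request body is required` when a request that needs a body arrives without one, so one possible (unconfirmed) cause is that an empty bulk payload was sent after a rejection. The sketch skips empty payloads, retries rejected ones after a pause, and shows a counter-based filter for skipping records that were already loaded. All names here (`isEmptyBulk`, `sendBulkWithRetry`, `makeResumeFilter`, and the injected `sendBulk` function) are hypothetical and do not come from biblio-glutton's `main.js`.

```javascript
// Hypothetical helpers: none of these names exist in biblio-glutton.

// True when a bulk payload would result in an empty HTTP request body.
function isEmptyBulk(body) {
  return !Array.isArray(body) || body.length === 0;
}

// Sends one bulk payload through `sendBulk`, an injected function that
// returns a Promise (assumption: it wraps whatever client call is in use).
// Empty payloads are skipped; rejected ones are retried after `waitMs`.
async function sendBulkWithRetry(sendBulk, body, { maxRetries = 5, waitMs = 10000 } = {}) {
  if (isEmptyBulk(body)) {
    return { skipped: true, attempts: 0 };
  }
  let lastError;
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    try {
      await sendBulk(body);
      return { skipped: false, attempts: attempt };
    } catch (err) {
      lastError = err;
      if (attempt <= maxRetries) {
        // Pause before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, waitMs));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

// Returns a predicate that rejects the first `alreadyLoaded` records it is
// called on and accepts every record after that.
function makeResumeFilter(alreadyLoaded) {
  let seen = 0;
  return () => ++seen > alreadyLoaded;
}
```

With the log above, a resume filter could be created from the last reported count (for example `makeResumeFilter(29490000)`), applied to each record read from the dump before it is added to a bulk. The dump still has to be decompressed and read from the start, and records that were in flight when the crash happened may be missing, so starting slightly before the last reported count is the safer assumption.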

Reproduce issue

$ cd matching/
$ npm install # host='localhost:9200' in my_connection.json
$ node main -dump ~/data/2017-03-21crossref-works.json.xz index

System

- AWS EC2 t2.medium
- Elastic search latest
- java:
  - openjdk version "1.8.0_222"
  - OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
  - OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)
- elastic search:
  - "number" : "7.3.1",
  - "build_flavor" : "default",
  - "build_type" : "deb",
  - "build_hash" : "4749ba6",
  - "lucene_version" : "8.1.0",
  - "minimum_wire_compatibility_version" : "6.8.0",
  - "minimum_index_compatibility_version" : "6.0.0-beta1"

Thanks!

cverluise commented 5 years ago

Edit:

I think this issue can be closed, unless anyone has something to add, since it is only indirectly related to biblio-glutton.