prior-art-archive / priorartarchive.org

Prior Art Archive Site
https://priorartarchive.org
GNU General Public License v2.0

FTPed files not showing up in search #59

Open metasj opened 4 years ago

metasj commented 4 years ago

This is a follow-up to #25, in other contexts: sometimes files successfully sent via FTP (and still on the FTP server) don't appear in search.

Sujith notes he saw errors in the GCP indexing service recently, and is forwarding them. We need to see where ingestion failed, and to retest the process.
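
One way to pin down which uploads never reached the index is to diff a listing of the FTP server against a list of indexed paths. The sketch below is only illustrative: `findUnindexed` is a hypothetical helper, and it assumes both lists have already been exported as arrays of path strings (how to export them is not covered in this thread).

```javascript
// Hypothetical helper: report uploaded files that are missing from the index.
// Both arguments are assumed to be arrays of path strings gathered separately,
// e.g. an FTP directory listing and an export of indexed document paths.
function findUnindexed(uploadedPaths, indexedPaths) {
  // Strip leading slashes so "/a/b.pdf" and "a/b.pdf" compare as equal.
  const normalize = (p) => p.replace(/^\/+/, "");
  const indexed = new Set(indexedPaths.map(normalize));
  return uploadedPaths.filter((p) => !indexed.has(normalize(p)));
}

// Example with made-up paths:
const missing = findUnindexed(
  ["/org/a.pdf", "/org/b.pdf", "/org/c.pdf"],
  ["org/a.pdf", "org/c.pdf"]
);
console.log(missing);
```

Any paths it returns are the ones worth tracing through the ingestion logs by timestamp.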

metasj commented 4 years ago

These files showed up in assets.priorartarchive.org but were not indexed. Joel is currently looking at it.

slifty commented 4 years ago

It may be worth noting here which FTP server was used; that way we can be sure we're looking at the correct set of logs (I believe there is only one active any longer, but still)

joeltg commented 4 years ago

@slifty there are two Elastic Beanstalk applications, one for v1 (called TikaServer) and one for v2 (called FileParser). I think the specific failures SJ is talking about are in v1.

joeltg commented 4 years ago

The logs there show a lot of errors (including process out of memory), but none from 2019-10-25, when the 20 failures happened.

slifty commented 4 years ago

Not sure if this is relevant; there are three errors in the CloudWatch entries for priorart-v2-prod-handle-new-sftp-file from 10/25 which look like this:

2019-10-25T13:57:42.297Z    0f6ab8ad-12c8-416b-995c-ab35c912f3ae    
{
    "errorMessage": null,
    "errorType": "NotFound",
    "stackTrace": [
        "Request.extractError (/var/runtime/node_modules/aws-sdk/lib/services/s3.js:565:35)",
        "Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)",
        "Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)",
        "Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:683:14)",
        "Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)",
        "AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)",
        "/var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10",
        "Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)",
        "Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:685:12)",
        "Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:116:18)",
        "Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)",
        "Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:683:14)",
        "Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)",
        "AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)",
        "/var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10",
        "Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)",
        "Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:685:12)",
        "Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:116:18)",
        "callNextListener (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:96:12)",
        "IncomingMessage.onEnd (/var/runtime/node_modules/aws-sdk/lib/event_listeners.js:307:13)",
        "emitNone (events.js:111:20)",
        "IncomingMessage.emit (events.js:208:7)"
    ]
}

metasj commented 4 years ago

@slifty I just posted a new file; do you see any new entries?
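
On the NotFound trace from priorart-v2-prod-handle-new-sftp-file quoted above: the stack ends in the AWS SDK's S3 error extraction, so the handler asked S3 for an object it could not find. If that Lambda is triggered by S3 event notifications (an assumption; the thread does not say), one known cause is that object keys arrive URL-encoded in the event, with spaces sent as `+`, so a filename containing spaces or punctuation will 404 unless the key is decoded first. A minimal sketch, where `decodeS3Key` and `objectExists` are hypothetical names and `headObject` is injected so the snippet runs without the SDK:

```javascript
// Decode an object key as it appears in an S3 event notification record.
// Keys there are URL-encoded, with spaces represented as "+".
function decodeS3Key(rawKey) {
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

// Hypothetical wrapper: check for the object and report a missing key
// explicitly instead of letting the error escape with a null message.
// `headObject` is assumed to be a function taking { Bucket, Key } and
// returning a promise, for example with aws-sdk v2:
//   (params) => s3.headObject(params).promise()
async function objectExists(headObject, bucket, rawKey) {
  const key = decodeS3Key(rawKey);
  try {
    await headObject({ Bucket: bucket, Key: key });
    return { key, exists: true };
  } catch (err) {
    if (err.code === "NotFound" || err.statusCode === 404) {
      return { key, exists: false };
    }
    throw err; // anything else (permissions, throttling) is a different problem
  }
}
```

Logging the decoded key alongside the request ID would make it possible to match each of the three 10/25 errors to a specific upload.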

metasj commented 4 years ago

Here's an error posted from /var/log/containers/server-68c637a59428-stdouterr.log (@slifty - on which server?)

Joel wrote about this:

This is the file-parser getting rejected from connecting to the IPFS node we’re running... a persistent problem / I don’t think that’s why the uploads were failing - it’s something that sometimes prevents file-parser from starting up when you reboot it


priorart-file-parser@0.1.0 start /usr/src/tika-server
node index.js
environment: production true
Thu, 07 Nov 2019 20:54:14 GMT sequelize deprecated String based operators are now deprecated. Please use Symbol based operators for better security, read more at http://docs.sequelizejs.com/manual/tutorial/querying.html#operators at node_modules/sequelize/lib/sequelize.js:242:13
Listening on port 8080
Failed to connect to IPFS node SyntaxError: Unexpected token < in JSON at position 0
    at JSON.parse (<anonymous>)
    at streamToValue (/usr/src/tika-server/node_modules/ipfs-http-client/src/utils/stream-to-json-value.js:25:18)
    at concat (/usr/src/tika-server/node_modules/ipfs-http-client/src/utils/stream-to-value.js:12:22)
    at ConcatStream.<anonymous> (/usr/src/tika-server/node_modules/concat-stream/index.js:37:43)
    at emitNone (events.js:111:20)
    at ConcatStream.emit (events.js:208:7)
    at finishMaybe (/usr/src/tika-server/node_modules/readable-stream/lib/_stream_writable.js:620:14)
    at afterWrite (/usr/src/tika-server/node_modules/readable-stream/lib/_stream_writable.js:466:3)
    at _combinedTickCallback (internal/process/next_tick.js:145:20)
    at process._tickDomainCallback (internal/process/next_tick.js:219:9)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! priorart-file-parser@0.1.0 start: node index.js
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the priorart-file-parser@0.1.0 start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2019-11-07T20_54_14_659Z-debug.log
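
"Unexpected token < in JSON at position 0" means the response body began with `<`, which usually indicates an HTML page (for example an error or rejection page) arrived where the client expected JSON. Since Joel describes this as an intermittent startup problem rather than the cause of the failed uploads, one possible mitigation is to retry the connection a few times before letting the process exit. The sketch below is not the file-parser's actual code: `connectWithRetry` is a hypothetical helper, and `connect` stands in for whatever call the service makes to reach the IPFS node.

```javascript
// Hypothetical startup helper: retry an async connect function with a
// linearly increasing delay, and rethrow the last error if all attempts fail.
async function connectWithRetry(connect, { attempts = 5, delayMs = 1000 } = {}) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      console.error(`IPFS connect attempt ${i}/${attempts} failed: ${err.message}`);
      if (i < attempts) {
        // Wait a little longer after each failure before trying again.
        await new Promise((resolve) => setTimeout(resolve, delayMs * i));
      }
    }
  }
  throw lastError;
}
```

With this in place, a transient rejection at boot would be logged and retried instead of ending in npm's ELIFECYCLE exit; a persistent one would still fail, but with every attempt recorded.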

metasj commented 4 years ago

Joel updated the file parser to remove UL and elastic calls (parsed output should still hit the Kafka queue that goes to the v1 elastic instance).