tulibraries / cob_datapipeline

Airflow Data Processing Pipeline for TUL Catalog on Blacklight Data

how to have airflow task running traject only log (not die) upon traject - solr HTTP response error #5

Closed cmharlow closed 1 year ago

cmharlow commented 5 years ago

We have document versioning turned on in Solr, which can sometimes cause Solr to respond with a 409 during a full reindex. In the current process this doesn't stop traject; it just writes errors to the logfile. Airflow, however, sees the error code and considers the task failed. Can the Airflow script become a wrapper around traject that swallows the error? Also, when this error occurs, traject retries the records individually, which slows the process.

cmharlow commented 5 years ago

@dkinzer have you encountered similar issues, and how do you have traject log the error but not kill the process?

relaxing commented 5 years ago

No, we'd normally want to die on Solr errors. This is only a concern for Error 409 - User version not high enough.

dkinzer commented 5 years ago

@cmharlow @relaxing I looked into this briefly, and we can easily ignore all errors by adding a double pipe (||) immediately after the ingest command:

traject -c lib/traject/indexer_config.rb spec/fixtures/purchase_online_bibs.xml || echo $?

Sadly, traject only ever exits with status 1, so we would need to fork traject so that it either exits with a different status for different types of failure, or ignores specific Solr errors and does not exit 1 on those.
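Short of forking traject, one possible middle ground is a Python wrapper that Airflow calls instead of traject directly. The sketch below is only an illustration under stated assumptions: it assumes Solr failures show up in traject's output as lines containing something like "HTTP 409", which is a guess at the log format and not something confirmed in this thread. The names should_fail, run_traject, HTTP_ERROR_PATTERN and TOLERATED_STATUSES are hypothetical. The wrapper tolerates a non-zero exit only when every HTTP error status found in the output is a 409, and still fails on anything else.

```python
import re
import subprocess

# ASSUMPTION: Solr failures appear in traject's output with the HTTP status
# written as "HTTP 409", "HTTP 500", etc. This pattern is hypothetical and
# must be adjusted to match the real log lines.
HTTP_ERROR_PATTERN = re.compile(r"\bHTTP ([45]\d{2})\b")

# Statuses we are willing to log and ignore (409 = version conflict).
TOLERATED_STATUSES = {"409"}


def should_fail(exit_code, output):
    """Decide whether the Airflow task should fail for this traject run."""
    if exit_code == 0:
        return False
    statuses = set(HTTP_ERROR_PATTERN.findall(output))
    # Fail when the cause is unknown (no status found) or when any
    # status other than the tolerated ones was logged.
    return not statuses or not statuses <= TOLERATED_STATUSES


def run_traject(cmd):
    """Run traject, echo its output, and raise only on non-tolerated errors."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    print(proc.stdout, end="")  # keep the errors visible in the Airflow log
    if should_fail(proc.returncode, proc.stdout):
        raise RuntimeError("traject failed with exit code %d" % proc.returncode)
    return proc.returncode
```

It could be called as run_traject(["traject", "-c", "lib/traject/indexer_config.rb", "spec/fixtures/purchase_online_bibs.xml"]), for example from an Airflow PythonOperator. Matching on log text is fragile, so this is a stopgap rather than a substitute for distinct exit codes in traject itself.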

I remember @relaxing and jrochkind were having a conversation about that on the #traject channel in the code4lib Slack. Not sure where that led.