tulibraries / cob_datapipeline

Airflow Data Processing Pipeline for TUL Catalog on Blacklight Data

Stop hitting solrcloud url using http. #105

Closed dkinzer closed 4 years ago

dkinzer commented 5 years ago

Looks like we hit the solrcloud url with http instead of https, which could be exposing our basic auth ... this happens because in connection.host we don't specify http/https and instead add that programmatically.

cmharlow commented 5 years ago

i've been putting in the solrcloud 'host' locally with the 'https://' prepended. I presumed that avoids the issue you noted, but it would be good to confirm (and get that into the airflow playbook connection info if so; see https://github.com/tulibraries/ansible-playbook-airflow/pull/121/files#diff-ee7b44b9b73d4c691251cbf5c8dff5c5R519 )

dkinzer commented 5 years ago

@cmharlow yup, that works.. it's just that in other places we generate it automatically and add http ... i think it makes sense to add it directly to the conn.host

cmharlow commented 5 years ago

cool, thanks for the confirmation. also double checked that it's what airflow expects - https://github.com/apache/airflow/blob/master/airflow/hooks/http_hook.py#L62 seems so. good point to get it fixed everywhere though.
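For reference, a minimal sketch of how the base URL appears to be resolved around the linked line. This is an approximation written for this thread, not code copied from Airflow: `Conn` and `resolve_base_url` are hypothetical names, and the hostname is a placeholder. The point it illustrates is that a host without a scheme falls back to plain http unless a schema is set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conn:
    """Hypothetical stand-in for the host/schema/port fields of an Airflow connection."""
    host: Optional[str] = None
    schema: Optional[str] = None
    port: Optional[int] = None


def resolve_base_url(conn: Conn) -> str:
    """Approximate the base URL resolution: keep an explicit scheme, else default to http."""
    if conn.host and "://" in conn.host:
        # The host already carries its scheme, so it is used as-is.
        url = conn.host
    else:
        # No scheme in the host: use the schema field, falling back to plain http.
        schema = conn.schema or "http"
        url = f"{schema}://{conn.host or ''}"
    if conn.port:
        url = f"{url}:{conn.port}"
    return url


# Placeholder hostname, not the real SolrCloud host.
print(resolve_base_url(Conn(host="solrcloud.example.org")))
print(resolve_base_url(Conn(host="https://solrcloud.example.org", port=443)))
```

Under these assumptions, the first call yields an http URL and the second keeps https, which matches the behaviour described above when 'https://' is prepended to the host.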

cmharlow commented 5 years ago

The airflow playbook sets the SolrCloud HTTP connection to use https (https in uri field & port 443 also filled in). Anywhere else we need to clean this up?

dkinzer commented 5 years ago

It's been fixed for the az and web content SolrCloud-upgraded dags (by using the tasks.get_solr_url method) ... so I guess we just have to remember to fix it for the catalog indexing dags too:

https://github.com/tulibraries/cob_datapipeline/blob/38b546c56e804fbbc629560e040801e1e240f99c/task_ingestsftpmarc.py#L12

and

https://github.com/tulibraries/cob_datapipeline/blob/41c5c8c3ca9db5f0fc0bee8825b6540f7e12c850/task_ingestmarc.py#L12
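One way the remaining dags could guard against this, sketched under assumptions: `secure_solr_url` is a hypothetical helper (it is not the repo's `tasks.get_solr_url`, whose signature isn't shown here), the `/solr/<collection>` path layout is assumed, and the hostname and collection name are placeholders. It refuses a plain-http host rather than silently sending basic auth in the clear.

```python
from typing import Optional


def secure_solr_url(host: str, collection: str, port: Optional[int] = None) -> str:
    """Hypothetical helper: build an https Solr URL, rejecting any other scheme."""
    if "://" in host:
        scheme, rest = host.split("://", 1)
        if scheme.lower() != "https":
            # Fail loudly instead of sending credentials over an insecure scheme.
            raise ValueError(f"refusing non-https Solr host: {host}")
    else:
        # No scheme given: default to https rather than http.
        rest = host
    rest = rest.rstrip("/")
    netloc = f"{rest}:{port}" if port else rest
    # Assumed path layout; adjust if the deployment differs.
    return f"https://{netloc}/solr/{collection}"


# Placeholder values for illustration only.
print(secure_solr_url("solrcloud.example.org", "catalog", port=443))
```

Raising on an explicit http host is a design choice for this sketch; upgrading it to https silently would also work but could hide a misconfigured connection.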

cmharlow commented 4 years ago

I think this is now covered.