sul-dlss / dlme-airflow

This repository captures the work related to the DLME ETL pipeline and establishes the Airflow setup.
Apache License 2.0

sync_metadata task failing on some providers #97

Closed: jacobthill closed this 2 years ago

jacobthill commented 2 years ago

I'm not sure why this fails, but when I add a fake data set in the appropriate directory it works. So it could be that we need to add an on-failure task that does an `aws s3 cp`, or it could be related to the missing data file.

Affected collections:

There are similar path-related issues that affect other collections. These may or may not be relevant, but they are noted in the comments below.

jacobthill commented 2 years ago

For ans it is failing with this error: `The user-provided path /opt/***/working//ans/ans does not exist.` It seems like we are adding an extra slash.

jacobthill commented 2 years ago

Same error on penn, which is also a csv source: `The user-provided path /opt/***/working//penn/penn_egyptian does not exist.`
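The interior double slash in these paths suggests the working directory is being built by string concatenation with a segment that turns out empty at runtime. A minimal sketch of that failure mode, using hypothetical variable names and an `/opt/airflow` placeholder for the masked path (the actual DAG code may differ):

```python
import os

# Hypothetical values: one path segment ends up empty at runtime,
# e.g. an unset config value or Airflow variable.
working_dir = "/opt/airflow/working"
subdir = ""
provider = "ans"

# Naive f-string concatenation keeps the empty segment's separator,
# producing the double slash seen in the error messages.
naive = f"{working_dir}/{subdir}/{provider}/{provider}"
print(naive)  # /opt/airflow/working//ans/ans

# os.path.join skips empty components, so no double slash appears.
joined = os.path.join(working_dir, subdir, provider, provider)
print(joined)  # /opt/airflow/working/ans/ans
```

If the DAG builds these paths with f-strings or `+`, switching to `os.path.join` (or `pathlib`) would make an empty segment harmless.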

jacobthill commented 2 years ago

yale babylon is also a csv source but is not failing at this step.

jacobthill commented 2 years ago

aims is failing at the assume-role step with this error:

```
[2022-03-11 19:38:17,959] {subprocess.py:63} INFO - Running command: ['bash', '-c', '
  temp_role=$(aws sts assume-role --role-session-name "DevelopersRole" --role-arn arn:aws:iam::418214828013:role/DevelopersRole) &&
  export AWS_ACCESS_KEY_ID=$(echo $temp_role | jq .Credentials.AccessKeyId | xargs) &&
  export AWS_SECRET_ACCESS_KEY=$(echo $temp_role | jq .Credentials.SecretAccessKey | xargs) &&
  export AWS_SESSION_TOKEN=$(echo $temp_role | jq .Credentials.SessionToken | xargs) &&
  aws s3 cp s3://dlme-metadata-dev/metadata/aims /opt/***/metadata//aims --recursive
']
```

It looks like a similar problem here: in `aws s3 cp s3://dlme-metadata-dev/metadata/aims /opt/***/metadata//aims --recursive` the destination path has a double slash, and there is stray whitespace after `aims`.
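Whatever the source of the empty segment, normalizing the destination path before it is interpolated into the shell command would make the task tolerant of it. A sketch using only the Python stdlib, with an `/opt/airflow` placeholder for the masked path (hypothetical, not the DAG's actual code):

```python
import posixpath

# Destination as it appears in the failing command, with the double slash.
dest = "/opt/airflow/metadata//aims"

# normpath collapses redundant interior separators.
clean = posixpath.normpath(dest)
print(clean)  # /opt/airflow/metadata/aims

# .strip() also drops any stray surrounding whitespace before the path
# is embedded in the aws s3 cp command string.
cmd = f"aws s3 cp s3://dlme-metadata-dev/metadata/aims {clean.strip()} --recursive"
```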