ualbertalib / avalon

University of Alberta's Media Repository based on Avalon
Apache License 2.0

Avalon v6.5 test: investigate wave form create failures on five test objects #656

Closed jefferya closed 3 years ago

jefferya commented 4 years ago

As part of the upgrade to v6.5, a waveform is created for each of the pre-existing objects via scripts/waveform_backfill.rb [1].

When the script is run on the staging server, which contains several test objects, 5 media files fail. This ticket records the investigation. Details: https://avalon6-test.library.ualberta.ca/jobs/failed

An example:

Worker
    avtest02.library.ualberta.ca:26491 on default at 1 day ago
Class
    WaveformJob (via ActiveJob) 
Arguments

    ---
    job_class: WaveformJob
    job_id: 70f25386-e39e-4c92-8cd0-649acdef016e
    provider_job_id: 
    queue_name: default
    priority: 
    arguments:
    - tm70mv18h
    executions: 0
    locale: en

Exception
    WaveFile::InvalidFormatError
Error
    Does not appear to be a valid Wave file

[1] Note: to rerun waveform_backfill after changes to Avalon (such as new objects being added), first delete scriptdata/waveform_backfill.txt. That file is generated once from the objects stored in Solr and is reused to speed up subsequent runs of waveform_backfill.
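
As a minimal sketch of the cleanup step in the note above: the helper name clear_backfill_cache is hypothetical and not part of Avalon, and it assumes scriptdata/waveform_backfill.txt resolves relative to the Avalon application root passed in.

```ruby
require 'fileutils'

# Hypothetical helper (not part of Avalon): remove the cached ID list so the
# next waveform_backfill run regenerates it from Solr.
# Assumption: the cache path is relative to the application root.
def clear_backfill_cache(app_root, relative_path = 'scriptdata/waveform_backfill.txt')
  cache = File.join(app_root, relative_path)
  existed = File.exist?(cache)
  FileUtils.rm_f(cache) # rm_f does not raise if the file is already gone
  existed               # true when a stale cache was actually removed
end
```

Calling clear_backfill_cache(Dir.pwd) from the application root before rerunning the backfill would cover the case described in the note.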

jefferya commented 4 years ago

All masterfiles reporting problems play in the Avalon player and have sound.

tm70mv18h f7623c56g 7d278t01v 5999n3367 c247ds08x

If the hypothesis below is correct, the waveform creation problem will affect every master file that has been moved within the dropbox since ingest (including cases where the file or a parent/ancestor directory was renamed).

Diagnostics:

The waveform job starts with a list of master file IDs, then gets the URI for each one [1]:

uri = file_uri(master_file) || playlist_url(master_file)

If file_uri succeeds, meaning the file path recorded in the Fedora object still exists, the resulting ffmpeg command uses that file path [2].
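
The selection logic can be sketched in isolation. This illustrates the behaviour described in this comment and is not Avalon's code: choose_waveform_source and its parameters are hypothetical names, and the sketch assumes a recorded path counts as usable only when the file still exists on disk.

```ruby
# Hypothetical stand-in for the "file_uri(...) || playlist_url(...)" fallback.
# Assumption: the recorded location is usable only if the file is still on disk.
def choose_waveform_source(recorded_path, playlist_url)
  local = recorded_path if recorded_path && File.exist?(recorded_path)
  # nil falls through to the streaming URL, mirroring the || in the job
  local || playlist_url
end
```

Under these assumptions, a master file whose recorded path has gone stale (for example after a move within the dropbox) yields the playlist URL rather than a local path.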

However, if the file path is no longer valid, the streaming server is queried instead. This is the case for the 5 "Does not appear to be a valid Wave file" failures [3]. Digging further, running the ffmpeg command [2] from the command line yields the following error:

/usr/bin/ffmpeg -headers $'Referer: https://avalon6-test.library.ualberta.ca/
' -i 'https://avstream-test.library.ualberta.ca:8443/avalon/_definst_/mp4:19316104-f3f0-42da-b0db-6d14172c24e4/f1651f68-a8d7-40b7-bc58-be4386e83b26/georges_melies_trip_moon_1902.mp4/playlist.m3u8?token=03de6fe02aafbd318dcec82014f5c60037a60f44' -f wav -ar 44100 -

ffmpeg version n4.2.3 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 4.8.5 (GCC) 20150623 (Red Hat 4.8.5-39)
  configuration: --prefix=/home/tlee/rpmbuild/BUILD/ffmpeg/ffmpeg_build --extra-cflags=-I/home/tlee/rpmbuild/BUILD/ffmpeg/ffmpeg_build/include --extra-ldflags=-L/home/tlee/rpmbuild/BUILD/ffmpeg/ffmpeg_build/lib --extra-libs='-lpthread -lm' --bindir=/home/tlee/rpmbuild/BUILD/ffmpeg/bin --pkg-config-flags=--static --enable-gpl --enable-nonfree --enable-libfdk_aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
https protocol not found, recompile FFmpeg with openssl, gnutls or securetransport enabled.
https://avstream-test.library.ualberta.ca:8443/avalon/_definst_/mp4:19316104-f3f0-42da-b0db-6d14172c24e4/f1651f68-a8d7-40b7-bc58-be4386e83b26/georges_melies_trip_moon_1902.mp4/playlist.m3u8?token=03de6fe02aafbd318dcec82014f5c60037a60f44: Protocol not found
Did you mean file:https://avstream-test.library.ualberta.ca:8443/avalon/_definst_/mp4:19316104-f3f0-42da-b0db-6d14172c24e4/f1651f68-a8d7-40b7-bc58-be4386e83b26/georges_melies_trip_moon_1902.mp4/playlist.m3u8?token=03de6fe02aafbd318dcec82014f5c60037a60f44?
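
The output above indicates that this ffmpeg build has no https support. One way to check a build before running the backfill is to inspect the protocol list printed by ffmpeg -protocols. The sketch below assumes ffmpeg is on the PATH and that the listing uses "Input:" and "Output:" section headings; the helper names input_protocols and ffmpeg_supports_https? are hypothetical.

```ruby
require 'open3'

# Parse the text printed by `ffmpeg -protocols` and return the names listed
# under the "Input:" heading. Assumption: one protocol name per line.
def input_protocols(protocols_text)
  section = nil
  inputs = []
  protocols_text.each_line do |line|
    name = line.strip
    case name
    when 'Input:'  then section = :input
    when 'Output:' then section = :output
    else
      inputs << name if section == :input && !name.empty?
    end
  end
  inputs
end

# Hypothetical check: true only if the given ffmpeg binary runs and lists
# https as an input protocol; false if it is missing or lacks https.
def ffmpeg_supports_https?(ffmpeg = 'ffmpeg')
  output, status = Open3.capture2e(ffmpeg, '-hide_banner', '-protocols')
  status.success? && input_protocols(output).include?('https')
rescue Errno::ENOENT
  false # binary not found
end
```

A false result from ffmpeg_supports_https? would be consistent with the failure shown above: ffmpeg cannot open the playlist URL, so it presumably writes no WAV data for the reader to parse.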

Impact

[1] https://github.com/ualbertalib/avalon/blob/9d9d21ad8ee08da4c6ceee694f5016d5b189d0ea/app/jobs/waveform_job.rb#L25

[2] https://github.com/ualbertalib/avalon/blob/9d9d21ad8ee08da4c6ceee694f5016d5b189d0ea/app/services/waveform_service.rb#L49

[3]

Error performing WaveformJob (Job ID: 5d86374f-2300-4936-a160-31d6fc848590) from Resque(default) in 200.72ms: WaveFile::InvalidFormatError (Does not appear to be a valid Wave file):
/var/www/avalon6/vendor/bundle/ruby/2.5.0/gems/wavefile-1.0.1/lib/wavefile/reader.rb:46:in `rescue in initialize'
/var/www/avalon6/vendor/bundle/ruby/2.5.0/gems/wavefile-1.0.1/lib/wavefile/reader.rb:43:in `initialize'
/var/www/avalon6/app/services/waveform_service.rb:56:in `new'

seanluyk commented 4 years ago

@jefferya Just wondering if any of these test objects are videos? I wonder if a waveform is only created for audio objects.

jefferya commented 4 years ago

@seanluyk Unfortunately, all have audio. I've e-mailed the group a synopsis of my investigation.

jefferya commented 4 years ago

A script to predict which master files will cause waveform backfill problems because the file location recorded in Solr no longer exists on disk:

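require 'json'
require 'net/http'
require 'uri'

# SOLR_URL must be set and must end with a trailing slash,
# because the "select?..." query is appended to it directly.
uri_str = "#{ENV.fetch('SOLR_URL')}select?q=has_model_ssim:MasterFile&fl=id,file_location_ssi&rows=1000000&wt=json&sort=file_location_ssi%20asc,id%20asc"

# Fetch every MasterFile document with its ID and recorded file location.
response = Net::HTTP.get_response(URI.parse(uri_str))
abort "Solr query failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

result = JSON.parse(response.body)

# Print the ID and recorded path of each master file that is missing on disk.
result['response']['docs'].each do |doc|
  id = doc['id'].to_s
  path = doc['file_location_ssi'].to_s
  puts "#{id}, #{path}" unless File.exist?(path)
end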