samvera-deprecated / curation_concerns

A Hydra-based Rails Engine that extends an application, adding the ability to Create, Read, Update and Destroy (CRUD) objects (based on Hydra::Works) and providing a generator for defining object types with custom workflows, views, access controls, etc.
Other
15 stars 27 forks source link

Derivative temp file drops path prefix #740

Open mark-dce opened 8 years ago

mark-dce commented 8 years ago

Descriptive summary

The code in CurationConcerns::PersistDerivatives drops part of the path here: https://github.com/projecthydra-labs/curation_concerns/blob/master/app/services/curation_concerns/persist_derivatives.rb#L22-L26

Expected behavior

The file gets written to the path supplied in directives.fetch(:url)

Actual behavior

The root of the path gets truncated:

2.3.0 :019 > uri = URI('file://tmp/derivatives/th/83/kz/33/c-mp3.mp3')
 => #<URI::Generic file://tmp/derivatives/th/83/kz/33/c-mp3.mp3> 
2.3.0 :020 > uri.path
 => "/derivatives/th/83/kz/33/c-mp3.mp3"   ## expected /tmp/derivatives/...
2.3.0 :021 > uri.host
 => "tmp" 
2.3.0 :022 > uri.scheme
 => "file" 

Steps to reproduce the behavior

  1. Set a breakpoint at https://github.com/projecthydra-labs/curation_concerns/blob/master/app/services/curation_concerns/persist_derivatives.rb#L26
  2. Compare ui.path vs. directives.fetch(:url)
jcoyne commented 8 years ago

Yes, that uri ought to be file:///tmp/... or file://localhost/tmp/...

mark-dce commented 8 years ago

Here's what happens now: https://gist.github.com/mark-dce/79f9962462ed76376cd780de27fca38c

jcoyne commented 8 years ago

Probably introduced here: https://github.com/projecthydra-labs/curation_concerns/blob/master/app/models/concerns/curation_concerns/file_set/derivatives.rb#L51

jcoyne commented 8 years ago

What is your CurationConcerns.config.derivatives_path set to? I think it is expecting an absolute path, but you may have a relative path?

mark-dce commented 8 years ago

Yes - there were relative paths checked into the repos we're using: https://github.com/curationexperts/alexandria-v2/blob/master/config/application.yml.template#L7 Since there's not alot of documentation, the code should probably handle relative paths gracefully.

I just checked and the same issue occurs with absolute paths:

MARKs-MB.usinternet.com:7050 on DERIVATIVES at just now Retry or Remove
Class
 ActiveJob::QueueAdapters::ResqueAdapter::JobWrapper
Arguments
---
job_class: CreateDerivativesJob
job_id: 047dd857-cdd3-492c-a899-a00d36980b1b
queue_name: derivatives
arguments:
- 7h149p84n
- "/Users/mark/Documents/workspace/_no_backup/RubymineProjects/alexandria-v2/tmp/uploads/7h/14/9p/84/cusb-cyl4377a.wav"
locale: en
Exception
Errno::EACCES
Error
Permission denied @ dir_s_mkdir - /mark

i.e. still truncating the absolute path and trying to create /mark instead of using /Users/mark I'm running cc 0.10.0

jcoyne commented 8 years ago

Okay, I'm completely at a loss as to why that is happening with absolute paths. It makes sense why it's happening with relative paths though.

Absolute:

URI('file:///Users/mark').path
=> "/Users/mark"

Relative:

URI('file://tmp/foo').path
=> "/foo"
mark-dce commented 8 years ago

Sorry - I had patched my local copy with file:/// so my results are suspect. I'm going to wipe out my environment and start from scratch and will report back.

mark-dce commented 8 years ago

I erased my working space and restarted my machine. On a clean environment, I get these results:

With a relative path:

# config/application.rb
derivatives_dir: 'tmp/derivatives'

I get this error

Worker
MARKs-MB.local:1544 on DERIVATIVES at about a minute ago Retry or Remove
Class
 ActiveJob::QueueAdapters::ResqueAdapter::JobWrapper
Arguments
---
job_class: CreateDerivativesJob
job_id: 4bcc6a52-6bde-4bb1-8ff4-27aa4dcfcf3d
queue_name: derivatives
arguments:
- g732d897c
- tmp/uploads/g7/32/d8/97/cusb-cyl10952b.wav
locale: en
Exception
Errno::EACCES
Error
Permission denied @ dir_s_mkdir - /derivatives

With an absolute path:

# config/application.rb
derivatives_dir: '/Users/mark/Documents/workspace/_no_backup/RubymineProjects/alexandria-v2/tmp/derivatives'

No errors, but a bunch of .mp3 files get left in my /tmp directory that weren't there before running an ingest job; it looks like .ogg files are getting created there too, but they're getting cleaned up. I'm not sure if this is curation_concerns, derivatives, or alexandria behavior.

$ ls -la /tmp/sufia*
-rw-r--r--  1 mark  wheel  2162747 Apr 25 16:05 /tmp/sufia20160425-1881-3nfyg3.mp3
-rw-r--r--  1 mark  wheel  2108726 Apr 25 16:05 /tmp/sufia20160425-1887-o9gfa3.mp3
-rw-r--r--  1 mark  wheel  2140596 Apr 25 16:05 /tmp/sufia20160425-1891-c4jt0q.mp3
-rw-r--r--  1 mark  wheel  2128788 Apr 25 16:05 /tmp/sufia20160425-1893-1dpniw3.mp3
-rw-r--r--  1 mark  wheel  2230039 Apr 25 16:05 /tmp/sufia20160425-1896-1whuhgc.mp3
-rw-r--r--  1 mark  wheel  2149269 Apr 25 16:05 /tmp/sufia20160425-1898-1d327t2.mp3
... # etc. these all appeared while I was running the ingest job

Everything else seems to be working as expected if I run an ingest job with derivatives_dir: pointed at an absolute path.

I'm running an older version of curation_concerns, but I think the pertinent code is identical (aside from being merged into curation_concerns from -models)

$ bundle list curation_concerns
/Users/mark/.rvm/gems/ruby-2.3.0@alexandria/gems/curation_concerns-0.10.0
$ bundle list curation_concerns-models
/Users/mark/.rvm/gems/ruby-2.3.0@alexandria/gems/curation_concerns-models-0.10.0
jcoyne commented 8 years ago

@mark-dce can you close this ticket and write up a new ticket for "handle relative paths/do not allow relative paths for derivatives_dir"

jcoyne commented 7 years ago

@mark-dce do you still want to keep this ticket open for any reason?

mark-dce commented 7 years ago

@jcoyne @mjgiarlo - I don't think the code should work in unexpected and undocumented ways. I don't have a strong preference whether the solution is "handle relative paths for derivatives_dir" or "don't allow relative paths for derivatives_dir". If other configuration parameters allow relative paths, it seems like you would expect the code to handle them here too. I don't entirely feel like it's my call to make though, I do feel like the existing behavior produces difficult to diagnose errors.

mjgiarlo commented 7 years ago

@mark-dce What would you think about raising this on a Hydra Tech call to gather community input on this? Then, depending on what folks want, you can write up a new Hyrax ticket?