psu-libraries / scholarsphere-3

A web application for ingest, curation, search, and display of digital assets. Powered by Hydra technologies (Rails, Hydra-head, Blacklight, Solr, Fedora Commons, etc.)
Apache License 2.0

For characterization use the locally uploaded file or create a temp file #451

Closed carolyncole closed 7 years ago

carolyncole commented 7 years ago

This allows content to be streamed for characterization and text extraction instead of accessing the content in memory.

This has already been done in ScholarSphere 2.0 here: https://github.com/psu-stewardship/scholarsphere/commit/95b3cfe9c858750e73dadc6efd0d98ecdc9ec0ea

We should verify that large files really do consume excessive memory before implementing any changes. It is possible this memory "leak" was already plugged in Sufia 7.
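As a rough illustration of the approach in the title (use the locally uploaded file if there is one, otherwise create a temp file), here is a minimal Ruby sketch. It is not the code from the linked commit or from Sufia. The helper name `with_local_path` and its `filename:` parameter are hypothetical, and only Ruby standard library calls are used. The idea is that a characterization or text-extraction tool is handed a path on disk, so the content is never loaded into memory as a single string.

```ruby
require "tempfile"
require "stringio"

# Hypothetical helper (not part of ScholarSphere or Sufia): yields a
# filesystem path that a characterization tool could read from disk.
def with_local_path(source, filename: "content")
  # Case 1: the upload already exists on local disk, so use it as-is.
  return yield(source) if source.is_a?(String) && File.file?(source)

  # Case 2: only an IO-like stream is available, so spool it to a temp file.
  # IO.copy_stream copies in chunks rather than building one large string.
  Tempfile.create(["characterize", File.extname(filename)]) do |tmp|
    tmp.binmode
    IO.copy_stream(source, tmp)
    tmp.flush
    yield tmp.path
  end
  # Tempfile.create removes the temp file once the block returns.
end

# Usage: the block receives a path in both cases.
size = with_local_path(StringIO.new("hello world")) { |path| File.size(path) }
puts size
```

In the stream case, memory use should stay roughly constant regardless of file size. Whether that removes the memory growth seen in the app would still need to be measured against a real large upload.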

carolyncole commented 7 years ago

Looks like this is an issue in ScholarSphere 3.0:

---
job_class: ImportVersionJob
job_id: 2da404db-e5f1-46c0-893b-39ba07d2c779
queue_name: files
arguments:
- _aj_globalid: gid://scholar-sphere/FileSet/vmc87pr745
- "/opt/heracles/deploy/scholarsphere/releases/20170217172726/tmp/uploads/vmc87pr745_version1_somefile15g.txt"
locale: en
Exception: NoMemoryError
Error: failed to allocate memory