I'm currently working on getting the Hathitrust check automated after import is complete (in order to get this working locally, I had to tinker with the code to bypass the need to be in production or demo mode). The code in the PR will be slightly different and less messy, but I wanted to document how I tested it:
The `import_async(task)` method includes code to instantiate a new Hathitrust object, and then calls the `.check(task)` method on that object (outside of testing purposes, this will be called with `.check_async(task)`):
app/models/record_source.rb

```ruby
def import_async(task)
  unless Rails.env.production? or Rails.env.demo?
    raise 'This feature only works in production. '\
          'Elsewhere, use a rake task instead.'
  end

  hathi_trust = Hathitrust.new
  hathi_trust.check(task)

  ## rest of code
end
```
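For reference, outside of local testing the same call site would use the async variant instead; a minimal sketch, assuming `check_async` takes the same task argument as `check`:

```ruby
# Production/demo path (sketch only): kick off the HathiTrust check
# asynchronously instead of running it inline.
# Assumes check_async(task) mirrors check(task).
hathi_trust = Hathitrust.new
hathi_trust.check_async(task)
```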
I `require`d the path to the Hathitrust model inside the Task model, and then included two new methods for testing the automated HathiTrust check trigger -- `import_task_completed?`, which checks whether the import status is failed (which, during testing, it always will be, due to the AWS token expiring) and whether the service is local storage (in prod/demo the condition will be changed so it checks whether the import status succeeded), and `trigger_hathitrust_check`. The latter is processed as an `after_save` callback to ensure the object is saved before the trigger logic runs. Again, for local testing purposes, the action called inside this method is `.check(task)`, but otherwise it should be `.check_async(task)`:

app/models/task.rb
```ruby
require_relative './hathitrust.rb'

class Task < ApplicationRecord

  class Status
    ## original code
  end

  after_initialize :init
  before_save :constrain_progress, :auto_complete
  after_save :trigger_hathitrust_check, if: :import_task_completed?

  def import_task_completed?
    status == Status::FAILED and service == Service::LOCAL_STORAGE
    ## status is always FAILED when running import locally bc of aws tokens
  end

  def trigger_hathitrust_check
    hathi_trust = Hathitrust.new
    task = Task.create!(name: 'Preparing to check HathiTrust',
                        service: Service::HATHITRUST,
                        status: Status::SUBMITTED)
    hathi_trust.check(task)
  end
```
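To exercise the callback locally without re-running a full import, something like the following works from the Rails console (a sketch only; the task name is made up, and it assumes the `Service` constants are namespaced under `Task` the way `Status` is -- adjust the constant paths if they live elsewhere):

```ruby
# Hypothetical console session: create a local-storage import task, then mark
# it FAILED so that import_task_completed? returns true and the after_save
# callback fires trigger_hathitrust_check.
task = Task.create!(name: 'Importing records',          # made-up name
                    service: Task::Service::LOCAL_STORAGE,
                    status: Task::Status::SUBMITTED)
task.update!(status: Task::Status::FAILED)
```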
Inside the Hathitrust model, I commented out the class method `Book.analyze_table` for testing locally. Including the method was giving me a local error: `PG::ActiveSqlTransaction: ERROR: VACUUM cannot run inside a transaction block`
app/models/hathitrust.rb

```ruby
class Hathitrust

  def self.check_authorized?
    ## original code
  end

  def check(task = nil)
    ## original code (presumably includes the begin for the rescue below)
    rescue => e
      Rails.logger.error("Hathitrust.check(): #{e}")
      task.update!(name: "HathiTrust check failed: #{e}",
                   status: Task::Status::FAILED)
      raise e
    else
      task.name = "Checking HathiTrust: updated database with "\
                  "#{Book.where(exists_in_hathitrust: true).count} found items."
      task.status = Task::Status::SUCCEEDED
      task.save!
      puts task.name
    ensure
      FileUtils.rm(pathname, force: true) if pathname.present?
      # Book.analyze_table
    end
  end

  ## original code
  ## private methods
end
```
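For context on that error: Postgres won't run `VACUUM` inside an open transaction, so if `Book.analyze_table` issues a `VACUUM ANALYZE` while the surrounding code is wrapped in a transaction, it raises `PG::ActiveSqlTransaction`. A hypothetical sketch of what such a method might look like (the real implementation isn't shown in this thread):

```ruby
class Book < ApplicationRecord
  # Hypothetical sketch: refresh planner statistics after a bulk import.
  # VACUUM cannot run inside a transaction block, which is why calling this
  # from transaction-wrapped code fails locally.
  def self.analyze_table
    connection.execute('VACUUM ANALYZE books;')
  end
end
```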
After making these tweaks, I was able to run `bin/rails books:import`, which scanned the records locally and then triggered a HathiTrust check automatically.
I'm closing this issue for now, but we can reopen it if we happen to find a workaround for automating the Google check.
@gaurijo When convenient, could you deploy to demo and verify that this works?
@adolski I'm running into some issues when deploying to demo. Here's what the logs are showing:
I've never seen that error before, and I don't know what it is... I did a redeploy on my end and it seems to have worked. I just launched an import task and will keep an eye on it.
Interesting, I'll see if I can figure out why I'm getting that error.
Was the develop branch code merged into the demo branch when you did a redeploy?
Yes, but I didn't push it (I just did now).
I also removed the `linux/amd64` from the command in `rails-container-scripts/docker-build.sh` since it's no longer needed, but I haven't pushed it back up yet.
So some weird behavior I'm noticing:
Every row in the tasks table displays a Task object from the database `tasks` table. You can see in `TasksController.import()` where a Task is created that will track the progress of the import. After the import (`RecordSource.import()`) is done, but before it returns, it launches the two service checks. We would want those to get their own new rows in the table. But instead we are passing our existing Task to them. So the row in the table that previously appeared complete changes to oscillate between the HT check progress and the IA check progress.
Since the same existing task is getting passed to both of those, could I just create separate tasks for those services and then pass those in, so they're not looking at the same task object from before?
Something like:
```
if Rails.env.production? || Rails.env.demo?
  hathi_task = Task.create!(stuff to create a hathitrust service task)
  ia_task = Task.create!(stuff to create a IA service task)
  hathitrust.check_async(hathi_task)
  ia.check_async(ia_task)
end
```
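Fleshed out, that might look something like the sketch below; the HathiTrust task attributes mirror the `trigger_hathitrust_check` snippet above, while the IA-side names (`Service::INTERNET_ARCHIVE`, the task name, the `ia` checker object) are assumptions for illustration:

```ruby
# Sketch only -- the Internet Archive constant/name and the checker objects
# are assumed, and the constant namespacing may differ outside the Task class.
if Rails.env.production? || Rails.env.demo?
  hathi_task = Task.create!(name: 'Preparing to check HathiTrust',
                            service: Service::HATHITRUST,
                            status: Status::SUBMITTED)
  ia_task = Task.create!(name: 'Preparing to check Internet Archive',
                         service: Service::INTERNET_ARCHIVE,
                         status: Status::SUBMITTED)
  hathitrust.check_async(hathi_task)
  ia.check_async(ia_task)
end
```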
Sure. That's what I'd do.
Nice. I sent up another PR from the same branch I was working on yesterday.
Should I try deploying to demo again after the latest changes?
Sure! Try modifying `rails-container-scripts/docker-build.sh` to build only for ARM64 and see if that helps.
I'm still running into the same issue, even after modifying `rails-container-scripts/docker-build.sh` to build only for ARM:
I came across someone else who had this issue on Stack Overflow, but I'm not familiar with the solution suggested (and based on this answer, I'm also confused why the deploy works for you but not for me, unless it's something specific to my setup?)
If you browse the demo-book-tracker images in ECR, you can see the different sizes. The one I pushed a few days ago was 511MB. But the ones you pushed were 3770MB. I don't know what would cause them to be so different. I pulled one of your images and tried to run it and it did start, so, there's that.
I think the Stack Overflow post you linked to is on the right track--increasing the ephemeral storage sounds like something to try. The way to do that would be to edit the Terraform scripts, similar to how you did before, but this time the file you'd be editing is this one: https://github.com/UIUCLibrary/aws-book-tracker-demo-service/blob/master/container_defs/book_tracker.json.tpl
I don't know the details as to what would be the best value to use. It might require experimentation.
Alternatively you could look into why your images are so big and ways to shrink them down.
I haven't had any luck shrinking my images, even after pruning unused containers/images/volumes/build cache, etc., so I'm going down the route of increasing the ephemeral storage in Terraform.
I edited this snippet inside the JSON file you linked to:

```json
"entryPoint": null,
"portMappings": [
  {
    "hostPort": 3000,
    "protocol": "tcp",
    "containerPort": 3000
  }
],
"ephemeralStorage": {
  "sizeInGiB": 60
},
```
I added it here based on snippets I saw in the AWS docs. However, when I run `terraform plan`, it's saying that no changes were made.
I'm wondering if maybe I didn't include the `ephemeralStorage` param in the correct place, so it's not registering that change, but I'm not quite sure. I didn't make any other changes aside from this snippet.
I might be wrong about having to add it to the `book_tracker.json.tpl` file. Try adding it to the `aws_ecs_task_definition` block like:

```hcl
ephemeral_storage {
  size_in_gib = 60
}
```
Currently, each step in the import-and-check process is manual, invoked by a button click. We would like to modify this so that, once an import completes, the import step invokes all three of the checking steps, as ECS tasks in parallel. The end result is that there is only one button (the import button) left in the UI.