langcog / childes-db

A SQL interface for the CHILDES child language corpora

CHILDES regeneration and migration #14

Closed: smeylan closed this 6 years ago

smeylan commented 7 years ago

Regeneration / Migration Procedure (Apache version)

For S3 rather than Apache: rather than putting the zipped SQL dumps in the Apache directory, upload them to S3 with Boto. This can be done from Chompsky (incurring a higher transfer cost) or from the staging instance (which would then need AWS credentials). In either case, we need to pass the S3 addresses back to the website (or retrieve them with Boto when generating the site's static content).
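A minimal sketch of that upload step, with a hypothetical bucket and dump name; credentials are read from ~/.aws/credentials as in the quickstart below:

import boto3

# hypothetical names: bucket 'childes-db-dumps', dump 'childes.sql.gz'
s3 = boto3.client('s3')
s3.upload_file('childes.sql.gz', 'childes-db-dumps', 'childes.sql.gz')
# the address to hand back to the website
url = 'https://childes-db-dumps.s3.amazonaws.com/childes.sql.gz'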

- [ ] How do we provision a persistent EBS volume for MySQL that can be attached to new instances? (One possible approach is sketched below.)
- [ ] Should we use Route 53 to give the MySQL connection a more reasonable name, and use it for the Apache web server (under the Apache version)?
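An untested sketch of both, with placeholder IDs and names throughout:

import boto3

ec2 = boto3.client('ec2')

# create a persistent volume (the AZ must match the instance's; size in GiB)
volume = ec2.create_volume(AvailabilityZone='us-east-1a', Size=100)

# attach it to a newly provisioned instance; MySQL's datadir would live here
ec2.attach_volume(VolumeId=volume['VolumeId'], InstanceId='INSTANCE_ID',
                  Device='/dev/sdf')

# give the MySQL endpoint a stable name via Route 53
route53 = boto3.client('route53')
route53.change_resource_record_sets(
    HostedZoneId='HOSTED_ZONE_ID',
    ChangeBatch={'Changes': [{
        'Action': 'UPSERT',
        'ResourceRecordSet': {
            'Name': 'mysql.childes-db.example.',
            'Type': 'CNAME',
            'TTL': 300,
            'ResourceRecords': [{'Value': 'ec2-host.compute.amazonaws.com'}]}}]})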

smeylan commented 7 years ago

I've done the first part of this in the sm/populate-db branch in update_childesdb.py, as of 25 August, but I am getting an error with the migration script:

Traceback (most recent call last):
  File "/home/stephan/notebooks/childes-db/djangoapp/db/childes_db.py", line 59, in migrate
    result.get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
MaybeEncodingError: Error sending result: 'OperationalError(2006, 'MySQL server has gone away')'. Reason: 'PicklingError("Can't pickle <type 'traceback'>: attribute lookup __builtin__.traceback failed",)'
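Two problems seem to be compounding here: the pooled worker's MySQL connection went stale ("MySQL server has gone away"), and the traceback attached to the resulting exception cannot be pickled back to the parent process, which masks the real error. An untested sketch of a workaround, assuming Django-managed connections and a hypothetical per-corpus worker function:

from django.db import connections

def process_corpus(corpus_path):
    # drop any connection inherited across fork(); Django reconnects lazily
    connections.close_all()
    try:
        migrate_corpus(corpus_path)  # hypothetical per-corpus migration step
        return None
    except Exception as e:
        # return a picklable string rather than letting a traceback
        # object travel back through the pool
        return '%s failed: %s' % (corpus_path, e)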
smeylan commented 7 years ago

And here's a quickstart for related Boto3 functions

in ~/.aws/credentials:
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

in ~/.aws/config:
[default]
region=us-east-1

Python examples:

### S3
import boto3
s3 = boto3.resource('s3')
# LocationConstraint should match the region you want the bucket in
s3.create_bucket(Bucket='my-bucket', CreateBucketConfiguration={
    'LocationConstraint': 'us-west-1'})
# upload a file; the context manager closes the handle after the request
with open('test.jpg', 'rb') as data:
    s3.Bucket('my-bucket').put_object(Key='test.jpg', Body=data)
# may need to specify a CORS policy
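If the site's pages fetch the dumps directly from the bucket, a CORS rule can be attached like this (a sketch; the wildcard origin is an assumption):

s3.meta.client.put_bucket_cors(
    Bucket='my-bucket',
    CORSConfiguration={'CORSRules': [{
        'AllowedMethods': ['GET'],
        'AllowedOrigins': ['*']}]})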

### EC2

# powering on an instance
import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client('ec2')
instance_id = 'INSTANCE_ID'  # ID of the instance to start

# dry run first to check permissions without side effects
try:
    ec2.start_instances(InstanceIds=[instance_id], DryRun=True)
except ClientError as e:
    if 'DryRunOperation' not in str(e):
        raise

# dry run succeeded, run start_instances without DryRun
try:
    response = ec2.start_instances(InstanceIds=[instance_id], DryRun=False)
    print(response)
except ClientError as e:
    print(e)

# creating an elastic IP and associating it with an instance
import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client('ec2')
try:
    allocation = ec2.allocate_address(Domain='vpc')
    response = ec2.associate_address(AllocationId=allocation['AllocationId'],
                                     InstanceId='INSTANCE_ID')
    print(response)
except ClientError as e:
    print(e)

# assigning an existing elastic IP

# list the account's elastic IPs, then pick one to associate
addresses_dict = ec2.describe_addresses()
for eip_dict in addresses_dict['Addresses']:
    print(eip_dict['PublicIp'])

Then use .associate_address; the arguments depend on whether we are using a Virtual Private Cloud (VPC) or "EC2-Classic", as sketched below.
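A sketch of both forms, with placeholder IDs:

# in a VPC, associate by allocation ID
ec2.associate_address(InstanceId='INSTANCE_ID', AllocationId='ALLOCATION_ID')
# in EC2-Classic, associate by public IP
ec2.associate_address(InstanceId='INSTANCE_ID', PublicIp='203.0.113.10')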

# starting a new instance
# https://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.Subnet.create_instances
# note: create_instances is a method of the EC2 resource, not the client
ec2_resource = boto3.resource('ec2')
ec2_resource.create_instances(ImageId='<ami-image-id>', MinCount=1, MaxCount=1)
# should add tags to identify this machine using the TagSpecifications key
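For example, tagging at launch might look like this (the tag value is a placeholder):

ec2_resource.create_instances(ImageId='<ami-image-id>', MinCount=1, MaxCount=1,
                              TagSpecifications=[{
                                  'ResourceType': 'instance',
                                  'Tags': [{'Key': 'Name',
                                            'Value': 'childes-db'}]}])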

# kill an old instance
# https://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.Client.terminate_instances
# From the docs: "You can stop, start, and terminate EBS-backed instances. You
# can only terminate instance store-backed instances. When you terminate an
# instance, any attached EBS volumes with the DeleteOnTermination block device
# mapping parameter set to true are automatically deleted."
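A minimal call, reusing the client from above with a placeholder instance ID:

ec2.terminate_instances(InstanceIds=['INSTANCE_ID'])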
amsan7 commented 7 years ago

I created a Google Doc for this feature (as I believe it requires some more discussion).