openworm / movement_cloud

Movement Analysis on the cloud
http://movement.openworm.org/

Restoration of the database. #137

Open cheelee opened 1 year ago

cheelee commented 1 year ago

Summary of my thoughts on how we can get this going can be found here:

https://docs.google.com/document/d/14R72-ev7Toh6iCN2de0FCOm5OrOWY33w5qV8LGLUgZM/edit?usp=sharing

Locate the database code: https://github.com/openworm/movement_cloud.

Locate or acquire a copy of the MySQL database archive of worm movement metadata, probably via Dr. Andre Brown, or from some forgotten work archive in Lee Chee Wai’s possession. This database, or a subset of it, should probably be curated somewhere going forward.

The framework is Django and was previously deployed on AWS. The deployment steps should be laid out again and documented, e.g. any self-signed certificates. Michael Currie is probably best placed to supply this information.

The framework’s access to Zenodo (for the actual movement data) should be reviewed and tested. This needs a combination of Chee Wai, Michael, and Andre.
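For the "reviewed and tested" part of the Zenodo item, a rough sketch of a reachability check is below. It assumes Zenodo answers HTTP HEAD requests for file URLs and that the URLs passed in are already percent-encoded (some of the filenames contain spaces). The function names are made up for illustration; the status lookup is injectable so the logic can be tested offline.

```python
"""Minimal reachability check for Zenodo-hosted movement files (a sketch).

Assumptions: the server answers HTTP HEAD requests, and URLs are already
percent-encoded. head_status/check_urls/failures are hypothetical helpers.
"""
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def head_status(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status of a HEAD request to `url`, or 0 on network errors."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as HTTPError; keep the code for reporting.
        return err.code
    except OSError:
        # URLError (DNS/connection failures) and timeouts are both OSError subclasses.
        return 0


def check_urls(
    urls: Iterable[str],
    fetch_status: Callable[[str], int] = head_status,
) -> Dict[str, int]:
    """Map each URL to the status code returned by `fetch_status`."""
    return {url: fetch_status(url) for url in urls}


def failures(statuses: Dict[str, int]) -> Dict[str, int]:
    """Keep only the URLs that did not answer 200 OK."""
    return {url: code for url, code in statuses.items() if code != 200}
```

Running `failures(check_urls(urls))` over the URLs from the download package would then list anything Zenodo no longer serves.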

> Advanced:

Modernize the Django and front-end code. Previous backend database access was ad hoc, using basic Django operations, and could be modernized with a cleaner REST framework implementation. Nicer alternatives to DataTables, Crossfilter, and other major front-end features may be available in more modern interfaces implemented in React, and could be investigated.

Deployment. The prior deployment never moved Django off its debug/development footing, because the data is open and public and the interface was read-only. It would be nice to modernize the deployment by running Django in production mode, with gunicorn as its WSGI server, behind an nginx web server. This lets us start the web service as a daemon, with logging and other facilities cleanly built in. Previously, each time we restarted the service, we had to invoke python manage.py runserver in the background under nohup, with standard output and standard error redirected to a log file of our choosing.

User-Standalone Deployment. We could make the framework installable as a local service for users, so that a centralized server would not be necessary, which would save the Foundation some subscription money. Instead of the full database, we could provide a smaller but meaningful subset. We will need to ensure that members of the public can still access the Zenodo data through this setup.
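For the gunicorn/nginx idea: a gunicorn config file is itself Python, so a starting point might look like the sketch below. The project module `movement_cloud.wsgi`, the settings module name, the port, and the log paths are placeholders I made up, not names taken from the repo; the `wsgi_app` setting needs a reasonably recent gunicorn (20.1 or newer, as far as I know).

```python
# gunicorn.conf.py: a hypothetical gunicorn configuration for the Django app.
# gunicorn picks up ./gunicorn.conf.py by default, or it can be passed with -c.
# Assumptions: the Django project module is "movement_cloud" (the real name in
# the repo may differ) and nginx proxies requests to 127.0.0.1:8000.
import multiprocessing
import os

# WSGI entry point as "module:callable"; Django's startproject generates
# <project>/wsgi.py exposing `application`. The module name here is a guess.
wsgi_app = os.environ.get("WSGI_APP", "movement_cloud.wsgi:application")

# Listen only on loopback; nginx faces the internet and forwards requests here.
bind = "127.0.0.1:8000"

# gunicorn's docs suggest (2 x CPU cores) + 1 workers as a starting point.
workers = multiprocessing.cpu_count() * 2 + 1

# Log to files instead of redirecting stdout/stderr of a nohup'd runserver.
# These directories are placeholders and must exist and be writable.
accesslog = "/var/log/movement_cloud/access.log"
errorlog = "/var/log/movement_cloud/error.log"
loglevel = "info"

# Environment for the workers; the settings module name is hypothetical.
raw_env = ["DJANGO_SETTINGS_MODULE=movement_cloud.settings"]
```

Production mode itself still has to be switched on in the Django settings (DEBUG = False, with movement.openworm.org in ALLOWED_HOSTS), and under systemd gunicorn would typically run as a service unit rather than daemonizing itself.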

MichaelCurrie commented 1 year ago

So I think what happened is that last week I erased all my old servers running on my old BruceForceResearch AWS account. And one of them I think was the OpenWorm movement validation server... But I'm sure the data exists elsewhere. And the software to run it is of course in this repo.

cheelee commented 1 year ago

Cool. This weekend I'll spend some time digging out the contents of my OpenWorm USB stick. I'll probably have a version of the database somewhere.

I could perhaps look into Dockerizing the package, so all the operations and testing can be done on my local machines and then it "just works" when moved to AWS.

MichaelCurrie commented 8 months ago

@Eviatar remarked:

Hi all, I just visited: http://movement.openworm.org/ The certificate had a security issue that led to a strange site based in China. Do you know what happened? Best, Ev

The DNS entry for movement.openworm.org pointed to an IPv4 address allocated as an AWS Elastic IP, which was associated with an EC2 instance running in an AWS datacenter under the OpenWorm AWS account. However, I shut down this EC2 instance thinking it wasn't being used, and since then this account (root email: XXXXX (contact me privately if you need it)) apparently no longer exists, or at least is inaccessible to me. (Sorry!) Presumably the Elastic IP was released along with it, and AWS has since handed that address to someone else, which would explain the unrelated site Ev saw.

So what needs to be done is to spin up a new Ubuntu machine, clone this repo, and start the Django process. This will serve the HTML and other files that make up the Movement Database website, including the home page.
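Those bring-up steps could be scripted roughly as below. This is only a sketch: it assumes git and Python 3 are already installed on the Ubuntu machine and that manage.py sits at the repository root, which I haven't checked; installing the Python dependencies is left out because I don't know what the repo pins. The helper names are hypothetical.

```python
"""Sketch of the bring-up steps: clone the repo, then start the Django server.

Assumptions: git and python3 are on PATH, and manage.py is at the repo root.
bringup_steps/run_steps are hypothetical helpers, not part of the repo.
"""
import subprocess
from pathlib import Path
from typing import List, Tuple

REPO_URL = "https://github.com/openworm/movement_cloud"

Step = Tuple[List[str], Path]  # (argv, working directory)


def bringup_steps(workdir: Path, addr: str = "0.0.0.0:8000") -> List[Step]:
    """Return the commands to run, in order, with the directory for each."""
    repo_dir = workdir / "movement_cloud"
    return [
        (["git", "clone", REPO_URL, str(repo_dir)], workdir),
        # Django's development server, as in the old deployment.
        (["python3", "manage.py", "runserver", addr], repo_dir),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step; check=True raises CalledProcessError on the first failure."""
    for argv, cwd in steps:
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_steps(bringup_steps(Path.home()))` would reproduce the old setup; note that Django's own documentation warns against using runserver in production, which is what the gunicorn/nginx idea above addresses.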

The content of the website is here: https://github.com/openworm/movement_cloud/tree/master/webworm/templates/webworm

For Crossfilter, the summary metadata for all 12,234 experiments is saved in this CSV file in the repo:

https://github.com/openworm/movement_cloud/blob/master/webworm/static/webworm/worm_mock_data.csv

timestamp,hour,iso_date,pretty_date,pretty_time,day_of_week,worm_length,path_range,strain,allele
201001111135,11.5833333333,2010-01-11,"January 11th, 2010",11:01,1,1151.86,1014.6,N2,-N/A-
201001111136,11.6000000000,2010-01-11,"January 11th, 2010",11:01,1,1219.76,278.922,N2,-N/A-
201001111136,11.6000000000,2010-01-11,"January 11th, 2010",11:01,1,1162.47,807.693,N2,-N/A-
201001111136,11.6000000000,2010-01-11,"January 11th, 2010",11:01,1,1222.2,1613.83,N2,-N/A-
201001111137,11.6166666667,2010-01-11,"January 11th, 2010",11:01,1,1179.41,1169.63,N2,-N/A-
201001111138,11.6333333333,2010-01-11,"January 11th, 2010",11:01,1,1156.19,1355.74,N2,-N/A-
201001111155,11.9166666667,2010-01-11,"January 11th, 2010",11:01,1,1114.85,489.98,RB1990,ok2625
201001111156,11.9333333333,2010-01-11,"January 11th, 2010",11:01,1,1137.73,2785.25,RB1990,ok2625
201001111156,11.9333333333,2010-01-11,"January 11th, 2010",11:01,1,1103.93,3053.32,RB1990,ok2625
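To sanity-check the CSV, a small script like the one below could recompute derived columns. Reading off the sample rows (not any documentation I know of), `timestamp` looks like YYYYMMDDhhmm and `hour` like the fractional hour hh + mm/60. The same reading suggests `pretty_time` should be hh:mm, yet every sample row shows 11:01 while the timestamps run from 11:35 to 11:56; that may just be a formatting slip in how the mock data was generated, but it is a guess. The helper names are hypothetical.

```python
"""Sanity-check the crossfilter CSV against its own timestamp column (a sketch).

Assumption inferred from the sample rows: timestamp is YYYYMMDDhhmm,
hour is hh + mm/60, and pretty_time is meant to be hh:mm.
"""
import csv
import io
from collections import Counter
from typing import Dict, List

SAMPLE = """timestamp,hour,iso_date,pretty_date,pretty_time,day_of_week,worm_length,path_range,strain,allele
201001111135,11.5833333333,2010-01-11,"January 11th, 2010",11:01,1,1151.86,1014.6,N2,-N/A-
201001111155,11.9166666667,2010-01-11,"January 11th, 2010",11:01,1,1114.85,489.98,RB1990,ok2625
201001111156,11.9333333333,2010-01-11,"January 11th, 2010",11:01,1,1137.73,2785.25,RB1990,ok2625
"""

Row = Dict[str, str]


def load_rows(text: str) -> List[Row]:
    """Parse CSV text; csv handles the quoted pretty_date that contains a comma."""
    return list(csv.DictReader(io.StringIO(text)))


def hour_from_timestamp(ts: str) -> float:
    """YYYYMMDDhhmm -> hh + mm/60."""
    return int(ts[8:10]) + int(ts[10:12]) / 60


def hour_mismatches(rows: List[Row], tol: float = 1e-6) -> List[str]:
    """Timestamps whose stored `hour` disagrees with the recomputed value."""
    return [r["timestamp"] for r in rows
            if abs(float(r["hour"]) - hour_from_timestamp(r["timestamp"])) > tol]


def pretty_time_mismatches(rows: List[Row]) -> List[str]:
    """Timestamps whose `pretty_time` is not the hh:mm taken from the timestamp."""
    return [r["timestamp"] for r in rows
            if r["pretty_time"] != f"{r['timestamp'][8:10]}:{r['timestamp'][10:12]}"]


def strain_counts(rows: List[Row]) -> Counter:
    """Number of experiments per strain."""
    return Counter(r["strain"] for r in rows)
```

Pointed at the full file (webworm/static/webworm/worm_mock_data.csv), `hour_mismatches` should come back empty if the `hour` reading is right.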

The links to the full HDF5 and WCON files are saved in a file called movement_data_download_package.zip, which is downloaded by the client-side crossfilter_genData.js script. This file is not in this repo, however, and I had to remember where it was saved. I hoped the only copy wasn't on that now-gone EC2 instance, and luckily I found it on my personal Dropbox. Here is a link: movement_data_download_package.zip

This file contains many, many URLs to the HDF5 and WCON files. True to their word, Zenodo is still hosting all of these URLs. For example:

1033989 https://zenodo.org/record/1033989/files/343 ED302 on food L_2010_11_26__16_13_34__12.hdf5
1033989 https://zenodo.org/record/1033989/files/343 ED302 on food L_2010_11_26__16_13_34__12_features.hdf5
1033989 https://zenodo.org/record/1033989/files/343 ED302 on food L_2010_11_26__16_13_34__12.wcon.zip
1033987 https://zenodo.org/record/1033987/files/247 JU438 swimming_2011_03_08__12_39_28___2___4_features.hdf5
1033987 https://zenodo.org/record/1033987/files/247 JU438 swimming_2011_03_08__12_39_28___2___4.wcon.zip
1033987 https://zenodo.org/record/1033987/files/247 JU438 swimming_2011_03_08__12_39_28___2___4.hdf5
1033985 https://zenodo.org/record/1033985/files/ocr-3 (a1537) off food L_2010_04_27__12_11_38__5.wcon.zip
1033985 https://zenodo.org/record/1033985/files/ocr-3 (a1537) off food L_2010_04_27__12_11_38__5.hdf5
1033985 https://zenodo.org/record/1033985/files/ocr-3 (a1537) off food L_2010_04_27__12_11_38__5_features.hdf5
1033983 https://zenodo.org/record/1033983/files/ocr-3 (ok1557) off food _2010_04_28__11_01_12___2___3.hdf5
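A sketch for turning that listing into something usable: group the files by Zenodo record and percent-encode the paths. It assumes each line is `<record_id> <url>` separated by the first run of whitespace; since the filenames themselves contain spaces and parentheses, the line must be split only once and the path encoded before requesting it. The function names are hypothetical.

```python
"""Group the download-package listing by Zenodo record (a sketch).

Assumption: each line is "<record_id> <url>"; everything after the first
whitespace is the URL, spaces included.
"""
from collections import defaultdict
from typing import Dict, Iterable
from urllib.parse import quote, urlsplit, urlunsplit


def file_kind(url: str) -> str:
    """Classify by suffix; check _features.hdf5 before the generic .hdf5."""
    if url.endswith("_features.hdf5"):
        return "features_hdf5"
    if url.endswith(".hdf5"):
        return "hdf5"
    if url.endswith(".wcon.zip"):
        return "wcon"
    return "other"


def encode_url(url: str) -> str:
    """Percent-encode the path (spaces, parentheses), keeping scheme and host."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path)))


def group_by_record(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """record_id -> {file kind: encoded URL}."""
    records: Dict[str, Dict[str, str]] = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record_id, url = line.split(None, 1)  # split once: filenames contain spaces
        records[record_id][file_kind(url)] = encode_url(url)
    return dict(records)
```

In this excerpt most records carry all three files (raw HDF5, features HDF5, zipped WCON), so a record missing one of the three kinds would be worth flagging.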

This Django models.py, which @cheelee worked on, gives the structure of the database: https://github.com/openworm/movement_cloud/blob/master/webworm/models.py I'm not sure what actually populates it, but I think it may just be the CSV file above.

@cheelee do you remember how the Zenodo listed rows are linked back to the CSV file above? I don't see an obvious foreign key.
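One *possible* link, offered purely as a hypothesis to check: the filenames embed a recording start time (…_YYYY_MM_DD__hh_mm_ss…), and truncated to the minute it has the same shape as the CSV `timestamp` column (YYYYMMDDhhmm). It can't be a unique key on its own, though: the CSV sample already has three N2 rows at 201001111136, so strain or allele would not disambiguate those either. The helper names below are hypothetical.

```python
"""Candidate join between Zenodo filenames and CSV rows via the embedded time.

Hypothesis only: nobody in this thread has confirmed that the filename
timestamp (truncated to the minute) equals the CSV `timestamp` column.
"""
import re
from typing import Dict, List, Optional

# Matches e.g. "2010_11_26__16_13_34" inside "... L_2010_11_26__16_13_34__12.hdf5".
STAMP = re.compile(r"(\d{4})_(\d{2})_(\d{2})__(\d{2})_(\d{2})_(\d{2})")


def timestamp_key(filename: str) -> Optional[str]:
    """Return YYYYMMDDhhmm from a movement filename, or None if no stamp is found."""
    m = STAMP.search(filename)
    if m is None:
        return None
    year, month, day, hour, minute, _second = m.groups()
    return f"{year}{month}{day}{hour}{minute}"


def candidate_rows(filename: str, rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """CSV rows whose timestamp equals the filename's minute-level stamp."""
    key = timestamp_key(filename)
    return [r for r in rows if r["timestamp"] == key]
```

Running this over the whole listing and counting how many filenames get zero, one, or several candidate rows would show fairly quickly whether the hypothesis holds.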

So, to summarize, I think the repo is actually self-contained apart from movement_data_download_package.zip, of which I have now found a copy. The next step is to create an Ubuntu instance and then repoint the DNS to it.
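Once the DNS is repointed, it would be worth confirming that movement.openworm.org resolves only to addresses we actually hold, since a record left pointing at a released address is what caused the problem Ev reported. A sketch, with the resolver injectable for offline testing; the address 203.0.113.10 in the usage note is a documentation placeholder (RFC 5737), not the real Elastic IP, and the helper names are hypothetical.

```python
"""Check that a hostname resolves only to addresses we own (a sketch)."""
import socket
from typing import Callable, Iterable, List, Set


def resolve_ipv4(hostname: str) -> List[str]:
    """All IPv4 addresses the local resolver returns for `hostname`."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def unexpected_addresses(
    hostname: str,
    owned: Iterable[str],
    resolver: Callable[[str], List[str]] = resolve_ipv4,
) -> Set[str]:
    """Resolved addresses not in `owned`; an empty set means the record looks right."""
    return set(resolver(hostname)) - set(owned)
```

For example, `unexpected_addresses("movement.openworm.org", ["203.0.113.10"])` with the real Elastic IP substituted. Right after the change, resolvers may keep serving the old address until the record's TTL expires, so a non-empty result in the first hour or so is not necessarily a problem.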

I could perhaps do this in the coming weeks in my spare time using my company's AWS account...