SENG8080
Contributors
- Vishalbhai Barvaliya
- Sumukh Dhawade
- Sindhu Parkavi Sivakumar
- Kevin Jacob Mathew
- Lakshay Soni
- Jose Jaramillo
- Jeevan Preet Saini
- Mansi Trivedi
- Amandeep Amandeep
- Yusra Khan:)
- Garima Sharma
- Mukesh Yarlagadda
- Nisha Kathiriya
- Arman Fidaulla Sharief
- Kriti Thamman
- Hareesh Varatharaj
- Prathmesh Jani
- Sagar Vaiyata
- Darshan Lunagariya
- Karan Sorathiya
- Prashant Thakkar
- Roopesh Votarikari
- Chandan Sulimalthe Sannappa
- Pradeep Gahlawat
- Mrunal Wavdhane
- Sree Kodavanti
- Zarana Gohil
- Karan Patel
- Harshal Parmar
Data Collection using Python script
data_collection_from_s3.py - This script retrieves the data from AWS S3 and recursively unzips the files in all folders.
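As a rough illustration of the retrieval step, the sketch below downloads every object under an S3 prefix while keeping the folder layout. It is not the contents of data_collection_from_s3.py; the function name `download_prefix` and its parameters are hypothetical, and the `client` argument is assumed to be a boto3 S3 client (boto3 is a third-party package that would need to be installed and configured with credentials).

```python
import os


def download_prefix(client, bucket, prefix, dest_dir):
    """Download every object under `prefix` into `dest_dir`, keeping the folder layout.

    `client` is assumed to behave like a boto3 S3 client (hypothetical helper,
    not the project's actual script).
    """
    downloaded = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page without "Contents" means no objects matched on that page.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                # Skip zero-byte placeholder keys that only represent folders.
                continue
            local_path = os.path.join(dest_dir, *key.split("/"))
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            client.download_file(bucket, key, local_path)
            downloaded.append(local_path)
    return downloaded
```

With boto3 available, a call might look like `download_prefix(boto3.client("s3"), "example-bucket", "genome_browser/", "genome_browser")`, where the bucket and prefix are placeholders rather than the project's real values.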
Data Collection using PowerShell script
Prerequisites
- Install 7-Zip
- Install AWS CLI
To Run the file
- Open PowerShell
- Navigate to the folder that contains the script
- Run the command .\Data_Collection_From_AWS_S3.ps1
Script Explanation
- Creates a folder 'genome_browser' if it doesn't exist
- Collects all the objects under the AWS S3 folder and stores them in the object field
- Iterates through the objects one by one and downloads them from AWS S3 to the local folder
- Gets all the .gz files from the local folder
- Iterates through all the .gz files and unzips them into the destination folder using 7-Zip commands
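For comparison with the last two steps above, here is a minimal Python sketch that finds every .gz file under a folder and decompresses it into a destination folder. It uses Python's standard `gzip` module instead of 7-Zip, so it is a stand-in for the PowerShell logic, not a copy of it; the function name `gunzip_tree` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_tree(source_dir, dest_dir):
    """Decompress every .gz file under source_dir into dest_dir, mirroring subfolders."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    extracted = []
    # sorted() materializes the file list before anything new is written.
    for gz_path in sorted(source_dir.rglob("*.gz")):
        # Drop the trailing ".gz" and keep the relative folder structure.
        target = dest_dir / gz_path.relative_to(source_dir).with_suffix("")
        target.parent.mkdir(parents=True, exist_ok=True)
        with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        extracted.append(target)
    return extracted
```

Streaming with `shutil.copyfileobj` avoids loading a whole genome file into memory, which matters for large .fa files.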
Data Quality Team
Steps performed
- Collected the data from AWS S3.
- Converted the genome file to .csv format.
- The Genome.fa file holds the genome sequence data, while the ncbiGenes.gtf file holds the gene annotations in GTF (Gene Transfer Format), a variant of GFF (General Feature Format).
- Transform.py contains the Python script that generates the Genome string column.
- This column holds the genome string between the start and end positions of each row.
- Created the GenomeCleanData.csv file and placed it inside the Clean_Data folder.
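To make the Genome string column concrete, the sketch below shows one way such a column could be derived: read the FASTA sequences, then slice each annotation row's start-to-end range out of the matching sequence. This is not the code in Transform.py; the function names are hypothetical, and the sketch assumes standard tab-separated GTF rows (start and end in the fourth and fifth fields, 1-based and inclusive) and ignores strand.

```python
def read_fasta(lines):
    """Parse FASTA lines into {sequence_name: sequence_string}."""
    sequences, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # Use the first word of the header as the sequence name.
            name, chunks = line[1:].strip().split(" ")[0], []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def genome_string(sequences, seqname, start, end):
    """Return the bases for one feature; GTF coordinates are 1-based and inclusive."""
    return sequences[seqname][start - 1:end]


def add_genome_column(gtf_lines, sequences):
    """Return GTF rows as lists of fields with the genome string appended."""
    rows = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        seqname, start, end = fields[0], int(fields[3]), int(fields[4])
        rows.append(fields + [genome_string(sequences, seqname, start, end)])
    return rows
```

The resulting rows could then be written out with Python's `csv` module. Features on the minus strand would need reverse-complementing if the biological sequence is wanted, which this sketch leaves out.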
Data Storage and Maintenance Team
- A Python script is provided in the "Data Storage and Maintainance" directory.
- A ReadMe file was also added to the same folder to guide the DevOps team through the whole task.
- Created a PR from a new branch off Main and requested a review from the Professor, as there are no other members in my team.
- If you need anything from me, create an issue and tag me.