Merging of Single variant color switching + Data generation pipeline

amkwong commented 5 years ago

I want to merge my additions from the data_generation_pipeline branch #8 into Mukai's reorganized branch. My goal is to have one unified master branch from which we can work.

I am currently working on resolving the merge conflicts between the different versions.

amkwong commented 5 years ago

Looking at the listed conflicts, the only file flagged is index.html, because I added an extra line. I've merged it in, so there should be no more conflicts.

amkwong commented 5 years ago

I've fixed some problems and done additional testing. I have successfully run the data generation pipeline on both the StatGen cluster and my own Linux subsystem.

We should review this pull request, remove extraneous things, and merge it into master so we can work off the reorganized structure.

Quick setup instructions:

The only file that needs to be copied or linked is Homo_sapiens.GRCh38.97.chr.gff3.gz. Download it from Ensembl, or copy or link it from my directory at /net/amd/amkwong/browseQTL/all_chr/ensembl/. This file must be accessible in the data directory; the make step will fail if it is missing.
After that, enter the util directory and run generate.makefile.to.process.data.py to generate run.extract.Makefile. You can build everything automatically by running this makefile.
Once make finishes, copy the test files (inside the data/test directory) to the data directory. You can then run the test server using phegetrun and test features in the single-variant view.
Currently, the test files contain a single variant (chr19:6718376).
More extensive setup instructions can be found in README.md in the util directory.
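For reference, the setup steps above could be wrapped in a small Python helper like the one below. This is a hypothetical sketch, not a script in the repo: the function names are made up, and it assumes it is run against the repository root, that generate.makefile.to.process.data.py takes no arguments and writes run.extract.Makefile into util/, and that everything in data/test can be copied into data/ as-is. It checks for the GFF3 file first, since make fails without it.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# The one input the make step needs in data/ (per the setup steps).
GFF3_NAME = "Homo_sapiens.GRCh38.97.chr.gff3.gz"


def missing_inputs(data_dir: Path) -> list:
    """Return required input files that are not present in data_dir."""
    required = [data_dir / GFF3_NAME]
    return [path for path in required if not path.exists()]


def build_and_install_test_data(repo_root: Path) -> None:
    """Hypothetical wrapper: preflight check, generate and run the Makefile, copy test files."""
    data_dir = repo_root / "data"
    util_dir = repo_root / "util"

    # Fail early with a clear message rather than partway through make.
    missing = missing_inputs(data_dir)
    if missing:
        names = ", ".join(str(path) for path in missing)
        raise FileNotFoundError(f"missing required input(s): {names}")

    # Assumption: the generator needs no arguments and writes
    # run.extract.Makefile into util/.
    subprocess.run([sys.executable, "generate.makefile.to.process.data.py"],
                   cwd=util_dir, check=True)
    subprocess.run(["make", "-f", "run.extract.Makefile"], cwd=util_dir, check=True)

    # Copy the bundled test files (currently the chr19:6718376 variant) into data/.
    for src in (data_dir / "test").iterdir():
        if src.is_file():
            shutil.copy2(src, data_dir / src.name)
```

Calling build_and_install_test_data(Path(".")) from the repo root would leave the test data in place before starting the server with phegetrun. Checking inputs up front just turns a make failure into an immediate, readable error.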

Notes:

It was suggested that we also include chr15:78570111, but I'd rather not add it to the repo for now, to keep the repo from getting too large.
If you want to test that variant and have access to the UM StatGen cluster, you can use tabix to extract it from the files in /net/amd/amkwong/browseQTL/all_chr/by_chromosome/, save the results in a file called chr15.All_Tissues.sorted.txt.gz, and copy that file to the data directory. This will let you load that variant for testing.
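On the cluster, that extraction could look roughly like the Python sketch below, which shells out to the htslib tabix and bgzip command-line tools. The helper names are hypothetical, the exact source file name inside by_chromosome/ is left as a parameter, and it assumes the chromosome names in that file use the "chr15" form (adjust the chrom argument if they don't). Compressing with bgzip rather than plain gzip is also an assumption, chosen so the output stays tabix-compatible while still being a valid .gz file.

```python
import subprocess
from pathlib import Path


def single_position_region(chrom: str, pos: int) -> str:
    """tabix region string covering exactly one 1-based position."""
    return f"{chrom}:{pos}-{pos}"


def extract_variant(source_gz: Path, chrom: str, pos: int, data_dir: Path,
                    out_name: str = "chr15.All_Tissues.sorted.txt.gz") -> Path:
    """Extract the rows at chrom:pos from an indexed file and bgzip them into data_dir.

    Assumes tabix and bgzip (htslib) are on PATH, that source_gz has a tabix
    index next to it, and that chrom matches the sequence names inside source_gz.
    """
    # tabix prints the matching rows to stdout; check=True raises on failure.
    rows = subprocess.run(
        ["tabix", str(source_gz), single_position_region(chrom, pos)],
        stdout=subprocess.PIPE, check=True,
    ).stdout

    out_path = data_dir / out_name
    with open(out_path, "wb") as out:
        # bgzip -c compresses stdin to stdout; the result is also valid gzip.
        subprocess.run(["bgzip", "-c"], input=rows, stdout=out, check=True)
    return out_path
```

Pass the chr15 file from /net/amd/amkwong/browseQTL/all_chr/by_chromosome/ as source_gz, "chr15" and 78570111 as the position, and Path("data") as data_dir; writing straight into data/ replaces the separate copy step.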

abought commented 5 years ago

I've pushed some minor cleanup to this branch (mostly reducing flake8/eslint noise and removing files that don't need to be in the repo).

With Alan's changes, I'm seeing errors when trying to run the app: No such file or directory: 'data/gene.symbol.pickle'. Pausing review until we can sort that out, but I look forward to merging soon!

amkwong commented 5 years ago

I generated and tested small data files that should let us run the Flask app directly from a git clone (after switching to this branch).

pheget should now run for variant 19:6718376 from a fresh clone, with no additional installation or data processing.

abought commented 5 years ago

Looks good; merging!

Because this PR involved some initial trial and error, this branch has a lot of "noisy" commits (including some large files that were removed early on).

Therefore, just this once, I've squashed the entire PR into a single commit to clean all that up. I apologize for the bad etiquette of compressing commit history, but to compensate, I've created a CONTRIBUTORS.md to ensure you all get credit for your work.

There will be more commits (and lots more credit) in the future; thanks all!

amkwong commented 5 years ago

Thanks for the review and merge! We can finally all work from the master branch again.

We need to get this updated on the staging server (I'll ask Peter).

statgen / fivex #10