This PR does the following:
It adds (or makes available) new options to the user:
number of participants to simulate
extra optional hapgen2 options (mutation rate, effective population size)
It adds the ability to download ready-made leg files from S3
It adds the ability to simulate GWAS summary statistics using GCTA
Details of changes:
Added --num_participants as a param, which is used in the simulate_gen_and_sample (hapgen2) process so that the user can decide how many participants to simulate (this was previously hardcoded)
Re-made the *leg files so that they now have only bi-allelic SNPs:
The *leg files were then zipped and uploaded to S3
This removal of non-bi-allelic SNPs is a one-off step, so the pipeline does not automate it. However, all instructions to make the files are in /assets/legend_for_hapgen2/README.md. The script make-leg-files.sh is also there.
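For reviewers who want a feel for what the one-off filtering step involves, here is a minimal Python sketch. It is not the contents of make-leg-files.sh. It assumes the legend file is whitespace-separated with a header row and the columns id, position, a0, a1, and it treats a row as a bi-allelic SNP when both alleles are single, distinct nucleotides and the position occurs only once in the file. The function name `filter_biallelic_snps` is made up for illustration.

```python
from collections import Counter

NUCLEOTIDES = {"A", "C", "G", "T"}


def filter_biallelic_snps(lines):
    """Return the header plus the rows that look like bi-allelic SNPs.

    Assumed layout (not taken from the PR): a header row followed by
    whitespace-separated rows of `id position a0 a1`.
    """
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    header = lines[0]
    rows = [ln.split() for ln in lines[1:]]

    # Multi-allelic sites are often split over several rows that share
    # one position, so count how many rows each position has.
    position_counts = Counter(row[1] for row in rows)

    kept = [header]
    for row in rows:
        _snp_id, position, a0, a1 = row[:4]
        is_snp = a0 in NUCLEOTIDES and a1 in NUCLEOTIDES and a0 != a1
        if is_snp and position_counts[position] == 1:
            kept.append(" ".join(row))
    return kept
```

Indels (alleles longer than one base) and positions listed more than once are dropped; everything else passes through unchanged.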
Added a download_leg_files process to download the files from S3 and unzip them
Made a new channel downloaded_leg_files_ch to handle these changes
Added a process called simulate_gwas_sum_stats that takes in PLINK files and simulates GWAS summary statistics. The user must specify --gwas_cases and --gwas_controls.
Updated documentation: README.md and docs/usage.md
Comments
Although they are now uploaded to S3, I have left the *leg and *leg.gz files (along with the README.md, the VCF and the make-leg.sh script) in assets/legend_for-hapgen2.
The extra flags for GCTA (but not the mandatory ones) are not working properly. It would seem that GCTA does not handle newlines in the same way as hapgen2 (where options can all be added together) or METAL (where padding + a \n solved the issue). I have added a card to address this: https://github.com/lifebit-ai/simulate/issues/12
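One possible direction for that issue, offered only as a sketch and not as a confirmed fix: collapse the optional flags into a single whitespace-normalised line before they reach the command. The Python below illustrates the idea with the standard-library `shlex` module. The helper names, the `gcta64` binary name and the `--opt-a`/`--opt-b` flags are placeholders, not real GCTA options or code from this pipeline.

```python
import shlex


def normalise_options(raw):
    """Split a possibly multi-line option string into clean tokens.

    shlex.split treats spaces, tabs and newlines as separators, so
    stray newlines and padding disappear.
    """
    return shlex.split(raw)


def build_command(base, extra_options):
    """Join the base command and the normalised extras on one line."""
    tokens = [*base, *normalise_options(extra_options)]
    return " ".join(shlex.quote(token) for token in tokens)


# Placeholder binary name and flags, for illustration only.
command = build_command(["gcta64"], "--opt-a 1\n   --opt-b 2\n")
print(command)
```

Whether this matches the actual cause of the failure would need to be checked against the issue linked above.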
Test on CloudOS:
Current test failing (error 141).