Open aleksicmilica-sbg opened 2 years ago
Thanks @aleksicmilica-sbg, just letting you know that I see this issue and will hopefully update you within a week. We have an updated version of this procedure that I need to finalize. I may direct you elsewhere for it, because this repo is meant to reproduce the GTEx analysis and I won't be updating it further.
Thank you @gaow for your quick reply, I am looking forward to seeing the updated procedure! milica
Hi @gaow , are there any updates on this? Thanks! :)
Unfortunately it's still work in progress here -- we are retiring the HDF5 format in favor of a VCF format for summary stats, and we are changing the way we compute priors. The timeline is still about 2 weeks from now, as we are short-handed on the data analysis side.
Let me help you debug the HDF5 pipeline, though: can you do sos dryrun instead of sos run, so you can print the actual command or script being used? Then try running that script directly and see what gives an error. The error looks somewhat "silent", because otherwise SoS would have reported it; running the script directly should help with that.
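Independently of SoS, you could also sanity-check the input file itself. Here is a minimal sketch; my assumption (please verify against the notebook's parameter docs) is that --cols gives the 1-based positions of the beta, se and p-value columns:

```python
# Minimal sanity check of a summary-stats file outside SoS. Assumption
# (verify against fastqtl_to_mash.ipynb's parameter docs): --cols gives
# the 1-based positions of the beta, se and p-value columns.
import csv
import io

def check_sumstats(fh, cols):
    """Validate header width and that the given 1-based columns are numeric."""
    reader = csv.reader(fh, delimiter="\t")
    header = next(reader)
    if max(cols) > len(header):
        raise ValueError(f"--cols index {max(cols)} exceeds {len(header)} columns")
    for lineno, row in enumerate(reader, start=2):
        for c in cols:
            try:
                float(row[c - 1])
            except ValueError:
                raise ValueError(f"line {lineno}, column {c}: {row[c - 1]!r} is not numeric")
    return header

# For a real file: check_sumstats(gzip.open("test_1.tsv.gz", "rt"), [3, 4, 5])
demo = io.StringIO("gene_name\tsnp_id\tbeta\tse\tpval\n"
                   "ENSG0001\trs123\t0.12\t0.05\t0.016\n")
print(check_sumstats(demo, [3, 4, 5]))  # prints the header list if everything parses
```

If this passes but the HDF5 conversion still fails silently, the problem is more likely in the rhdf5 write step than in the table itself.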
Any update on the error?
@carmacrea did it work when you ran our minimal working example? We have been using the same workflow logic ourselves without issue, so the only part I can think of is the HDF5 I/O; it would be great if you could verify with the minimal working example we provided.
Our new procedure involves performing univariate fine-mapping, saving results in RDS or VCF format, then querying the top signal from each credible set rather than the top SNP per gene. We have been doing that for our own analysis, although we are still working on a new procedure for generating the mixture model (with @yunqiyang0215) before releasing an update. If you have not done fine-mapping, then you are perhaps still better off figuring out what's going on with the HDF5 I/O.
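To make the "top signal per credible set" idea concrete, here is a minimal sketch; the data structures (gene -> list of credible sets, each a variant-to-PIP mapping) are illustrative assumptions, not the format of our actual pipeline:

```python
# Pick the highest-PIP variant from every credible set, per gene. The input
# layout (gene -> list of credible sets, each a {variant: PIP} dict) is an
# illustrative assumption, not the actual pipeline's format.

def top_signals(credible_sets):
    """Return, per gene, the top variant of each credible set (by PIP)."""
    return {gene: [max(cs, key=cs.get) for cs in sets]
            for gene, sets in credible_sets.items()}

example = {
    "geneA": [{"rs1": 0.81, "rs2": 0.10}, {"rs3": 0.42, "rs4": 0.47}],
    "geneB": [{"rs5": 0.93}],
}
print(top_signals(example))  # {'geneA': ['rs1', 'rs4'], 'geneB': ['rs5']}
```

The difference from the old approach is that a gene with two credible sets contributes two rows here, instead of a single top SNP per gene.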
Yes, it worked when I ran the example, but I use a file generated with tensorqtl, because fastqtl is no longer maintained, so I don't know if that is the problem. Also, my file has variant_id instead of gene_id (I only had variant_id).
@carmacrea do you think it is possible for you to modify our minimal working example to reproduce your error and share it here?
I am not sure, because in my case the gene_id/phenotype_id is bowelcorrdct, so I don't know how to process it.
I tried again and got this error: ERROR: Failed to connect to : ssh: Could not resolve hostname : Bad value for ai_flags
ERROR: [default_1]: [f2663c61ac68ead7]: Failed to connect to : ssh: Could not resolve hostname : Bad value for ai_flags
[default]: 3 pending steps: default_2, default_3, default_4
I am not the root user. I got the following error report when I ran my own data, but it worked well when I ran the example data you shared.
fastqtl2mash-docker sos run workflows/fastqtl_to_mash.ipynb \
--cwd fastqtl_to_mash_output \
--data_list data/test/test.list \
--gene_list data/test/test.txt \
--cols 3 4 5 \
-j 8 \
-v 3
DEBUG: R library rhdf5 (2.30.1) is available
INFO: Running default_1: Convert summary stats gzip format to HDF5
DEBUG: _input: data/test/test_1.tsv.gz
DEBUG: Signature mismatch: Missing target /gtexresults/fastqtl_to_mash_output/test_1.tsv.h5
DEBUG: _input: data/test/test_2.tsv.gz
DEBUG: Signature mismatch: Missing target /gtexresults/fastqtl_to_mash_output/test_2.tsv.h5
INFO: default_1 (index=1) is completed.
DEBUG: Failed to create signature: output target /gtexresults/fastqtl_to_mash_output/test_2.tsv.h5 does not exist
DEBUG: Failed to write signature 662dbe61a59ea03a
INFO: default_1 (index=0) is completed.
DEBUG: Failed to create signature: output target /gtexresults/fastqtl_to_mash_output/test_1.tsv.h5 does not exist
DEBUG: Failed to write signature 3c0121cfe29ddedc
INFO: output: /gtexresults/fastqtl_to_mash_output/test_1.tsv.h5 /gtexresults/fastqtl_to_mash_output/test_2.tsv.h5 in 2 groups
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/sos/step_executor.py", line 1999, in run
yreq = runner.send(yres)
File "/opt/conda/lib/python3.7/site-packages/sos/step_executor.py", line 1878, in run
self.verify_output()
File "/opt/conda/lib/python3.7/site-packages/sos/step_executor.py", line 450, in verify_output
f'Output target {target} does not exist after the completion of step {env.sos_dict["step_name"]}'
RuntimeError: Output target /gtexresults/fastqtl_to_mash_output/test_1.tsv.h5 does not exist after the completion of step default_1
DEBUG: Step default_1 failed
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/sos/__main__.py", line 552, in cmd_run
executor.run(args.__targets__, mode=config['run_mode'])
File "/opt/conda/lib/python3.7/site-packages/sos/workflow_executor.py", line 341, in run
return self.run_as_master(targets=targets, mode=mode)
File "/opt/conda/lib/python3.7/site-packages/sos/workflow_executor.py", line 1561, in run_as_master
raise exec_error
sos.executor_utils.ExecuteError: [default_1]: [default_1]: Output target /gtexresults/fastqtl_to_mash_output/test_1.tsv.h5 does not exist after the completion of step default_1
[default]: 3 pending steps: default_2, default_3, default_4
ERROR: [default_1]: [default_1]: Output target /gtexresults/fastqtl_to_mash_output/test_1.tsv.h5 does not exist after the completion of step default_1
[default]: 3 pending steps: default_2, default_3, default_4
fastqtl2mash-docker sos run workflows/fastqtl_to_mash.ipynb \
--data_list data/fastqtl/FastQTLSumStats.list \
--gene_list data/fastqtl/GTEx_genes.txt \
-j 8
INFO: Running default_1: Convert summary stats gzip format to HDF5
INFO: default_1 (index=0) is completed.
INFO: default_1 (index=1) is completed.
INFO: output: /gtexresults/fastqtl_to_mash_output/Tissue_2.fastqtl.h5 /gtexresults/fastqtl_to_mash_output/Tissue_1.fastqtl.h5 in 2 groups
INFO: Running default_2: Merge single study data to multivariate data
INFO: default_2 is completed.
INFO: output: /gtexresults/fastqtl_to_mash_output/FastQTLSumStats.h5
INFO: Running default_3: Extract data to fit MASH model
INFO: default_3 is completed.
INFO: output: /gtexresults/fastqtl_to_mash_output/FastQTLSumStats.portable.h5
INFO: Running default_4: Subset and split data, generate Z-score and save to RDS
INFO: default_4 is completed.
INFO: output: /gtexresults/fastqtl_to_mash_output/FastQTLSumStats.mash.rds
INFO: Workflow default (ID=e27dcc0f542cb7f3) is executed successfully with 4 completed steps and 5 completed substeps.
Yes, it worked for my example. What I don't know now is how to carry out the MASHR analysis with the given data; I use this command but it doesn't work:
fastqtl2mash-singularity sos run mashr_flashr_workflow.ipynb mash --data ../data/FastQTLSumStats.mash.rds
and it gives me the following error:
INFO: Running vhat_mle: V estimate: "mle" method
INFO: Running pca:
INFO: Running flash_nonneg: Perform FLASH analysis with non-negative factor constraint (time estimate: 20min)
INFO: Running vhat_simple: V estimate: "simple" method (using null z-scores)
INFO: Running flash: Perform FLASH analysis with non-negative factor constraint (time estimate: 20min)
ERROR: flash_nonneg (id=831e2a240d34d36d) returns an error.
ERROR: pca (id=cd2135596b55fb49) returns an error.
ERROR: vhat_simple (id=a6a6ac2bd80c0140) returns an error.
ERROR: flash (id=16103308e4081ad1) returns an error.
Hi,
I am trying to run the fastqtl_to_mash.ipynb script to convert EMBL eQTL catalogue data (BLUEPRINT dataset) to MASHR format. I am receiving the following error:
INFO: Running default_1: Convert summary stats gzip format to HDF5
INFO: default_1 (index=0) is completed.
INFO: default_1 (index=1) is completed.
INFO: default_1 (index=2) is completed.
INFO: output: /opt/gtexresults/fastqtl_to_mash_output/BLUEPRINT.neutrophil.test.tsv.h5 /opt/gtexresults/fastqtl_to_mash_output/BLUEPRINT.tcell.test.tsv.h5... (3 items in 3 groups)
ERROR: [default_1]: [default_1]: Output target /opt/gtexresults/fastqtl_to_mash_output/BLUEPRINT.neutrophil.test.tsv.h5 does not exist after the completion of step default_1
[default]: 3 pending steps: default_2, default_3, default_4
Here are the execution details:
- Data is in a gzip-compressed, tab-separated text file with a header containing the following columns: gene_name, snp_id, beta, se, pval. Each tissue file contains 10k SNPs, since at the moment I am testing the workflow.
- BLUEPRINT.tissues.list contains a list of relative paths to the individual tissue files
- I am running inside a docker container pulled from here
- I am able to run the example command line with dummy data from the instructions
- My command line is:
sos run workflows/fastqtl_to_mash.ipynb --data_list blpt-test/BLUEPRINT.tissues.list --cols 4 5 3 --gene-list blpt-test/genes.sorted.uniq.txt -j 1
Could you please help me figure out this error?
Thanks in advance!
milica
P.S. The entire documentation is phenomenal. Especially enjoyed reading these pages, it's so detailed and precise. Thank you!
I was able to resolve the error by making the columns in my data match the example data, in the same order. While the 'fastqtl_to_mash.ipynb' file describes the requirements for the 5 columns, it seems that matching the example data's column layout is also necessary for the code to run successfully.
Please restructure the input data for 'fastqtl_to_mash.ipynb' to follow these 9 columns:
gene_id
variant_id
tss_distance
ma_samples
ma_count
maf
pval_nominal
slope
slope_se
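A sketch of reshaping, for example, tensorqtl-style output into this 9-column order with plain Python; the source column names in the RENAME map are assumptions to adapt, and fields absent from the source are filled with NA (recompute them if the workflow needs real values):

```python
# Reorder/rename a summary-stats table into the 9-column layout above.
# The source column names in RENAME are assumptions; adapt them to your data.
import csv
import io

TARGET = ["gene_id", "variant_id", "tss_distance", "ma_samples", "ma_count",
          "maf", "pval_nominal", "slope", "slope_se"]

# target column -> source column (identity where the names already match)
RENAME = {"gene_id": "phenotype_id", "pval_nominal": "pval",
          "slope": "beta", "slope_se": "se"}

def restructure(src, dst):
    """Rewrite a tab-separated table into the TARGET column order."""
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(TARGET)
    for row in reader:
        # Missing source fields become "NA" placeholders.
        writer.writerow([row.get(RENAME.get(col, col), "NA") for col in TARGET])

src = io.StringIO("phenotype_id\tvariant_id\tbeta\tse\tpval\n"
                  "geneX\trs1\t0.12\t0.05\t0.016\n")
dst = io.StringIO()
restructure(src, dst)
print(dst.getvalue())
```

Streaming row by row like this keeps memory flat even for large per-tissue files; for gzip inputs, pass gzip.open(path, "rt") as src.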