mkoromina closed this issue 2 years ago.
Hi @mkoromina, most likely your phenotype file does not have a column with the header IID. Please check what the header line should look like in the example files (as described in the wiki).
Hope this helps, please let me know if not!
Hi @omerwe,
Thanks for the quick reply! I did check the headers, and a column named 'IID' is present. The columns in my pheno file (I edited the .fam file to include a header line) are FID, IID, a few of the other standard .fam columns, and a 'PHENO' column.
Do you have any other potential solutions/suggestions on what could be wrong? Many thanks in advance!
@mkoromina can you post the header of your phenotypes file (or send it to me at oweissbrod@hsph.harvard.edu)? If you can share it, please run head <pheno_file> and post or send the output.
Another possibility is to physically copy-paste the header line of the example file provided with PolyFun into your own phenotypes file. Maybe you have some extra (possibly hidden) characters in your header line that mess things up?
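If you'd rather check for hidden characters than retype the header, here is a minimal Python sketch. The function names are hypothetical (not part of PolyFun), and it only looks for a few common culprits: a UTF-8 byte-order mark, Windows line endings, mixed tabs and spaces, and other non-ASCII bytes such as non-breaking spaces.

```python
BOM = b'\xef\xbb\xbf'  # UTF-8 byte-order mark, often added by Windows editors


def header_problems(first_line: bytes) -> list:
    """Return a list of suspicious features found in a raw header line."""
    problems = []
    if first_line.startswith(BOM):
        problems.append('UTF-8 byte-order mark at start of file')
    if first_line.endswith(b'\r\n'):
        problems.append('Windows (CRLF) line ending')
    if b'\t' in first_line and b' ' in first_line:
        problems.append('mix of tabs and spaces')
    # Check the rest of the line for non-ASCII bytes (e.g. a non-breaking space)
    body = first_line[len(BOM):] if first_line.startswith(BOM) else first_line
    if any(byte > 127 for byte in body):
        problems.append('non-ASCII bytes')
    return problems


def check_file_header(path: str) -> list:
    """Read the first line of a file in binary mode and report any problems."""
    with open(path, 'rb') as f:
        first_line = f.readline()
    # repr() makes invisible characters such as \r or \xa0 visible
    print('Raw header bytes:', repr(first_line))
    return header_problems(first_line)


# Example (hypothetical path):
# print(check_file_header('/path/to/mypheno.fam'))
```

Reading in binary mode is deliberate: opening the file as text could silently translate line endings and hide exactly the characters you are looking for.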
Hi @omerwe,
Many thanks for your recommendation! I will try your suggestions and come back to you if the issue persists (it must be a hidden character in the header line). Thanks once again for all the useful tips!
Hi @omerwe,
Really sorry to reopen this. I fixed the header of the pheno file, and when I re-ran the above-mentioned command I got the following error message:
Traceback (most recent call last):
File "/path/to/polyfun/polypred.py", line 434, in <module>
estimate_mixing_weights(args)
File "/path/to/polyfun/polypred.py", line 295, in estimate_mixing_weights
df_prs_sum = computs_prs_all_files(args, betas_file, disable_jackknife=True, keep_file=args.pheno)
File "/path/to/polyfun/polypred.py", line 239, in computs_prs_all_files
keep_file=keep_file
File "/path/to/polyfun/polypred.py", line 83, in compute_prs_for_file
raise ValueError('No betas found for SNPs in plink file %s'%(plink_file_prefix))
ValueError: No betas found for SNPs in plink file /path/to/cohort1.bed
Do you know what could be wrong in this instance? Many thanks!
P.S. To restate my setup: I am using (a) effect sizes from another method (not BOLT-LMM) and (b) effect sizes from PolyFun, together with (c) a bed file and a pheno file for a small subset of the testing cohort that is not included in (a). Note that (a) was computed on all individuals except those in the small subset (c), whereas PolyFun (b) was run on all assessed individuals.
Hi @mkoromina, The code can't find any sumstats for the SNPs in your bim file. Are you sure they have the same chromosome and allele encodings?
If you want, please post a few lines from your .bim file and from your sumstats file that you think should match the same SNPs, and we'll try to figure out why the code thinks they're different SNPs. (Please note that the code doesn't use rsids to identify SNPs because they're not unique; it uses SNP positions and alleles.)
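To see how many SNPs line up by position and alleles (rather than by rsid), a rough pandas sketch like the one below can help. It is not PolyFun's matching code; it assumes the sumstats file has columns named CHR, BP, A1 and A2, and the helper names are hypothetical. It strips a "chr" prefix and ignores allele order, but it does not handle strand flips or other chromosome codings (e.g. X vs 23).

```python
import pandas as pd

# Standard PLINK .bim layout; .bim files have no header line
BIM_COLUMNS = ['CHR', 'SNP', 'CM', 'BP', 'A1', 'A2']


def read_bim(path):
    return pd.read_csv(path, sep=r'\s+', header=None, names=BIM_COLUMNS, dtype={'CHR': str})


def snp_keys(df):
    """Build position/allele keys from a frame with CHR, BP, A1, A2 columns."""
    keys = pd.DataFrame({
        # Drop an optional 'chr' prefix so that 'chr1' and '1' compare equal
        'CHR': df['CHR'].astype(str).str.replace(r'^[cC][hH][rR]', '', regex=True),
        'BP': df['BP'].astype(int),
    })
    a1 = df['A1'].astype(str).str.upper()
    a2 = df['A2'].astype(str).str.upper()
    # Sort each allele pair so A/G and G/A give the same key (strand flips not handled)
    keys['ALLELES'] = ['/'.join(sorted(pair)) for pair in zip(a1, a2)]
    return keys


def count_matching_snps(bim, sumstats):
    """Count .bim rows that have a sumstats row with the same CHR, BP and allele pair."""
    b = snp_keys(bim)
    s = snp_keys(sumstats).drop_duplicates()
    return len(b.merge(s, on=['CHR', 'BP', 'ALLELES'], how='inner'))


# Example (hypothetical paths; adjust sep to your sumstats format):
# bim = read_bim('/path/to/my_subset.bim')
# sumstats = pd.read_csv('/path/to/other_method.tsv.gz', sep='\t')
# print(count_matching_snps(bim, sumstats), 'of', len(bim))
```

If this count is zero or very low, comparing a few raw CHR/BP/A1/A2 values from both files side by side usually shows which encoding differs.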
Hi @omerwe,
Sure, I am attaching below some lines corresponding to certain SNPs from my sumstats file (from the 'other method') and the respective info for these from the .bim file.
- sumstats file
- bim file
If any extra information is needed for troubleshooting, just let me know. Many thanks!
@mkoromina unfortunately I can't easily figure out what's the source of the problem. If you want, you can send a small reproducible example to oweissbrod@hsph.harvard.edu and I'll try to figure it out...
Hi @omerwe,
I think there may be something off with the .bim file, i.e., the data may not be parsed properly. I can definitely try to create a small example and send it to you, though!
Many thanks!!
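One quick way to check whether a .bim file parses cleanly is to scan its raw lines before any tool reads it. The sketch below (function name hypothetical) flags lines that do not split into the six whitespace-separated PLINK .bim fields or whose base-pair position is not a plain integer; a stray header line, comma-separated fields, or scientific-notation positions would all show up here.

```python
def bim_line_problems(lines, max_report=10):
    """Return (line_number, message) pairs for lines that don't look like .bim records.

    Expected layout per line: CHR, SNP ID, genetic distance, BP position, allele 1, allele 2.
    """
    problems = []
    for i, line in enumerate(lines, start=1):
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) != 6:
            problems.append((i, f'expected 6 fields, found {len(fields)}'))
        elif not fields[3].isdigit():
            problems.append((i, f'base-pair position {fields[3]!r} is not an integer'))
        if len(problems) >= max_report:
            break  # stop early on badly broken files
    return problems


# Example (hypothetical path):
# with open('/path/to/my_subset.bim') as f:
#     for line_number, message in bim_line_problems(f):
#         print(line_number, message)
```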
Hi @omerwe,
Thanks for providing us with such a useful tool! Coming back to step 3 of the PolyPred pipeline: I am testing my data using (a) effect sizes from another method (not BOLT-LMM) and (b) effect sizes from PolyFun, together with (c) a bed file and a pheno file for a small subset of the testing cohort that is not included in (a). Note that (a) was computed on all individuals except those in the small subset (c), whereas PolyFun (b) was run on all assessed individuals.
The code I am running is:
python /path/to/polypred.py --combine-betas --betas /path/to/other_method.tsv.gz,/path/to/polyfun.txt.gz --pheno /path/to/mypheno.fam --output-prefix /path/to/results/combine_effects --plink-exe /path/to/plink /path/to/my_subset.bed
The full log message and error that I receive is:
Traceback (most recent call last):
File "/path/to/polypred.py", line 434, in <module>
estimate_mixing_weights(args)
File "/path/to/polypred.py", line 280, in estimate_mixing_weights
df_pheno = pd.read_csv(args.pheno, names=['FID', 'IID', 'PHENO'], index_col='IID', delim_whitespace=True)
File "/path/to/polyfun_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
return _read(filepath_or_buffer, kwds)
File "/path/to/polyfun_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 460, in _read
data = parser.read(nrows)
File "/path/to/polyfun_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 1198, in read
ret = self._engine.read(nrows)
File "/path/to/polyfun_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 2198, in read
values = data.pop(self.index_col[i])
KeyError: 'IID'
Any pointers on how to fix this, or on whether it could be an issue with my input data, are more than welcome! Many thanks once again, Maria
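Judging only from the read_csv call in this traceback, polypred.py seems to expect exactly three whitespace-separated columns (FID, IID, PHENO). Because names= is passed, pandas also does not treat the first line as a header. A six-column .fam file read with only three names leaves pandas with extra leading columns, which it uses as an implicit index; that is one plausible route to KeyError: 'IID', though the example phenotype file in the wiki remains the authoritative format. The sketch below (function name and parameters hypothetical) writes a three-column file from a standard .fam file:

```python
import pandas as pd


def fam_to_pheno(fam_path, out_path, fam_has_header=False, write_header=False):
    """Write a three-column FID/IID/PHENO file from a PLINK-style .fam file.

    Assumes the standard six .fam columns:
    FID, IID, father ID, mother ID, sex, phenotype.
    """
    # dtype=str keeps IDs exactly as written (no leading-zero loss or float conversion)
    fam = pd.read_csv(fam_path, sep=r'\s+', header=0 if fam_has_header else None, dtype=str)
    if fam.shape[1] != 6:
        raise ValueError(f'expected 6 columns in {fam_path}, found {fam.shape[1]}')
    pheno = fam.iloc[:, [0, 1, 5]].copy()
    pheno.columns = ['FID', 'IID', 'PHENO']
    # Space-separated, no index column; a header line only if explicitly requested
    pheno.to_csv(out_path, sep=' ', header=write_header, index=False)
    return pheno


# Example (hypothetical paths), then the same read as in the traceback;
# sep=r'\s+' is equivalent to delim_whitespace=True:
# fam_to_pheno('/path/to/mypheno.fam', '/path/to/mypheno.txt', fam_has_header=True)
# df_pheno = pd.read_csv('/path/to/mypheno.txt', names=['FID', 'IID', 'PHENO'],
#                        index_col='IID', sep=r'\s+')
```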