statgen / locuszoom-standalone

Create regional association plots from GWAS or meta-analysis
http://locuszoom.org/

Error: object 'refFlat' not found #16

Closed frahimov closed 3 years ago

frahimov commented 3 years ago

Hello, I am trying to make some LocusZoom plots using user-supplied LD data. The summary statistics are in --metal format, and the program seems to recognize the LD file as well. After a long delay, I started seeing about 66 lines reporting that a SNP was not found in the database, which I guess is expected.

Warning: could not find position for SNP NA in user-supplied --ld file, skipping. ... ...

At the end, however, I get the error messages below and a blank PDF file. I am hoping for some assistance. Thank you.

Grabbing annotations from SQLite database..
Creating plot..
Read 4 items
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 4 elements
Calls: GetData -> read.file -> read.table -> scan
recover called non-interactively; frames dumped, use debugger() to view
Error: object 'refFlat' not found
recover called non-interactively; frames dumped, use debugger() to view
[1] "recrateRange: "
[1] 100 0
Error in zplot(metal, ld, recrate, refidx, nrugs = nrugs, args = args, :
object 'bed_tracks' not found
recover called non-interactively; frames dumped, use debugger() to view
Error in make.gene.list(refFlat, unit = args[["unit"]]) :
object 'refFlat' not found
recover called non-interactively; frames dumped, use debugger() to view
Error: object 'geneList' not found
recover called non-interactively; frames dumped, use debugger() to view
Warning message:
In sink() : no sink to remove
Error in save(metal, refFlat, ld, recrate, refSnpPos, barplot_data, fmregions, :
objects 'refFlat', 'ld', 'barplot_data', 'fmregions', 'gwas_hits', 'bed_tracks' not found
recover called non-interactively; frames dumped, use debugger() to view
[1] "Tue Feb 9 22:49:21 2021"
Deleting temporary files..
Time required: 0d:0h:23m:54s

Here is the command that I used:

locuszoom --build hg19 --gene-table gencode --metal metal_file.txt --ld LDfile.txt --refsnp rsSNP --snpset NULL

welchr commented 3 years ago

The first error is probably the real issue:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 4 elements
Calls: GetData -> read.file -> read.table -> scan
recover called non-interactively; frames dumped, use debugger() to view

Unfortunately it's not all that specific about which GetData() call is the problem. It does say line 1 did not have 4 elements, and the LD file is supposed to have 4 columns, so perhaps that is where the problem is. I would recommend checking that file - make sure all columns have valid data, and that each row is whitespace delimited (not sure why "any whitespace" ended up as the delimiter for this file, but that's what it expects.)
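One way to run that check is a short script that splits each row on any run of whitespace, which is how the LD file is described as being read, and reports rows that do not yield 4 fields. This is only a minimal sketch: `find_bad_rows` is a hypothetical helper, not part of LocusZoom, and the sample rows use made-up SNP IDs.

```python
def find_bad_rows(lines, expected=4):
    """Return (line_number, field_count) for rows without `expected` fields.

    Hypothetical helper, not part of LocusZoom. Fields are split on any run
    of whitespace, so an empty column simply vanishes from the count.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if fields and len(fields) != expected:  # skip fully blank lines
            bad.append((number, len(fields)))
    return bad


# Made-up rows: the second one has an empty dprime column.
sample = [
    "snp1 snp2 dprime rsquare",
    "rs1 rs2  0.5",
    "rs3 rs2 NA 0.1",
]
print(find_bad_rows(sample))  # [(2, 3)]
```

To check a real file, the same function can be given an open file handle, for example `find_bad_rows(open("LDfile.txt"))`.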

https://genome.sph.umich.edu/wiki/LocusZoom_Standalone#User-supplied_LD

frahimov commented 3 years ago

Thank you for your response. D' values are not known, so I have an empty column with column header name dprime. According to the link you shared "The dprime column can be all missing if it is not known.". Should there be NA or some other values in the column then?

welchr commented 3 years ago

One question - is --refsnp rsSNP in your command, or is rsSNP replacing the real SNP ID for confidentiality?

The error message Warning: could not find position for SNP NA in user-supplied --ld file, skipping makes me wonder if some or all of the SNP IDs are invalid (the SNP ID should not be NA), or perhaps none of them could be matched with your --refsnp ID.

frahimov commented 3 years ago

rsSNP is replacing the real SNP ID (an rs number) for confidentiality. Sorry about that. Some SNPs in my summary statistics file are coded like this: rs###,rs### or NA. I will remove these and run again. The snp2 column contains only the --refsnp ID. The LD file looks pretty clean: only rs numbers in the snp1 and snp2 columns, an empty dprime column, and valid r2 values in the 4th column.
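A cleanup like the one described can be sketched as a filter on the marker name. This is only an illustration under a stated assumption: every marker that should be kept is a single plain rs number, so `NA` and comma-joined IDs are dropped (and so would any other ID style). `is_plain_rs_id` is a hypothetical helper, and the sample IDs are made up.

```python
import re

# Assumption: valid markers are "rs" followed by digits and nothing else.
RS_ID = re.compile(r"^rs\d+$")


def is_plain_rs_id(marker_name):
    """Return True only for a single rs number such as 'rs123'."""
    return RS_ID.match(marker_name.strip()) is not None


# Made-up marker names covering the cases mentioned in the thread.
markers = ["rs123", "rs123,rs456", "NA", "rs789"]
kept = [m for m in markers if is_plain_rs_id(m)]
print(kept)  # ['rs123', 'rs789']
```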

frahimov commented 3 years ago

Hi again. Just to clarify I am using version 1.4.

I cleaned both the summary statistics file and the LD file and reran the script. I am still getting a similar error message.

Grabbing annotations from SQLite database..
Warning: recombination rate table 'recomb_rate' not found in database, skipping recomb rate lookups in region
Creating plot..
Read 4 items
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 4 elements
Calls: GetData -> read.file -> read.table -> scan
recover called non-interactively; frames dumped, use debugger() to view
Error: object 'refFlat' not found
recover called non-interactively; frames dumped, use debugger() to view
[1] "recrateRange: "
[1] 100 0
Error in zplot(metal, ld, recrate, refidx, nrugs = nrugs, args = args, :
object 'bed_tracks' not found
recover called non-interactively; frames dumped, use debugger() to view
Error in make.gene.list(refFlat, unit = args[["unit"]]) :
object 'refFlat' not found
recover called non-interactively; frames dumped, use debugger() to view
Error: object 'geneList' not found
recover called non-interactively; frames dumped, use debugger() to view
Warning message:
In sink() : no sink to remove
Error in save(metal, refFlat, ld, recrate, refSnpPos, barplot_data, fmregions, :
objects 'refFlat', 'ld', 'barplot_data', 'fmregions', 'gwas_hits', 'bed_tracks' not found
recover called non-interactively; frames dumped, use debugger() to view
[1] "Thu Feb 11 15:34:25 2021"
Deleting temporary files..
Time required: 0d:0h:35m:12s

Here is the header of my LD file, with columns separated by a single white space. The dprime column has a header but is empty. I have 3454 rows, with only rs numbers and valid rsquare values that range between 0 and 1. The snp2 column includes only the reference SNP that I use with the --refsnp option.

snp1 snp2 dprime rsquare
rs11.. rs57.. 0.029835307441

The summary statistics file looks like this: tab delimited, with only rs numbers and P-values. Some of these variants are not found in the database. There are 45465 rows, with no missing, double-ID, or NA values.

MarkerName P-value
rs18.. 3.26e-5

Do you think I should update the databases? I run this script on a cluster. I get an error if I do not load R, so I make sure to "module load R" (v3.6.1) before I start. The Python that locuszoom should see by default is Python 2.7.5.

welchr commented 3 years ago

Ahh, I think maybe I see the problem. The dprime column cannot be empty; it must have some value in every row (which can be NA throughout, if D' is not known). It cannot be the empty string because the delimiter is any whitespace (\s+), so an empty field is swallowed and the row is read as having only 3 columns, unfortunately.
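A fix along those lines can be sketched as a small rewrite of the LD file. This assumes the columns are snp1, snp2, dprime, rsquare in that order, and that any row with only 3 whitespace-separated fields is missing dprime specifically; `fill_missing_dprime` is a hypothetical helper, not part of LocusZoom, and the SNP IDs are made up.

```python
def fill_missing_dprime(line, placeholder="NA"):
    """Insert `placeholder` as the third field when a row has only 3 fields.

    Assumption: a 3-field row is snp1, snp2, rsquare with dprime left empty.
    Rows that already have 4 fields (including the header) are returned
    with their fields unchanged, joined by single spaces.
    """
    fields = line.split()
    if len(fields) == 3:
        fields.insert(2, placeholder)
    return " ".join(fields)


# Made-up rows: header, a row with an empty dprime, and a complete row.
rows = ["snp1 snp2 dprime rsquare", "rs1 rs2  0.5", "rs3 rs2 0.9 0.1"]
for row in rows:
    print(fill_missing_dprime(row))
```

Writing the returned lines to a new file, rather than overwriting the original, keeps the input available for comparison.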

frahimov commented 3 years ago

Yes! That was the problem. Replacing blanks with NA in dprime solved the problem. Thank you again for your help.