uqrmaie1 / admixtools

https://uqrmaie1.github.io/admixtools
71 stars 14 forks source link

Problem making f2 file from plink .bed .fam and .bim files #74

Open sbird1000 opened 1 month ago

sbird1000 commented 1 month ago

I'm a first time user of Admixtools2. I have been trying to create the f2 file using 'extract_f2' but no luck. Here are various command line combos I've tried, so far with no luck:

I entered: prefix = '~/plink2/EAS006uc' my_f2_dir = '~/F2/' extract_f2(prefix, my_f2_dir)

This is what I got back:

ℹ Reading allele frequencies from PLINK files... ℹ EAS006uc.bed has 1 samples and 2503505 SNPs ℹ Calculating allele frequencies from 1 samples in 1 populations ℹ Expected size of allele frequency data: 600 MB 2503k SNPs read... ✔ 2503505 SNPs read in total Error in discard_snps(snpdat, maxmiss = maxmiss, auto_only = auto_only, : Could not parse chromosome numbers! Set 'auto_only = FALSE' to ignore chromosome labels! In addition: Warning message: In discard_snps(snpdat, maxmiss = maxmiss, auto_only = auto_only, : Keeping only chromosomes 1 to 22! Set auto_only = FALSE to keep all chromosomes!

I then tried the same command syntax but with the 'auto_only = FALSE" included and got this:

prefix = '~/plink2/EAS006uc' my_f2_dir = '~/F2/' extract_f2(prefix,'auto_only = FALSE', my_f2_dir)

The result was:

ℹ Reading allele frequencies from PLINK files... Error in match_samples(fam$X2, fam$X1, inds, pops) : Individuals missing in indfile: ~/F2/

Why is 'extract_f2' looking for a file called "indfile" when it was not looking for that file in the first run? Rather than .bed .bin and .fam, files, it appears to be looking for files perhaps associated with the EIGENSTRAT format. Why won't it recognize the Plink format files? Do I need to add some other function or flag to my input code?

Thanks for your help!

sbird1000 commented 1 month ago

I did a third run, removed the ' marks from around auto_only = FALSE and got this result:

prefix = '~/plink2/EAS006uc' my_f2_dir = '~/F2/' extract_f2(prefix,auto_only = FALSE, my_f2_dir)

ℹ Reading allele frequencies from PLINK files... ℹ EAS006uc.bed has 1 samples and 2503505 SNPs ℹ Calculating allele frequencies from 1 samples in 1 populations ℹ Expected size of allele frequency data: 600 MB 2503k SNPs read... ✔ 2503505 SNPs read in total Error in extract_f2(prefix, auto_only = FALSE, my_f2_dir) : There are no informative SNPs!

Just some extra information.

uqrmaie1 commented 1 month ago

Your PLINK files might not be formatted as described here. If you can share the first few lines of the .bim and .fam file, I might be able to see what the problem is. One thing I can tell from the log messages is that it's only recognizing a single sample. If there is indeed only one sample or one population in your data, then no f-statistics can be computed; at least two populations with one sample each are needed to do anything.

"indfile" is just the variable name used in some places for the .fam or the .ind file, it doesn't mean it's expecting EIGENSTRAT format.

I would generally avoid putting named arguments before positional arguments like in extract_f2(prefix,auto_only = FALSE, my_f2_dir). I think R will interpret it correctly in this case, but generally the argument position matters for unnamed arguments, so better to write extract_f2(prefix, my_f2_dir, auto_only = FALSE)

sbird1000 commented 1 month ago

Thank you very much for your help! If I can use just a single sample in one file and a second file containing a population with which to compare it, it would be sufficient for my purposes at this time. Let me give it a try and see what I can do with it.

Just a little background: I am a former university professor (orchestra conductor), with a DMA from the University of Texas at Austin. About 15 years ago I became very interested in human population genetics and returned to school to earn a master's degree in biology (emphasis on population genetics) from Texas State University in 2012. Since 2013 I have been with the Texas Department of State Health Services in Austin. With the recent explosion of aDNA samples now available publicly, I am specifically interested in comparing aDNA male samples that fall into subclades of E-V13, a haplogroup found most abundantly today in the Balkan peninsula, with other potential source populations. I hope to use these samples to better understand the migration of this particular haplogroup from the Balkans to the rest of Europe.

Once again, thanks for your help.

Steve Bird 3765 Castle Rock Dr


From: Robert Maier @.> Sent: Tuesday, June 4, 2024 7:01 PM To: uqrmaie1/admixtools @.> Cc: sbird1000 @.>; Author @.> Subject: Re: [uqrmaie1/admixtools] Problem making f2 file from plink .bed .fam and .bim files (Issue #74)

Your PLINK files might not be formatted as described herehttps://uqrmaie1.github.io/admixtools/articles/io.html. If you can share the first few lines of the .bim and .fam file, I might be able to see what the problem is. One thing I can tell from the log messages is that it's only recognizing a single sample. If there is indeed only one sample or one population in your data, then no f-statistics can be computed; at least two populations with one sample each are needed to do anything.

"indfile" is just the variable name used in some places for the .fam or the .ind file, it doesn't mean it's expecting EIGENSTRAT format.

I would generally avoid putting named arguments before positional arguments like in extract_f2(prefix,auto_only = FALSE, my_f2_dir). I think R will interpret it correctly in this case, but generally the argument position matters for unnamed arguments, so better to write extract_f2(prefix, my_f2_dir, auto_only = FALSE)

— Reply to this email directly, view it on GitHubhttps://github.com/uqrmaie1/admixtools/issues/74#issuecomment-2148599351, or unsubscribehttps://github.com/notifications/unsubscribe-auth/BITDDKXSKJNY5RJT24UYSBTZFZINRAVCNFSM6AAAAABH6WTSMWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBYGU4TSMZVGE. You are receiving this because you authored the thread.Message ID: @.***>

uqrmaie1 commented 1 month ago

I'm glad you're finding Admixtools 2 useful! One of my goals when working on it was to make these methods a bit more accessible to non-experts. But I realize it's still challenging.

A single sample per file won't work. It's necessary that all samples are in the same set of files. You can use the PLINK --merge-list option to merge the file sets. Feel free to email me if you run into other issues!

sbird1000 commented 3 weeks ago

Hi Robert,

I've had a chance to play around with ADMIXTOOLS2 over the weekend and had some questions about its operation. I am using the admixtools::run_shiny_admixtools() shell to carry out operations. For a test file, I am using the aDNA set from Olalde 2023, available here:

https://reich.hms.harvard.edu/datasets

I was able to produce the qpGraph once, but have not been able to do it again. If you could look at these screen shots, perhaps you may be able to let me know where I'm going wrong?

I loaded the "Select data directory" with this path to the following folder:

~/admix2_data/f2_Olalde/

(I had created the folder; it was empty at the start.)

After that, I clicked on "Select .ind or .fam file" and selected:

~/admix2_data/Olalde_2023/1240K_1kCE_Balkan.ind

The screen displayed the individual samples by population (see attachment Screenshot 1 - ADMIXTOOLS2.png)

and the geno file:

~/admix2_data/Olalde_2023/1240K_1kCE_Balkan.geno

However, when I hit the "Extract Data" button, it seemed to run for a while and then I got the following screen instead: (Screenshot 2 - ADMIXTOOLS2.png)

(The screenshots are attached as png files.)

Here is the command line information from the terminal output:

Listening on http://127.0.0.1:6133 Warning: Good news! You don't need to call useShinyalert() anymore. Please remove this line from your code. If you really want to pre-load {shinyalert} to the UI for any reason, use:       useShinyalert(force = TRUE) [1] "navbar detected" [1] "f2_options" [1] "navbar selected" [1] "f2_options" [1] "navbar:" [1] "f2_options" [1] "multiprocess" [1] "dashboardbody" [1] "expanded!" [1] "data" [1] "navbar == data!" [1] "load_data" [1] "global$bod change detected!" Warning: The 'plotly_click' event tied a source ID of 'src' is not registered. In order to obtain this event data, please add event_register(p, 'plotly_click') to the plot (p) that you wish to obtain event data from. [1] "global$iscountdata" [1] FALSE [1] "global$allinds" [1] "CEE_EarlyMedieval" "Croatia_Brekinjova" [3] "Croatia_Cepinski" "Croatia_Gardun" [5] "Croatia_Gornji" "Croatia_Jagodnjak" [7] "Croatia_MalaMetaljka" "Croatia_Nustar" [9] "Croatia_Osijek" "Croatia_Sipar" [11] "Croatia_SisakPogorelec" "Croatia_TrogirDobric" [13] "Croatia_TrogirDragulin" "Croatia_Zadar" [15] "Serbia_Gomolava" "Serbia_Kormadin" [17] "Serbia_Lepenski" "Serbia_Mediana" [19] "Serbia_TimacumKuline" "Serbia_TimacumSlog" [21] "Serbia_ViminaciumGrobalja" "Serbia_ViminaciumPecine" [23] "Serbia_ViminaciumPirivoj" "Serbia_ViminaciumRit" [25] "Serbia_ViminaciumRudine" "Serbia_ViminaciumSvetinja" [1] "popselectors" [1] "CEE_EarlyMedieval" "Croatia_Brekinjova" [3] "Croatia_Cepinski" "Croatia_Gardun" [5] "Croatia_Gornji" "Croatia_Jagodnjak" [7] "Croatia_MalaMetaljka" "Croatia_Nustar" [9] "Croatia_Osijek" "Croatia_Sipar" [11] "Croatia_SisakPogorelec" "Croatia_TrogirDobric" [13] "Croatia_TrogirDragulin" "Croatia_Zadar" [15] "Serbia_Gomolava" "Serbia_Kormadin" [17] "Serbia_Lepenski" "Serbia_Mediana" [19] "Serbia_TimacumKuline" "Serbia_TimacumSlog" [21] "Serbia_ViminaciumGrobalja" "Serbia_ViminaciumPecine" [23] "Serbia_ViminaciumPirivoj" "Serbia_ViminaciumRit" [25] "Serbia_ViminaciumRudine" "Serbia_ViminaciumSvetinja" [1] "popselectors2" [1] "global$iscountdata" [1] FALSE [1] "dashboardbody" [1] "expanded!" NULL [1] "navbar == data!" [1] "load_data" Warning in dir.create(input$textdirinput) : cannot create dir '', reason 'No such file or directory' Warning in normalizePath(input$textdirinput) : path[1]="": No such file or directory [1] "global$iscountdata" [1] FALSE [1] "popselectors" [1] "CEE_EarlyMedieval" "Croatia_Brekinjova" [3] "Croatia_Cepinski" "Croatia_Gardun" [5] "Croatia_Gornji" "Croatia_Jagodnjak" [7] "Croatia_MalaMetaljka" "Croatia_Nustar" [9] "Croatia_Osijek" "Croatia_Sipar" [11] "Croatia_SisakPogorelec" "Croatia_TrogirDobric" [13] "Croatia_TrogirDragulin" "Croatia_Zadar" [15] "Serbia_Gomolava" "Serbia_Kormadin" [17] "Serbia_Lepenski" "Serbia_Mediana" [19] "Serbia_TimacumKuline" "Serbia_TimacumSlog" [21] "Serbia_ViminaciumGrobalja" "Serbia_ViminaciumPecine" [23] "Serbia_ViminaciumPirivoj" "Serbia_ViminaciumRit" [25] "Serbia_ViminaciumRudine" "Serbia_ViminaciumSvetinja" [1] "popselectors2" [1] "global$iscountdata" [1] FALSE [1] "popselectors" [1] "CEE_EarlyMedieval" "Croatia_Brekinjova" [3] "Croatia_Cepinski" "Croatia_Gardun" [5] "Croatia_Gornji" "Croatia_Jagodnjak" [7] "Croatia_MalaMetaljka" "Croatia_Nustar" [9] "Croatia_Osijek" "Croatia_Sipar" [11] "Croatia_SisakPogorelec" "Croatia_TrogirDobric" [13] "Croatia_TrogirDragulin" "Croatia_Zadar" [15] "Serbia_Gomolava" "Serbia_Kormadin" [17] "Serbia_Lepenski" "Serbia_Mediana" [19] "Serbia_TimacumKuline" "Serbia_TimacumSlog" [21] "Serbia_ViminaciumGrobalja" "Serbia_ViminaciumPecine" [23] "Serbia_ViminaciumPirivoj" "Serbia_ViminaciumRit" [25] "Serbia_ViminaciumRudine" "Serbia_ViminaciumSvetinja" [1] "popselectors2"

At that point, it ceased to function, or at least I didn't know how to proceed from that point.

I really appreciate your help!

Steve Bird


From: Robert Maier @.> Sent: Wednesday, June 5, 2024 11:48 AM To: uqrmaie1/admixtools @.> Cc: sbird1000 @.>; Author @.> Subject: Re: [uqrmaie1/admixtools] Problem making f2 file from plink .bed .fam and .bim files (Issue #74)

I'm glad you're finding Admixtools 2 useful! One of my goals when working on it was to make these methods a bit more accessible to non-experts. But I realize it's still challenging.

A single sample per file won't work. It's necessary that all samples are in the same set of files. You can use the PLINK --merge-list option to merge the file sets. Feel free to email me if you run into other issues!

— Reply to this email directly, view it on GitHubhttps://github.com/uqrmaie1/admixtools/issues/74#issuecomment-2150517183, or unsubscribehttps://github.com/notifications/unsubscribe-auth/BITDDKVADVROKFQQQPZH62TZF46NTAVCNFSM6AAAAABH6WTSMWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJQGUYTOMJYGM. You are receiving this because you authored the thread.

sbird1000 commented 3 weeks ago

Hello again,

I tried to run the program again and this time clicked on the f2 button despite the screen showing in Screenshot2 in my previous email. I then went to the terminal and found some additional information:

[1] "popselectors2" [1] "global$bod change detected!" [1] "f2plot" [1] "get f2" [1] "get f2" ℹ Reading precomputed data for 26 populations... ℹ Reading f2 data for pair 351 out of 351... Warning in f2_from_precomp(global$countdir, inds = inds, pops = pops, apply_corr = input$f2corr, : Some f2-statistic estimates are negative across blocks. This is probably caused by too many missing or rare SNPs in populations with low sample size, which makes the f2 bias correction unreliable. Consider running 'f2_from_precomp' with 'afprod = TRUE'. [1] "get f2 done" Warning: Error in pivot_longer: Arguments in ... must be used. ✖ Problematic argument: • ..1 = "pop2" ℹ Did you misspell an argument name? 129: 128: signalCondition 127: signal_abort 126: action 124: action_dots 123: check_dots 122: 121: pivot_longer 120: %>% 119: [/home/steve/R/x86_64-pc-linux-gnu-library/4.1/admixtools/shiny_admixtools/app.R#1938] 103: plotly_f2 102: renderPlotly [/home/steve/R/x86_64-pc-linux-gnu-library/4.1/admixtools/shiny_admixtools/app.R#2061] 101: func 98: shinyRenderWidget 97: func 84: renderFunc 83: output$f2heatmap 2: shiny::runApp 1: admixtools::run_shiny_admixtools Input to asJSON(keep_vec_names=TRUE) is a named vector. In a future version of jsonlite, this option will not be supported, and named vectors will be translated into arrays instead of objects. If you want JSON object output, please use a named list instead. See ?toJSON.

Maybe this is helpful. Thanks again for setting me on the right track.

Steve


From: Steven Bird @.> Sent: Monday, June 10, 2024 12:42 PM To: uqrmaie1/admixtools @.> Subject: Re: [uqrmaie1/admixtools] Problem making f2 file from plink .bed .fam and .bim files (Issue #74)

Hi Robert,

I've had a chance to play around with ADMIXTOOLS2 over the weekend and had some questions about its operation. I am using the admixtools::run_shiny_admixtools() shell to carry out operations. For a test file, I am using the aDNA set from Olalde 2023, available here:

https://reich.hms.harvard.edu/datasets

I was able to produce the qpGraph once, but have not been able to do it again. If you could look at these screen shots, perhaps you may be able to let me know where I'm going wrong?

I loaded the "Select data directory" with this path to the following folder:

~/admix2_data/f2_Olalde/

(I had created the folder; it was empty at the start.)

After that, I clicked on "Select .ind or .fam file" and selected:

~/admix2_data/Olalde_2023/1240K_1kCE_Balkan.ind

The screen displayed the individual samples by population (see attachment Screenshot 1 - ADMIXTOOLS2.png)

and the geno file:

~/admix2_data/Olalde_2023/1240K_1kCE_Balkan.geno

However, when I hit the "Extract Data" button, it seemed to run for a while and then I got the following screen instead: (Screenshot 2 - ADMIXTOOLS2.png)

(The screenshots are attached as png files.)

Here is the command line information from the terminal output:

Listening on http://127.0.0.1:6133 Warning: Good news! You don't need to call useShinyalert() anymore. Please remove this line from your code. If you really want to pre-load {shinyalert} to the UI for any reason, use:       useShinyalert(force = TRUE) [1] "navbar detected" [1] "f2_options" [1] "navbar selected" [1] "f2_options" [1] "navbar:" [1] "f2_options" [1] "multiprocess" [1] "dashboardbody" [1] "expanded!" [1] "data" [1] "navbar == data!" [1] "load_data" [1] "global$bod change detected!" Warning: The 'plotly_click' event tied a source ID of 'src' is not registered. In order to obtain this event data, please add event_register(p, 'plotly_click') to the plot (p) that you wish to obtain event data from. [1] "global$iscountdata" [1] FALSE [1] "global$allinds" [1] "CEE_EarlyMedieval" "Croatia_Brekinjova" [3] "Croatia_Cepinski" "Croatia_Gardun" [5] "Croatia_Gornji" "Croatia_Jagodnjak" [7] "Croatia_MalaMetaljka" "Croatia_Nustar" [9] "Croatia_Osijek" "Croatia_Sipar" [11] "Croatia_SisakPogorelec" "Croatia_TrogirDobric" [13] "Croatia_TrogirDragulin" "Croatia_Zadar" [15] "Serbia_Gomolava" "Serbia_Kormadin" [17] "Serbia_Lepenski" "Serbia_Mediana" [19] "Serbia_TimacumKuline" "Serbia_TimacumSlog" [21] "Serbia_ViminaciumGrobalja" "Serbia_ViminaciumPecine" [23] "Serbia_ViminaciumPirivoj" "Serbia_ViminaciumRit" [25] "Serbia_ViminaciumRudine" "Serbia_ViminaciumSvetinja" [1] "popselectors" [1] "CEE_EarlyMedieval" "Croatia_Brekinjova" [3] "Croatia_Cepinski" "Croatia_Gardun" [5] "Croatia_Gornji" "Croatia_Jagodnjak" [7] "Croatia_MalaMetaljka" "Croatia_Nustar" [9] "Croatia_Osijek" "Croatia_Sipar" [11] "Croatia_SisakPogorelec" "Croatia_TrogirDobric" [13] "Croatia_TrogirDragulin" "Croatia_Zadar" [15] "Serbia_Gomolava" "Serbia_Kormadin" [17] "Serbia_Lepenski" "Serbia_Mediana" [19] "Serbia_TimacumKuline" "Serbia_TimacumSlog" [21] "Serbia_ViminaciumGrobalja" "Serbia_ViminaciumPecine" [23] "Serbia_ViminaciumPirivoj" "Serbia_ViminaciumRit" [25] "Serbia_ViminaciumRudine" "Serbia_ViminaciumSvetinja" [1] "popselectors2" [1] "global$iscountdata" [1] FALSE [1] "dashboardbody" [1] "expanded!" NULL [1] "navbar == data!" [1] "load_data" Warning in dir.create(input$textdirinput) : cannot create dir '', reason 'No such file or directory' Warning in normalizePath(input$textdirinput) : path[1]="": No such file or directory [1] "global$iscountdata" [1] FALSE [1] "popselectors" [1] "CEE_EarlyMedieval" "Croatia_Brekinjova" [3] "Croatia_Cepinski" "Croatia_Gardun" [5] "Croatia_Gornji" "Croatia_Jagodnjak" [7] "Croatia_MalaMetaljka" "Croatia_Nustar" [9] "Croatia_Osijek" "Croatia_Sipar" [11] "Croatia_SisakPogorelec" "Croatia_TrogirDobric" [13] "Croatia_TrogirDragulin" "Croatia_Zadar" [15] "Serbia_Gomolava" "Serbia_Kormadin" [17] "Serbia_Lepenski" "Serbia_Mediana" [19] "Serbia_TimacumKuline" "Serbia_TimacumSlog" [21] "Serbia_ViminaciumGrobalja" "Serbia_ViminaciumPecine" [23] "Serbia_ViminaciumPirivoj" "Serbia_ViminaciumRit" [25] "Serbia_ViminaciumRudine" "Serbia_ViminaciumSvetinja" [1] "popselectors2" [1] "global$iscountdata" [1] FALSE [1] "popselectors" [1] "CEE_EarlyMedieval" "Croatia_Brekinjova" [3] "Croatia_Cepinski" "Croatia_Gardun" [5] "Croatia_Gornji" "Croatia_Jagodnjak" [7] "Croatia_MalaMetaljka" "Croatia_Nustar" [9] "Croatia_Osijek" "Croatia_Sipar" [11] "Croatia_SisakPogorelec" "Croatia_TrogirDobric" [13] "Croatia_TrogirDragulin" "Croatia_Zadar" [15] "Serbia_Gomolava" "Serbia_Kormadin" [17] "Serbia_Lepenski" "Serbia_Mediana" [19] "Serbia_TimacumKuline" "Serbia_TimacumSlog" [21] "Serbia_ViminaciumGrobalja" "Serbia_ViminaciumPecine" [23] "Serbia_ViminaciumPirivoj" "Serbia_ViminaciumRit" [25] "Serbia_ViminaciumRudine" "Serbia_ViminaciumSvetinja" [1] "popselectors2"

At that point, it ceased to function, or at least I didn't know how to proceed from that point.

I really appreciate your help!

Steve Bird


From: Robert Maier @.> Sent: Wednesday, June 5, 2024 11:48 AM To: uqrmaie1/admixtools @.> Cc: sbird1000 @.>; Author @.> Subject: Re: [uqrmaie1/admixtools] Problem making f2 file from plink .bed .fam and .bim files (Issue #74)

I'm glad you're finding Admixtools 2 useful! One of my goals when working on it was to make these methods a bit more accessible to non-experts. But I realize it's still challenging.

A single sample per file won't work. It's necessary that all samples are in the same set of files. You can use the PLINK --merge-list option to merge the file sets. Feel free to email me if you run into other issues!

— Reply to this email directly, view it on GitHubhttps://github.com/uqrmaie1/admixtools/issues/74#issuecomment-2150517183, or unsubscribehttps://github.com/notifications/unsubscribe-auth/BITDDKVADVROKFQQQPZH62TZF46NTAVCNFSM6AAAAABH6WTSMWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJQGUYTOMJYGM. You are receiving this because you authored the thread.