thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0

Genomic converter error 'no more individuals in your data' #169

Closed esnielsen closed 1 year ago

esnielsen commented 1 year ago

Hi there,

I am trying to convert a plink file to a bayescan file, using the following code:

radiator::detect_genomic_format(data = "ALL2.bin.bed")

radiator::summary_strata("ALL_2_strata.tsv")

lottia <- genomic_converter( data = "ALL_2_bin.bed", strata = "ALL_2_strata.tsv", output = c("bayescan"))

And I get the following error: "Error in join_strata(): ! No more individuals in your data, check data and strata ID names.."

This error suggests that the strata and the input data can't be joined because they have unequal numbers of individuals, but the output below shows that both have 553 individuals, so I'm not sure what else it could be.

Output:

[1] "plink.bed.file"
Number of strata: 19
Number of individuals: 553

Number of ind/strata:
KR = 30 FR = 30 BB = 30 DB = 30 HP = 30 CR = 30 SB = 29 VN = 30 GP = 30 CA = 30 WA = 30 IP = 30 SH = 30 SC = 30 CB = 30 PB = 19 SR = 29 BT = 29 BA = 27

Number of duplicate id: 0
################################################################################
######################### radiator::genomic_converter ##########################
################################################################################
Execution date@time: 20221117@0638
Folder created: -3564_radiator_genomic_converter_20221117@0638
Function call and arguments stored in: radiator_genomic_converter_args_20221117@0638.tsv
Filters parameters file generated: filters_parameters_20221117@0638.tsv

Importing data: plink.bed.file
Reading PLINK bed file...

Data summary:
number of samples: 553
number of markers: 908948
Error in join_strata():
! No more individuals in your data, check data and strata ID names...
Backtrace:
 x

  1. +-radiator::genomic_converter(...)
  2. | -radiator::tidy_genomic_data(...)
  3. | -radiator::tidy_plink(...)
  4. | -radiator::read_plink(...)
  5. | -... %>% ...
  6. +-dplyr::mutate(...)
  7. +-dplyr::left_join(...)
  8. +-dplyr:::left_join.data.frame(...)
  9. | -dplyr::auto_copy(x, y, copy = copy)
 10. | +-dplyr::same_src(x, y)
 11. | -dplyr:::same_src.data.frame(x, y)
 12. | -base::is.data.frame(y)
 13. +-join_strata(individuals, strata, verbose = verbose) %>% ...
 14. +-dplyr::mutate(., FILTERS = "whitelist")
 15. -radiator::join_strata(individuals, strata, verbose = verbose)
 16. -rlang::abort("No more individuals in your data, check data and strata ID names...")

Computation time, overall: 55 sec

Computation time, overall: 56 sec
######################### completed genomic_converter ##########################
Execution halted

I would really appreciate any input you have- thanks in advance!

thierrygosselin commented 1 year ago

Hi Erica, it's likely a problem with individual naming inside the plink or the strata file, but I won't know for sure until I have the data to run the function and reproduce the error. Would you mind sharing the relevant files over email?

Best, Thierry
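To illustrate why matching counts are not enough, here is a minimal base R sketch with invented IDs. It is not radiator's internal code, only an assumption about the general failure mode: both tables list three individuals, yet a join on the ID column keeps none of them because the naming schemes differ.

```r
# Invented IDs for illustration only; they are not taken from the shared files.
data_ids <- data.frame(
  INDIVIDUALS = c("KR_01", "KR_02", "FR_01"),
  stringsAsFactors = FALSE
)
strata <- data.frame(
  INDIVIDUALS = c("KR-01", "KR-02", "FR-01"),
  STRATA = c("KR", "KR", "FR"),
  stringsAsFactors = FALSE
)

# Same number of individuals on both sides...
same_count <- nrow(data_ids) == nrow(strata)

# ...but joining on the ID column keeps only exact matches, so nothing survives.
joined <- merge(data_ids, strata, by = "INDIVIDUALS")
n_matched <- nrow(joined)
```

Here `same_count` is `TRUE` while `n_matched` is 0, which is the kind of situation the "No more individuals in your data" message points to.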

esnielsen commented 1 year ago

Hi Thierry,

Thanks so much for the quick reply!

The strata file is attached, and here’s a link to the .bed file: https://www.dropbox.com/s/ws87l798ka9vreg/ALL_2.bin.bed?dl=0

Thanks, Erica

— Reply to this email directly, view it on GitHub https://github.com/thierrygosselin/radiator/issues/169#issuecomment-1318669173, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKBISHPH6T7ABR2WSRNDZ6DWIYZ6TANCNFSM6AAAAAASDJ4I3E. You are receiving this because you authored the thread.

thierrygosselin commented 1 year ago

I don't see the strata file

thierrygosselin commented 1 year ago

Do you have the corresponding fam and bim files that go with the bed file?

esnielsen commented 1 year ago

Hi, yes, all the files should be here: https://www.dropbox.com/scl/fo/dmh7l3ewfyr3wmvn1t78w/h?dl=0&rlkey=n913sdaqiskp57ahm0ivxjow6

thierrygosselin commented 1 year ago

OK, good, I have what I need.

thierrygosselin commented 1 year ago

I was able to reproduce the bug and will have a fix this afternoon.

esnielsen commented 1 year ago

Great, thanks so much!

thierrygosselin commented 1 year ago

Found the problem

I don't know the origin of this plink file (how it was generated), but the names inside the bed and fam files don't match. Alternatively, SeqArray::seqBED2GDS, which is used under the hood, may not be working.

The same goes for the strata file, whose naming scheme is completely different from the one in the fam file.
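One way to check the naming mismatch before calling radiator is to compare the IDs directly. The sketch below is a hypothetical helper, `compare_ids()`, which is not part of radiator. It assumes the IDs of interest sit in column 2 of the .fam file (the within-family ID in the PLINK format) and in an `INDIVIDUALS` column of the strata file; radiator may build its IDs differently. The toy files and IDs are invented so the example runs on its own.

```r
# Hypothetical helper (not a radiator function): compare sample IDs in a
# PLINK .fam file against the INDIVIDUALS column of a strata file.
compare_ids <- function(fam_path, strata_path) {
  # .fam: no header, whitespace-delimited; column 2 is assumed to hold the ID.
  fam <- utils::read.table(fam_path, header = FALSE,
                           colClasses = "character",
                           stringsAsFactors = FALSE)
  # Strata: tab-separated with a header; an INDIVIDUALS column is assumed.
  strata <- utils::read.delim(strata_path,
                              colClasses = "character",
                              stringsAsFactors = FALSE)
  fam_ids <- fam[[2]]
  strata_ids <- strata[["INDIVIDUALS"]]
  list(
    shared = intersect(fam_ids, strata_ids),
    only_in_fam = setdiff(fam_ids, strata_ids),
    only_in_strata = setdiff(strata_ids, fam_ids)
  )
}

# Toy files standing in for the real ones (IDs invented for illustration).
fam_file <- tempfile(fileext = ".fam")
strata_file <- tempfile(fileext = ".tsv")
writeLines(c("KR KR_01 0 0 0 -9", "KR KR_02 0 0 0 -9"), fam_file)
writeLines(c("INDIVIDUALS\tSTRATA", "KR-01\tKR", "KR-02\tKR"), strata_file)

res <- compare_ids(fam_file, strata_file)
n_shared <- length(res$shared)
```

If `n_shared` is 0, as in this toy case, no individual can be matched between the two files; `res$only_in_fam` and `res$only_in_strata` show what each side looks like, which usually makes the naming difference obvious.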

Some advice on naming strategy, here, but it doesn't have to be that way to work inside radiator.

Once this is sorted out, try:

test1 <- radiator::read_plink(data = "ALL_2.bin.bed")

If that doesn't work, don't bother trying genomic_converter. Re-open the issue if you're still having problems.