thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0
58 stars 23 forks source link

Potential help error with DArT #172

Closed lukelloydjones closed 1 year ago

lukelloydjones commented 1 year ago

Hi Thierry,

Pierre and I are back reviewing earlier DArTSeq data for crocodiles. I contacted you last year about a different issue. Thanks again for your great package!

I have been trying to get through the filter_rad() process with DArT data in 1row format (I have also tried in 2row).

Here's the command

croc <- filter_rad(data = drt.pth, strata = "dartseq_strata.tsv", interactive.filter = TRUE, output = c("genind","snprelate"), filter.common.markers = FALSE)

Here's the error

######################### completed filter_monomorphic ######################### ################################################################################ ######################### radiator::filter_individuals ######################### ################################################################################ Execution date@time: 20230120@1405 Function call and arguments stored in: radiator_filter_individuals_args_20230120@1405.tsv Interactive mode: on

Step 1. Visualization Step 2. Missingness Step 3. Heterozygosity Step 4. Coverage (if available)

Step 1. Visualization of samples QC

Error in extract_coverage(gds, markers = FALSE) : object 'coverage.info' not found

I mean it makes sense as these data don't report read count info but it feels like it shouldn't stop here given the 'if available' statement. I also get a similar error if I skip this and manually go to filter_mac() function by itself. It feels like filter_mac shouldn't require coverage information but maybe I'm off base here.

The data can hopefully be downloaded from here

https://www.dropbox.com/sh/2f244l1uk88ia33/AAC15FBXCOGC3qvtbJEc3R8Pa?dl=0

Here's the session info

─ Session info ────────────────────────────────────────────────────────────────────────────────────────── setting value version R version 4.1.2 (2021-11-01) os macOS Big Sur 11.6.2 system x86_64, darwin17.0 ui RStudio language (EN) collate en_AU.UTF-8 ctype en_AU.UTF-8 tz Australia/Brisbane date 2023-01-20 rstudio 2021.09.1+372 Ghost Orchid (desktop) pandoc NA

─ Packages ────────────────────────────────────────────────────────────────────────────────────────────── package version date (UTC) lib source assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.1.0) backports 1.4.1 2021-12-13 [2] CRAN (R 4.1.0) BiocGenerics 0.40.0 2021-10-26 [2] Bioconductor BiocManager 1.30.16 2021-06-15 [2] CRAN (R 4.1.0) Biostrings 2.62.0 2021-10-26 [2] Bioconductor bit 4.0.4 2020-08-04 [2] CRAN (R 4.1.0) bit64 4.0.5 2020-08-30 [2] CRAN (R 4.1.0) bitops 1.0-7 2021-04-24 [2] CRAN (R 4.1.0) broom 1.0.0 2022-07-01 [1] CRAN (R 4.1.2) cachem 1.0.6 2021-08-19 [2] CRAN (R 4.1.0) callr 3.7.0 2021-04-20 [2] CRAN (R 4.1.0) carrier 0.1.0 2018-10-16 [2] CRAN (R 4.1.0) class 7.3-19 2021-05-03 [2] CRAN (R 4.1.2) classInt 0.4-3 2020-04-07 [1] CRAN (R 4.1.0) cli 3.2.0 2022-02-14 [1] CRAN (R 4.1.2) codetools 0.2-18 2020-11-04 [2] CRAN (R 4.1.2) colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.1.2) crayon 1.5.1 2022-03-26 [1] CRAN (R 4.1.2) data.table 1.14.2 2021-09-27 [2] CRAN (R 4.1.0) DBI 1.1.2 2021-12-20 [1] CRAN (R 4.1.0) desc 1.4.0 2021-09-28 [2] CRAN (R 4.1.0) devtools 2.4.3 2021-11-30 [2] CRAN (R 4.1.0) digest 0.6.29 2021-12-01 [2] CRAN (R 4.1.0) dplyr 1.0.8 2022-02-08 [1] CRAN (R 4.1.2) e1071 1.7-9 2021-09-16 [2] CRAN (R 4.1.0) ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.1.0) fansi 1.0.3 2022-03-24 [1] CRAN (R 4.1.2) farver 2.1.0 2021-02-28 [2] CRAN (R 4.1.0) fastmap 1.1.0 2021-01-25 [2] CRAN (R 4.1.0) fs 1.5.2 2021-12-08 [2] CRAN (R 4.1.0) fst 0.9.4 2020-08-27 [2] CRAN (R 4.1.0) future 1.23.0 2021-10-31 [2] CRAN (R 4.1.0) gdsfmt 1.30.0 2021-10-26 [2] Bioconductor generics 0.1.2 2022-01-31 [1] CRAN (R 4.1.2) GenomeInfoDb 1.30.0 2021-10-26 [2] Bioconductor GenomeInfoDbData 1.2.7 2021-12-20 [2] Bioconductor GenomicRanges 1.46.1 2021-11-18 [2] Bioconductor ggplot2 3.3.6 2022-05-03 [1] CRAN (R 4.1.2) ggspatial 1.1.5 2021-01-04 [1] CRAN (R 4.1.0) globals 0.14.0 2020-11-22 [2] CRAN (R 4.1.0) glue 1.6.2 2022-02-24 [1] CRAN (R 4.1.2) gridExtra 2.3 2017-09-09 [2] CRAN (R 4.1.0) gtable 0.3.0 2019-03-25 [2] CRAN (R 4.1.0) HardyWeinberg 1.7.4 2021-11-26 [2] CRAN (R 4.1.0) hms 1.1.1 2021-09-26 [2] CRAN (R 4.1.0) IRanges 2.28.0 2021-10-26 [2] Bioconductor KernSmooth 2.23-20 2021-05-03 [2] CRAN (R 4.1.2) labeling 0.4.2 2020-10-20 [2] CRAN (R 4.1.0) lattice 0.20-45 2021-09-22 [2] CRAN (R 4.1.2) lifecycle 1.0.1 2021-09-24 [2] CRAN (R 4.1.0) listenv 0.8.0 2019-12-05 [2] CRAN (R 4.1.0) magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.1.2) memoise 2.0.1 2021-11-26 [2] CRAN (R 4.1.0) mice 3.14.0 2021-11-24 [2] CRAN (R 4.1.0) munsell 0.5.0 2018-06-12 [2] CRAN (R 4.1.0) nnet 7.3-16 2021-05-03 [2] CRAN (R 4.1.2) parallelly 1.30.0 2021-12-17 [2] CRAN (R 4.1.0) pillar 1.7.0 2022-02-01 [1] CRAN (R 4.1.2) pkgbuild 1.3.1 2021-12-20 [2] CRAN (R 4.1.2) pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.1.0) pkgload 1.2.4 2021-11-30 [2] CRAN (R 4.1.0) plyr 1.8.7 2022-03-24 [1] CRAN (R 4.1.2) prettyunits 1.1.1 2020-01-24 [2] CRAN (R 4.1.0) processx 3.5.2 2021-04-30 [2] CRAN (R 4.1.0) proxy 0.4-26 2021-06-07 [2] CRAN (R 4.1.0) ps 1.6.0 2021-02-28 [2] CRAN (R 4.1.0) purrr 0.3.4 2020-04-17 [2] CRAN (R 4.1.0) R6 2.5.1 2021-08-19 [2] CRAN (R 4.1.0) radiator * 1.2.4 2023-01-20 [1] Github (thierrygosselin/radiator@982b790) Rcpp 1.0.8.3 2022-03-17 [1] CRAN (R 4.1.2) RCurl 1.98-1.6 2022-02-08 [1] CRAN (R 4.1.2) readr 2.1.1 2021-11-30 [2] CRAN (R 4.1.0) remotes 2.4.2 2021-11-30 [2] CRAN (R 4.1.0) rlang 1.0.2 2022-03-04 [1] CRAN (R 4.1.2) rprojroot 2.0.2 2020-11-15 [2] CRAN (R 4.1.0) Rsolnp 1.16 2015-12-28 [2] CRAN (R 4.1.0) rstudioapi 0.13 2020-11-12 [2] CRAN (R 4.1.0) S4Vectors 0.32.3 2021-11-21 [2] Bioconductor scales 1.1.1 2020-05-11 [2] CRAN (R 4.1.0) SeqArray 1.34.0 2021-10-26 [2] Bioconductor sessioninfo 1.2.2 2021-12-06 [2] CRAN (R 4.1.0) sf 1.0-9 2022-11-08 [1] CRAN (R 4.1.2) SNPRelate 1.28.0 2021-10-26 [2] Bioconductor stringi 1.7.6 2021-11-29 [2] CRAN (R 4.1.0) testthat 3.1.1 2021-12-03 [2] CRAN (R 4.1.0) tibble 3.1.6 2021-11-07 [2] CRAN (R 4.1.0) tidyr 1.2.0 2022-02-01 [1] CRAN (R 4.1.2) tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.1.2) truncnorm 1.0-8 2018-02-27 [2] CRAN (R 4.1.0) tzdb 0.2.0 2021-10-27 [2] CRAN (R 4.1.0) units 0.7-2 2021-06-08 [1] CRAN (R 4.1.0) UpSetR 1.4.0 2019-05-22 [2] CRAN (R 4.1.0) usethis 2.1.5 2021-12-09 [2] CRAN (R 4.1.0) utf8 1.2.2 2021-07-24 [2] CRAN (R 4.1.0) vctrs 0.4.0 2022-03-30 [1] CRAN (R 4.1.2) viridisLite 0.4.0 2021-04-13 [2] CRAN (R 4.1.0) vroom 1.5.7 2021-11-30 [2] CRAN (R 4.1.0) withr 2.5.0 2022-03-03 [1] CRAN (R 4.1.2) XVector 0.34.0 2021-10-26 [2] Bioconductor zlibbioc 1.40.0 2021-10-26 [2] Bioconductor

Any help would be greatly appreciated.

Many thanks,

Luke Lloyd-Jones

thierrygosselin commented 1 year ago

Dear Luke, sorry for the bug, I'll have a look at it today.

thierrygosselin commented 1 year ago

With latest version 1.2.5 updated today both previous and new dataset works. I've tested 1 and 2 rows.

test1 <- radiator::read_dart(data = "Report_DCroc20-5699_SNP_2.csv", strata = "dartseq_strata.tsv")
test2 <- radiator::read_dart(data = "Report_DCroc20-5699_SNP_mapping_2.csv", strata = "dartseq_strata.tsv")

If those 2 tests work try the last one:

test3 <- radiator::filter_rad(data = "Report_DCroc20-5699_SNP_2.csv", strata = "dartseq_strata.tsv")

This dataset, with less samples, looks better than the one you showed me in 2021.

You still have little wet lab problem (list of potential problems is long, but it's usually the person behind the pipette or the DNA quality of the tissue). A couple of samples might have mixed DNA.

The duplicate analysis shows 2 dups only so a lot different than the 2021 dataset. You still have a lot of potential close kin

Because of the varying heterozygosity I would advice doing HWE filtering in filter_rad. Usually when the data is top notch, It's not necessary to study pop gen (you want a pop signal but remove the genotyping errors). But since you'll probably be working on identifying the close kin correctly with kinference, I would suggest a first pass in radiator and proper filtering for this in kinference.

Cheers Thierry I'll send the link to my results of filter_rad so you'll see the thresholds used.

lukelloydjones commented 1 year ago

Thank very much again for your super fast reply really helpful!

Yes, we are digging into these data again which were used for loci selection for DArTCap (the data you saw last year). The reviewer is questioning our filtering on HWE and heterozygosity saying that it was likely limiting for pop gen. We get the argument, but we wanted to limit the data to the individuals and loci of highest quality.

Anyway, thanks again and see how we go.

Cheers,

Luke

From: Thierry Gosselin @.> Reply to: thierrygosselin/radiator @.> Date: Saturday, 21 January 2023 at 6:50 am To: thierrygosselin/radiator @.> Cc: Mr Luke Lloyd-Jones @.>, Author @.***> Subject: Re: [thierrygosselin/radiator] Potential help error with DArT (Issue #172)

With latest version 1.2.5 updated today both previous and new dataset works. I've tested 1 and 2 rows.

test1 <- radiator::read_dart(data = "Report_DCroc20-5699_SNP_2.csv", strata = "dartseq_strata.tsv")

test2 <- radiator::read_dart(data = "Report_DCroc20-5699_SNP_mapping_2.csv", strata = "dartseq_strata.tsv")

If those 2 tests work try the last one:

test3 <- radiator::filter_rad(data = "Report_DCroc20-5699_SNP_2.csv", strata = "dartseq_strata.tsv")

This dataset, with less samples, looks better than the one you showed me in 2021.

You still have little wet lab problem (list of potential problems is long, but it's usually the person behind the pipette or the DNA quality of the tissue). A couple of samples might have mixed DNA.

The duplicate analysis shows 2 dups only so a lot different than the 2021 dataset. You still have a lot of potential close kin

Because of the varying heterozygosity I would advice doing HWE filtering in filter_rad. Usually when the data is top notch, It's not necessary to study pop gen (you want a pop signal but remove the genotyping errors). But since you'll probably be working on identifying the close kin correctly with kinference, I would suggest a first pass in radiator and proper filtering for this in kinference.

Cheers Thierry I'll send the link to my results of filter_rad so you'll see the thresholds used.

— Reply to this email directly, view it on GitHubhttps://github.com/thierrygosselin/radiator/issues/172#issuecomment-1398918598, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACL2HY7WVAOL2DP65WPV7GTWTL3BTANCNFSM6AAAAAAUBDCKKQ. You are receiving this because you authored the thread.Message ID: @.***>