Closed lukelloydjones closed 2 years ago
Dear Luke,
Sorry for the bug you're experiencing. I'll have a look today.
Thierry
Hi Thierry,
No problem. Probably something wrong on our end. Thanks very much for looking into it so quickly.
Cheers,
Luke
Hi Luke,
Could you run the command below and tell me whether you get the same results as I do?
test1 <- radiator::read_dart(data = "SEQ_SNPs_counts_0_Target-two_part_id.csv", strata = "croc_strata_counts.tsv")
Reading DArT file...
Number of blacklisted samples: 1
DArT SNP format: alleles coverage in 2 Rows counts
Generating genotypes and calibrating REF/ALT alleles...
Number of markers recalibrated based on counts of allele read depth: 1393
Generating GDS...
File written: radiator_20211222@0948.gds.rad
Number of chrom: 1
Number of locus: 6125
Number of SNPs: 6760
Number of strata: 10
Number of individuals: 1297
Number of ind/strata:
5b__Coastal-Plains-CA = 288
3__North-East-Cape-York = 19
2__North-West-Cape-York = 459
4__Princess-Charlotte-Bay = 259
1b__Gulf-Plains-ALD = 8
5a__Coastal-Plains-CMC = 2
6b__Fitzroy = 24
1c__Gulf-Plains-NFD = 115
5c__Coastal-Plains-APrR = 110
1d__Gulf-Plains-MGD = 13
Number of duplicate id: 0
Computation time, overall: 27 sec
With the latest push it should work; please re-install radiator to pick up the fix.
Here are some suggestions for your dataset. Ask Pierre for further advice; he knows how to deal with these problems.
I suggest running the filter as follows, because keeping only the markers common to all strata is on by default:
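For reference, a minimal sketch of the re-install step. It assumes the development version is installed from GitHub with the `remotes` package; the thread only says to re-install, so the installer is an assumption. Checking the installed version afterwards helps when comparing setups between machines.

```r
# Assumption: install the development version from GitHub using remotes;
# the thread does not specify which installer to use.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("thierrygosselin/radiator")

# Report the installed radiator version, e.g. to compare with a colleague's machine.
packageVersion("radiator")
```

Restart the R session after installing so the updated package is the one that gets loaded.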
test2 <- radiator::filter_rad(
data = "SEQ_SNPs_counts_0_Target-two_part_id.csv",
strata = "croc_strata_counts.tsv",
filter.common.markers = FALSE
)
Do you still want to blacklist markers? (y/n):
y
2 options to blacklist markers based on reproducibility:
1. use the outlier statistic
2. enter your own threshold
Enter the option (1 or 2):
1
Step 2. Filtering markers based individual missingness/genotyping
Do you want to blacklist samples based on missingness ? (y/n): y
2 options to blacklist samples:
I would turn off the rest and wait until further downstream in **filter_rad** to view the figures before choosing the values:
Step 3. Filtering markers based on individual heterozygosity
Do you want to blacklist samples based on heterozygosity ? (y/n): n
Step 4. Filtering markers based on individual's coverage
Do you want to blacklist samples based on TOTAL coverage ? (y/n): n
Do you want to blacklist samples based on MEDIAN coverage ? (y/n): n
Do you want to blacklist samples based on Interquartile Range (IQR) coverage ? (y/n): n
**6. Coverage**: be careful with uneven coverage between samples. I suspect you have varying DNA quality and/or wet-lab problems with some samples (ask Pierre).
**7. Duplicate samples:** you have technical replicates or duplicates and close kin in the data (but close-kin analysis should be done after removing all outliers based on heterozygosity, otherwise the signal is artefactual, not biological).
**8. You have mixed samples**, very different DNA quality, or a biologically interesting phenomenon (bottleneck, inbreeding, outbreeding, 2 species, etc.). This shows up in the individual heterozygosity (look at the range and at the bubble size, which correlates with missing data). When I see a figure like this I usually bet that 95% is due to wet-lab problems with DNA. But I could be wrong. Pierre's white shark dataset was similar.
If you email me I'll send a PDF of a talk I gave in Hobart about the different filtering steps... it's the missing manual.
Cheers
Thierry
Dear Thierry,
We are currently trying to QC a DArT 'counts' file. We are getting the following error at the radiator::filter_individuals step. We think we have a problem with our setup (package versions etc.), as Pierre Feutry (not sure you know him, but I think so) can run it on his machine. We have spent a bit of time tracing the error but are at the point where we think we may need some help.
Any help would be most appreciated.
Kind regards,
Luke
https://www.dropbox.com/sh/lfxjn3ki4i8e8f1/AADjy1s8ik0oR9X70tGEe2RCa?dl=0