green-striped-gecko / dartR

Importing and Analysing DArT type snp and silicodart data
GNU General Public License v3.0
Outliers for downstream analysis #81

Open jwhitaker17 opened 5 years ago

jwhitaker17 commented 5 years ago

Hi,

I've run the gl.outflank and was able to produce a report on the outliers in my dataset of 40,746 SNPs. 303 loci were flagged as outliers. I'd like to now subset my data into outliers and non-outliers to run downstream analyses (e.g. PCA, etc.).

However, I've been unable to figure out how to pull out those 303 loci to run downstream analyses. Is this function already available, or do you have recommendations on how to do it? If it can't be done to the gl object, I'm assuming there is a way to add to the info of a vcf file, but it is beyond my abilities. Any advice would be much appreciated.

Please let me know if I need to clarify anything or provide further information. Thanks in advance for your help!

ollybolly commented 5 years ago

Hi,

Have you tried something like:

index.outflank <- gl.outflank(my.gl)$outflank$results$OutlierFlag #run outflank on your genlight file and create an index of TRUE/FALSE (TRUE = outlier)

index.outflank #have a look

sum(index.outflank) #how many?

myspecies.selection.gl <- my.gl[ ,index.outflank] #the outlier set

myspecies.neutral.gl <- my.gl[ ,!index.outflank] #the “neutral” set

I may not have got the syntax totally correct, but this is the general principle of how I’ve subsetted the data based on the index provided by gl.outflank
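The general principle can be sketched with plain R objects. This is only an illustration: the matrix `geno` and the logical vector `is.outlier` are hypothetical stand-ins for a genlight object and an outlier index, and the sketch assumes that TRUE marks a locus flagged as an outlier.

```r
# Hypothetical logical index: TRUE marks a locus flagged as an outlier.
is.outlier <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)

# Toy genotype matrix with 3 individuals (rows) and 6 loci (columns).
geno <- matrix(seq_len(18), nrow = 3,
               dimnames = list(paste0("ind", 1:3), paste0("loc", 1:6)))

# The index needs one entry per locus before it is used for subsetting.
stopifnot(length(is.outlier) == ncol(geno))

# Count the flagged loci, then split the columns into two sets.
n.outliers  <- sum(is.outlier)
outlier.set <- geno[, is.outlier, drop = FALSE]   # flagged loci only
neutral.set <- geno[, !is.outlier, drop = FALSE]  # everything else

print(n.outliers)
print(colnames(outlier.set))
print(colnames(neutral.set))
```

Checking the length first is a cheap guard: a logical index that is shorter than the number of columns is silently recycled by R, which would select the wrong loci without any warning.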

Cheers,

Olly

jwhitaker17 commented 5 years ago

Thank you! I had to tweak it a little, but this worked. I really appreciate the speedy reply!

Have a great weekend!

green-striped-gecko commented 5 years ago

Hi jwhitaker17,

If you run the function like:

out <- gl.outflank(bandicoot.gl)

then the out object contains all the necessary information.

For example, out$outflank$results returns a table with all the estimates for each locus (loci whose OutlierFlag is TRUE are regarded as outliers).

names(out$outflank$results)
 [1] "LocusName"        "He"               "FST"
 [4] "T1"               "T2"               "FSTNoCorr"
 [7] "T1NoCorr"         "T2NoCorr"         "meanAlleleFreq"
[10] "indexOrder"       "GoodH"            "qvalues"
[13] "pvalues"          "pvaluesRightTail" "OutlierFlag"

So to find the loci names which are outliers you could use:

out$outflank$results$LocusName[out$outflank$results$OutlierFlag==TRUE]
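One thing to watch with this kind of extraction is missing values. The sketch below uses a small mock data frame (hypothetical values, standing in for out$outflank$results) and assumes, purely for illustration, that the flag column might contain an NA. Whether it does will depend on the data.

```r
# Mock stand-in for out$outflank$results (hypothetical values).
results <- data.frame(
  LocusName   = c("loc1", "loc2", "loc3", "loc4", "loc5"),
  OutlierFlag = c(FALSE, TRUE, NA, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Direct comparison: the NA in the flag becomes an NA entry in the result.
naive <- results$LocusName[results$OutlierFlag == TRUE]

# which() keeps only the positions that are TRUE and drops the NA.
outlier.names <- results$LocusName[which(results$OutlierFlag)]

print(naive)
print(outlier.names)
```

Wrapping the flag in which() is a small change, but it keeps NA entries out of the list of locus names that gets passed on to later steps.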

This in turn can be used to get the loci from the genlight object via:

index <- out$outflank$results$OutlierFlag==TRUE

glfiltered <- bandicoot.gl[ , index] #all loci which are outliers
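If there is any doubt that the rows of the results table are in the same order as the loci in the genotype object, matching by locus name is a safer way to build the index. The sketch below is a minimal illustration on mock data: `geno` is a plain matrix standing in for the genlight object, `results` is a hypothetical table with only the two columns needed, and the row order is deliberately shuffled. For a real genlight object, adegenet's locNames() would take the place of colnames() here.

```r
# Mock genotype matrix: 4 individuals x 5 loci, coded 0/1/2 (hypothetical data).
geno <- matrix(
  c(0, 1, 2, 0, 1,
    1, 1, 0, 2, 0,
    2, 0, 1, 1, 1,
    0, 2, 2, 0, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("loc", 1:5))
)

# Mock results table, listed in a different order from the matrix columns.
results <- data.frame(
  LocusName   = c("loc3", "loc1", "loc5", "loc2", "loc4"),
  OutlierFlag = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Names of the flagged loci, then a logical index aligned to the matrix columns.
outlier.names <- results$LocusName[which(results$OutlierFlag)]
is.outlier    <- colnames(geno) %in% outlier.names

# Split into the outlier set and the remaining ("neutral") set.
geno.outlier <- geno[, is.outlier, drop = FALSE]
geno.neutral <- geno[, !is.outlier, drop = FALSE]

print(colnames(geno.outlier))
print(colnames(geno.neutral))
```

Because the index is rebuilt from names, the two subsets keep the column order of the genotype object regardless of how the results table is sorted, and together they still cover every locus exactly once.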

Hope that helps,

Bernd

Dr Bernd Gruber
Associate Professor
Institute for Applied Ecology
Faculty of Science and Technology
University of Canberra ACT 2601 AUSTRALIA
Tel: (02) 6206 3804
Fax: (02) 6201 2328
Email: bernd.gruber@canberra.edu.au
WWW: https://researchprofiles.canberra.edu.au/en/persons/bernd-gruber

jwhitaker17 commented 5 years ago

Thank you for all your help! I was able to get what I needed. I really appreciate the quick response.

Best,

Justine Whitaker, PhD Assistant Professor 906 East 1st Street 229 Gouaux Hall Nicholls State University Thibodaux, LA 70301 985-493-2628
