mennodejong1986 / SambaR

SambaR: Snp datA Management and Basic Analyses in R
MIT License

mergepop() issue #20

Closed: ricohabeahan closed this issue 2 years ago

ricohabeahan commented 2 years ago

Hi Menno,

First of all, thank you so much for this very useful and time-saving package. I'm using SambaR to work with ddRADseq short reads that are aligned to a reference genome. I managed to import the short-read data as instructed in the manual.

However, when I filtered the data with the following command:

filterdata(indmiss=0.7,snpmiss=0.05,min_mac=2,dohefilter=TRUE,snpdepthfilter=TRUE,min_spacing=500,nchroms=NULL,silent=TRUE)

several of my populations had no individuals retained. I assumed the filters were applied per population rather than to the dataset as a whole, which would make them stricter for me because I have only a few individuals per population. SambaR does not accept populations with no individuals, and this was a problem because I want to analyse population structure with as many populations as possible. One of the options SambaR suggested was to merge populations with no retained individuals into another population. I decided it might be better to merge all populations into one, so that the filters would apply to the dataset as a whole instead of per population. My plan was then to export PED/MAP files and manually add the population names back before continuing with the analyses.

I used the mergepop() command to merge all my populations into one, and this worked. In the previous version (v1.06), after merging all populations into one, I could filter the data with the filterdata() command above and then export PED/MAP files of the filtered data with exportsambarfiles().

However, in v1.07 I noticed that after merging all populations into one with mergepop(), later functions such as exportsambarfiles() and findstructure() return errors that I had not seen in the previous version.

The error messages were as follows:

exportsambarfiles()
Currently only 1 population defined. Not exporting Bayesass input ('Bayesassinput.immanc.txt').
Creating input for Treemix...
Error in `colnames<-`(`*tmp*`, value = popnames) :
  attempt to set 'colnames' on an object with less than two dimensions

findstructure(Kmax=6,add_legend=TRUE,legend_pos="bottomright",legend_cex=3,symbol_size=3)
Redefining order of populations as specified by pop_order flag.
Expected population names: 1
ERROR: vector input to pop_order argument is not the same length as vector input to popnames argument.

Perhaps something goes wrong with a vector length when SambaR merges populations, which did not happen in the previous version. Is there any way to apply filterdata() to the dataset as a whole instead of per population, so that I don't need to use mergepop()?
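The colnames error above is a generic base R error, and the vector-length guess matches one common way it arises: subsetting a matrix down to a single column silently drops it to a plain vector, and a vector cannot take column names. SambaR's source is not shown in this thread, so the following is only an assumed illustration of that mechanism, not SambaR's actual code or Menno's fix; the table `tab` and the helper `label_pops()` are hypothetical.

```r
# Hypothetical SNP x population table: 3 SNPs (rows), 2 populations (columns).
tab <- matrix(c(0.1, 0.5, 0.9, 0.2, 0.4, 0.8), nrow = 3)

# Hypothetical helper (not part of SambaR): keep one column per population
# and label the columns with the population names.
label_pops <- function(tab, popnames, drop) {
  sub <- tab[, seq_along(popnames), drop = drop]
  colnames(sub) <- popnames  # fails if 'sub' has been reduced to a vector
  sub
}

# Two populations: the subset is still a 3 x 2 matrix, so labelling works.
label_pops(tab, c("popA", "popB"), drop = TRUE)

# One population with drop = TRUE (R's default): the single column becomes
# a plain vector and colnames<- raises the same message as in the output above.
tryCatch(label_pops(tab, "merged", drop = TRUE),
         error = function(e) conditionMessage(e))

# One population with drop = FALSE: the 3 x 1 matrix shape is kept.
label_pops(tab, "merged", drop = FALSE)
```

Passing `drop = FALSE` wherever a single-population case can occur is the usual base R guard against this kind of failure.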

Sorry for the lengthy explanation. I hope this makes the issue clear, and thank you for your time.

Rico :)

mennodejong1986 commented 2 years ago

Hi Rico,

I uploaded a new source SambaR script with some small edits to the mergepop function. I think this will solve the error you encountered when trying to run the findstructure function. Thanks for reporting this issue.

I haven't yet managed to reproduce the error you encountered with the exportsambarfiles() function, but if you are not interested in running TreeMix, this step can be skipped as follows: exportsambarfiles(do_treemix=FALSE)

As a more general note, it is not ideal to merge all individuals into a single population, because the population colours help to interpret the output plots. The SNP filters are always applied relative to the entire dataset, not to subsets of the data (i.e., not per population).
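Dataset-wide filters are compatible with whole populations disappearing: if every individual of a poorly sequenced population exceeds a dataset-wide missingness threshold, that population ends up empty without any per-population rule being applied. The sketch below shows this on a made-up genotype matrix in base R. It assumes the thresholds act as maximum fractions of missing data, uses arbitrary threshold values, and does not reproduce SambaR's filterdata() internals.

```r
# Hypothetical toy genotype matrix: rows = individuals, columns = SNPs,
# values = alternative-allele counts (0/1/2), NA = missing genotype call.
geno <- rbind(
  ind1 = c(0,  1,  2,  1),
  ind2 = c(1,  0,  2,  NA),
  ind3 = c(NA, NA, 1,  NA),
  ind4 = c(NA, 2,  NA, NA)
)
pop <- factor(c("A", "A", "B", "B"))  # population label per individual

# Per-individual missingness, computed without reference to population.
ind_miss <- rowMeans(is.na(geno))        # 0.00 0.25 0.75 0.75
keep_ind <- ind_miss <= 0.5              # assumed: threshold = max missing fraction

# Both individuals of population B fail the same dataset-wide threshold,
# so population B is left empty.
table(pop[keep_ind])                     # A: 2, B: 0

# Per-SNP missingness over all retained individuals, again ignoring population.
geno_kept <- geno[keep_ind, , drop = FALSE]
snp_miss  <- colMeans(is.na(geno_kept))  # 0.0 0.0 0.0 0.5
filtered  <- geno_kept[, snp_miss <= 0.05, drop = FALSE]
dim(filtered)                            # 2 individuals x 3 SNPs
```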

Best,

Menno

ricohabeahan commented 2 years ago

Thanks Menno! The updated source code seems to have fixed the issue with the mergepop() command. I can now run exportsambarfiles(do_treemix=FALSE), which exports the PED/MAP files, as well as the further analysis commands.

Best, Rico.