reworkhow / JWAS.jl

Julia for Whole-genome Analysis Software
http://QTL.rocks
GNU General Public License v2.0

improve get_genotypes() function before line "isGRM = false" #149

Closed zhaotianjing closed 6 months ago

zhaotianjing commented 7 months ago
  1. delete rowID=false from the arguments of the get_genotypes() function, because it is only used for matrix input. For internal testing, it is better to use a DataFrame as input.
  2. when setting the type for each column, use Float32 instead of Float64 in the line fill!(etv,Float32).
  3. for string input (i.e., path), add code to clean the memory of the data variable:
    data = nothing
    GC.gc()

    here we cannot read the matrix directly from the path, because the 1st column is a string column.

  4. clean code for matrix input
  5. for matrix input, also allow Array{Int64,2}, Array{Int32,2} as input
  6. add some comments
  7. improve some formatting
  8. add argument double_precision which works for both genotypes and GRM.
  9. improve documentation
  10. add a comment to explain why we still choose to use genotypes = Matrix(data[!,2:end]).
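
Items 2, 3 and 10 can be sketched together as follows. This is only an illustration of the idea, not the actual JWAS code: the function name read_genotypes_sketch is hypothetical, and it assumes a delimited file with a header row, IDs in the first column, and no missing values.

```julia
using CSV, DataFrames

# Hypothetical helper for illustration; not the JWAS implementation.
function read_genotypes_sketch(path::AbstractString; separator=',', double_precision::Bool=false)
    T = double_precision ? Float64 : Float32

    # Count columns from the header line, then set one type per column.
    ncol = length(split(readline(path), separator))
    etv  = Vector{DataType}(undef, ncol)
    fill!(etv, T)        # all marker columns use the chosen float type
    etv[1] = String      # the 1st column holds IDs, so it stays a string column

    data = CSV.read(path, DataFrame; types=etv, delim=separator)

    ids       = string.(data[!, 1])
    genotypes = Matrix(data[!, 2:end])   # copy the marker columns into a dense matrix

    # Drop the reference to the DataFrame and ask the GC to reclaim it.
    data = nothing
    GC.gc()

    return ids, genotypes
end
```

Because the ID column is a string column, the file is first read into a DataFrame with per-column types, and only the marker columns are copied into the numeric matrix.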

Testing code:

using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets

#file path
genofile   = dataset("genotypes.csv")
genotypes  = get_genotypes(genofile,separator=',',method="BayesC");
genotypes.genotypes #Matrix{Float32}

#dataframe
df=CSV.read(genofile,DataFrame)
genotypes  = get_genotypes(df,separator=',',method="BayesC");
genotypes.genotypes #Matrix{Float32}

#matrix
ma=Matrix(CSV.read(genofile,DataFrame)[!,2:end])
genotypes  = get_genotypes(ma,separator=',',method="BayesC");
genotypes.genotypes #Matrix{Float32}
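
For the matrix case, the conversion described in items 5 and 8 might look roughly like the sketch below. The helper name to_genotype_matrix is an assumption made for illustration and is not part of JWAS; only the double_precision keyword comes from the list above.

```julia
# Hypothetical helper for illustration; not part of JWAS.
# Accepts Int32, Int64, Float32 or Float64 matrices and returns a dense
# matrix in single precision by default, or double precision on request.
function to_genotype_matrix(M::AbstractMatrix{<:Real}; double_precision::Bool=false)
    T = double_precision ? Float64 : Float32
    return Matrix{T}(M)   # converts the element type and copies the data
end

ma_int = Int32[0 1 2; 2 1 0]
eltype(to_genotype_matrix(ma_int))                         # Float32
eltype(to_genotype_matrix(ma_int; double_precision=true))  # Float64
```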