lemieuxl / pyGenClean

Automated genetic data clean up procedure in Python.
GNU General Public License v3.0
3 stars, 1 fork

Error in post-install test run #26

Open (willmclaren opened this issue 6 years ago)

willmclaren commented 6 years ago

I'm encountering an error while following http://lemieuxl.github.io/pyGenClean/install_linux.html#testing-the-installation

I installed pyGenClean using virtualenv as directed.

Command: run_pyGenClean --conf test.ini --bfile pyGenClean_test_data/1000G_EUR-MXL_Human610-Quad-v1_H

Content of test.ini:

[1]
script = check_ethnicity
ceu-bfile = check_ethnicity_HapMap_ref_pops_b37/hapmap_CEU_r23a_filtered_b37
yri-bfile = check_ethnicity_HapMap_ref_pops_b37/hapmap_YRI_r23a_filtered_b37
jpt-chb-bfile = check_ethnicity_HapMap_ref_pops_b37/hapmap_JPT_CHB_r23a_filtered_b37
nb-components = 2
multiplier = 1

[2]
script = sex_check
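Each numbered section of test.ini names one pyGenClean script plus that script's options. To see at a glance which steps a configuration defines, Python 3's standard configparser can read the file. This is only an inspection aid, not pyGenClean's own parser: the helper name list_steps is hypothetical, and the numeric sort assumes, as the section numbering suggests, that steps run in numeric order.

```python
import configparser

# Contents of test.ini from this issue, inlined so the sketch is self-contained.
TEST_INI = """\
[1]
script = check_ethnicity
ceu-bfile = check_ethnicity_HapMap_ref_pops_b37/hapmap_CEU_r23a_filtered_b37
yri-bfile = check_ethnicity_HapMap_ref_pops_b37/hapmap_YRI_r23a_filtered_b37
jpt-chb-bfile = check_ethnicity_HapMap_ref_pops_b37/hapmap_JPT_CHB_r23a_filtered_b37
nb-components = 2
multiplier = 1

[2]
script = sex_check
"""


def list_steps(text):
    """Return (step number, script name, options dict) for each section.

    Hypothetical helper: sections are sorted numerically, on the assumption
    that the section number gives the execution order.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    steps = []
    for section in sorted(parser.sections(), key=int):
        options = dict(parser.items(section))
        script = options.pop("script")  # every section must name a script
        steps.append((int(section), script, options))
    return steps


for number, script, options in list_steps(TEST_INI):
    print(number, script, options)
```

Reading a real file works the same way with parser.read("test.ini") in place of read_string. Note that all option values come back as strings (e.g. nb-components is "2").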

Tail of output:

[2018-02-14 15:01:39 find_outliers INFO] Reading population file
[2018-02-14 15:01:39 find_outliers INFO] Reading MDS file
[2018-02-14 15:01:39 find_outliers INFO] Finding reference population centers
[2018-02-14 15:01:39 find_outliers INFO] Finding outliers
Traceback (most recent call last):
  File "/home/will/software/pyGenClean/bin/run_pyGenClean", line 11, in <module>
    sys.exit(safe_main())
  File "/home/will/software/pyGenClean/local/lib/python2.7/site-packages/pyGenClean/run_data_clean_up.py", line 3581, in safe_main
    main()
  File "/home/will/software/pyGenClean/local/lib/python2.7/site-packages/pyGenClean/run_data_clean_up.py", line 196, in main
    options=options,
  File "/home/will/software/pyGenClean/local/lib/python2.7/site-packages/pyGenClean/run_data_clean_up.py", line 2201, in run_check_ethnicity
    check_ethnicity.main(options)
  File "/home/will/software/pyGenClean/local/lib/python2.7/site-packages/pyGenClean/Ethnicity/check_ethnicity.py", line 259, in main
    out_prefix=args.out,
  File "/home/will/software/pyGenClean/local/lib/python2.7/site-packages/pyGenClean/Ethnicity/check_ethnicity.py", line 358, in find_the_outliers
    find_outliers.main(options)
  File "/home/will/software/pyGenClean/local/lib/python2.7/site-packages/pyGenClean/Ethnicity/find_outliers.py", line 77, in main
    outliers = find_outliers(mds, centers, center_info, args.outliers_of, args)
  File "/home/will/software/pyGenClean/local/lib/python2.7/site-packages/pyGenClean/Ethnicity/find_outliers.py", line 359, in find_outliers
    distances = euclidean_distances(subset_data, centers[label])
  File "/home/will/software/pyGenClean/local/lib/python2.7/site-packages/sklearn/metrics/pairwise.py", line 223, in euclidean_distances
    X, Y = check_pairwise_arrays(X, Y)
  File "/home/will/software/pyGenClean/local/lib/python2.7/site-packages/sklearn/metrics/pairwise.py", line 112, in check_pairwise_arrays
    warn_on_dtype=warn_on_dtype, estimator=estimator)
  File "/home/will/software/pyGenClean/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[-0.04016206  0.02047998].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

lemieuxl commented 6 years ago

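This is caused by a deprecation in scikit-learn: passing a 1D array (here, a single population center) to functions such as euclidean_distances was deprecated in version 0.17 and, starting with version 0.19, raises a ValueError instead of a warning. Downgrading scikit-learn to a version prior to 0.19 should resolve the issue. We are currently using scikit-learn version 0.18.1 in production.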

pip uninstall scikit-learn
pip install scikit-learn==0.18.1
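The last line of the traceback already hints at the code-level fix: the population center passed to euclidean_distances is a 1D array of two MDS coordinates, while scikit-learn 0.19 and later only accept 2D input. A minimal NumPy-only sketch of that reshape follows. The center value is copied from the traceback; the sample coordinates are hypothetical, and the variable names mirror the traceback but this is not pyGenClean's code.

```python
import numpy as np

# Center from the traceback: a 1D array with one value per MDS component.
center = np.array([-0.04016206, 0.02047998])

# Hypothetical MDS coordinates for three samples (n_samples x n_components).
subset_data = np.array([
    [-0.04, 0.02],
    [-0.01, 0.05],
    [0.03, -0.02],
])

# reshape(1, -1) turns the center into a single-sample 2D array of shape
# (1, n_components), which is what scikit-learn >= 0.19 expects.
center_2d = center.reshape(1, -1)

# Euclidean distance from each sample to the center, shaped like the output
# of euclidean_distances(subset_data, center_2d): one row per sample, one
# column per center (values agree up to floating-point rounding).
distances = np.sqrt(((subset_data - center_2d) ** 2).sum(axis=1, keepdims=True))
print(distances.shape)  # (3, 1)
```

In find_outliers.py, the analogous change would presumably be to pass centers[label].reshape(1, -1) instead of centers[label]; this is untested against pyGenClean, and downgrading remains the workaround recommended here.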

I also noticed that some plots generated using matplotlib version 2 are invalid. You should probably downgrade matplotlib to a version prior to 2 (we use version 1.5.3 in production).
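To confirm that a virtualenv matches these two recommendations, a small version check can help. This is a hypothetical helper, not part of pyGenClean: it encodes only the upper bounds given in this thread (scikit-learn below 0.19, matplotlib below 2) and compares numeric version components, so pre-release suffixes such as "rc1" are only roughly handled.

```python
import re

# Exclusive upper bounds recommended in this thread.
MAX_VERSIONS = {
    "scikit-learn": (0, 19),
    "matplotlib": (2,),
}


def version_tuple(version):
    """Convert e.g. '0.18.1' to (0, 18, 1).

    Keeps the leading digits of each dot-separated piece ('0rc1' -> 0) and
    stops at the first piece that has none (e.g. 'dev0').
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_supported(package, version):
    """Return True if `version` is below the recommended upper bound."""
    return version_tuple(version) < MAX_VERSIONS[package]
```

Inside the pyGenClean virtualenv, one would import both packages and call is_supported("scikit-learn", sklearn.__version__) and is_supported("matplotlib", matplotlib.__version__). The helper uses only the standard library and also runs on Python 2.7, which the traceback shows pyGenClean using.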

We are planning to update pyGenClean to increase efficiency, enable Python 3 compatibility and use the most up-to-date modules. Since this is going to take some time, I recommend simply downgrading the two modules as a quick workaround.

willmclaren commented 6 years ago

Thanks @lemieuxl, that worked!

One further question: is Plink 1.9 supported (or will it be)? I just tested it with the above test run and it seems to work OK, and of course it is much faster (which is the main benefit of 1.9 over 1.07).

lemieuxl commented 6 years ago

We wanted to wait for an official release of Plink 2 before testing its compatibility with pyGenClean, but I expect it to be compatible for the most part, since the developers say that version 2 is backward compatible with version 1.

I'll leave the ticket open for now, to remind me to fix the scikit-learn issue when updating pyGenClean.