jon-xu / scSplit

Genotype-free demultiplexing of pooled single-cell RNA-Seq, using a hidden state model for identifying genetically distinct samples within a mixed population.
MIT License

Error when getting genotypes #8

Closed drneavin closed 4 years ago

drneavin commented 4 years ago

Hi Jon,

I was recently trying to get the SNP genotypes for the individuals after running scSplit (python scSplit genotype -r ref_filtered.csv -a alt_filtered.csv -p scSplit_P_s_c.csv), but I am getting an error:

Traceback (most recent call last):
  File "miniconda3/envs/scSplit_w7/lib/python3.7/site-packages/scSplit/scSplit", line 642, in <module>
    scSplit()
  File "miniconda3/envs/scSplit_w7/lib/python3.7/site-packages/scSplit/scSplit", line 354, in __init__
    getattr(self, args.command)()
  File "miniconda3/envs/scSplit_w7/lib/python3.7/site-packages/scSplit/scSplit", line 580, in genotype
    lp_d_rr = pd.DataFrame(binom.pmf(pd.DataFrame(alt_s.dot(A_s_c)), pd.DataFrame((alt_s + ref_s).dot(A_s_c)), err), index=all_POS, columns=range(num)).apply(np.log10)
  File ".local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 2952, in pmf
    goodargs = argsreduce(cond, *((k,)+args))
  File ".local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 545, in argsreduce
    return [np.extract(cond, arr1 * expand_arr) for arr1 in newargs]
  File ".local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 545, in <listcomp>
    return [np.extract(cond, arr1 * expand_arr) for arr1 in newargs]
  File ".local/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 1488, in f
    other = _align_method_FRAME(self, other, axis)
  File ".local/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 1427, in _align_method_FRAME
    right = to_series(right)
  File ".local/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 1419, in to_series
    msg.format(req_len=len(left.columns), given_len=len(right))
ValueError: Unable to coerce to Series, length must be 6: given 1

Could you advise on what might be going on here?

Thanks, Drew

jon-xu commented 4 years ago

Hi Drew,

Great that you finished running the first two steps correctly.

For the last step, I have not come across the issue you describe before.

Maybe the fastest way is to share your input files with me and I’ll have a look?

Cheers, Jon

drneavin commented 4 years ago

Sounds great Jon, I'll send them through to you directly so you can have a look.

Cheers, Drew

jon-xu commented 4 years ago

Hi Drew,

Thanks for sharing your input files for "scSplit genotype"!

I ran it with "scSplit genotype -r ref_filtered_count20.csv -a alt_filtered_count20.csv -p scSplit_P_s_c.csv" and got the result vcf successfully.

I don't remember changing anything in recent releases, but could you please download the most recent release, v1.0.2, and see whether you still have this issue?

If so, we might need to test further with different Python package versions.

Thanks! Jon

drneavin commented 4 years ago

Hi Jon,

I have now reinstalled scSplit using both pip and git, but I receive the same errors either way. I am working on a cluster and am using conda environments. I have listed the packages and their versions below for reference. Do these all match the versions that you are using? Are there any differences that I can test out?

# Name                    Version                   Build  Channel
bcftools                  1.9                  ha228f0b_4    bioconda
bzip2                     1.0.6             h14c3975_1002    quansight-small-test
ca-certificates           2019.11.28           hecc5488_0    conda-forge
certifi                   2019.11.28               py37_0    conda-forge
curl                      7.65.3               hf8cf82a_0    conda-forge
htslib                    1.9                  ha228f0b_7    bioconda
krb5                      1.16.3            h05b26f9_1001    conda-forge
libblas                   3.8.0               14_openblas    conda-forge
libcblas                  3.8.0               14_openblas    conda-forge
libcurl                   7.65.3               hda55be3_0    conda-forge
libdeflate                1.0                  h14c3975_1    bioconda
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1             he1b5a44_1006    quansight-small-test
libgcc-ng                 8.2.0                hdf63c60_1    quansight-small-test
libgfortran-ng            7.3.0                hdf63c60_2    conda-forge
liblapack                 3.8.0               14_openblas    conda-forge
libopenblas               0.3.7                h5ec1e0e_4    conda-forge
libssh2                   1.8.2                h1ba5d50_0  
libstdcxx-ng              8.2.0                hdf63c60_1    quansight-small-test
ncurses                   6.1               hf484d3e_1002    quansight-small-test
numpy                     1.17.3           py37h95a1406_0    conda-forge
openssl                   1.1.1d               h516909a_0    conda-forge
pandas                    0.25.3           py37hb3f55d8_0    conda-forge
pip                       19.1                     py37_0    quansight-small-test
pysam                     0.15.3           py37h5ad169c_0    bioconda
python                    3.7.3                h5b0a415_0    quansight-small-test
python-dateutil           2.8.1                      py_0    conda-forge
pytz                      2019.3                     py_0    conda-forge
pyvcf                     0.6.8                 py37_1000    conda-forge
readline                  7.0               hf8c457e_1001    quansight-small-test
samtools                  1.9                 h10a08f8_12    bioconda
scipy                     1.3.2            py37h921218d_0    conda-forge
setuptools                41.0.1                   py37_0    quansight-small-test
six                       1.13.0                   py37_0    conda-forge
sqlite                    3.26.0            h67949de_1001    quansight-small-test
tk                        8.6.9             h84994c4_1001    quansight-small-test
wheel                     0.33.4                   py37_0    quansight-small-test
xz                        5.2.4             h14c3975_1001    quansight-small-test
zlib                      1.2.11            h14c3975_1004    quansight-small-test

I can import all dependent modules in python as well.
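When comparing environments like this, it can help to print the versions that Python actually imports, since the traceback above loads scipy and pandas from .local rather than from the conda environment. The following is only a minimal sketch: the helper name report_versions and the default module list are assumptions made for illustration and are not part of scSplit.

```python
import importlib


def report_versions(names=("numpy", "pandas", "scipy", "pysam", "vcf")):
    """Return a dict mapping module name to the version Python imports.

    "vcf" is the import name of the PyVCF package. The helper name and
    the default module list are illustrative assumptions.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Module is missing or fails to import in this interpreter
            versions[name] = "not installed"
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version}")
```

Running this with the same interpreter that runs scSplit shows which copies of each package are picked up, which may differ from what conda list reports.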

It seems to be an error with how the P_s_c.csv file is being handled since the error indicates that the length must be 6 (which is the same as the number of clusters I expect in this pool = 5 donors + doublets) but given 1.

jon-xu commented 4 years ago

Hi Drew,

I have tested it in an environment similar to yours, and yes, it failed with the same error.

It seems scipy.stats.binom.pmf has a new requirement on one of its inputs (p) in newer versions.

I have amended the code so that it works in both your environment and mine. Please download the newest script (scSplit) from GitHub (not in any release yet) and try again.
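The exact change made to scSplit is not shown in this thread. The traceback shows the failure occurring inside pandas arithmetic alignment when scipy combines the DataFrame arguments internally, so one general way to sidestep that kind of error is to pass plain NumPy arrays to binom.pmf and rebuild the DataFrame afterwards. The sketch below uses made-up toy counts and hypothetical variable names (alt_counts, total_counts), and it is an assumption about a possible workaround, not the author's fix.

```python
import numpy as np
import pandas as pd
from scipy.stats import binom

# Hypothetical toy data: 3 SNPs (rows) x 2 clusters (columns)
snps = ["snp1", "snp2", "snp3"]
alt_counts = pd.DataFrame([[1, 0], [2, 3], [0, 5]], index=snps)
total_counts = pd.DataFrame([[4, 2], [2, 6], [3, 5]], index=snps)
err = 0.01  # assumed per-read error rate, for illustration only

# Convert to NumPy arrays so scipy never has to align DataFrames internally
pmf = binom.pmf(alt_counts.to_numpy(), total_counts.to_numpy(), err)

# Restore the labels once the numeric work is done
log_p = pd.DataFrame(np.log10(pmf), index=alt_counts.index,
                     columns=alt_counts.columns)
print(log_p)
```

Keeping the labels out of the scipy call means the result depends only on array shapes, which should behave the same across pandas and scipy versions.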

Thanks for pointing this out! Jon

drneavin commented 4 years ago

Great, that fixed the issue. Thanks for the quick responses!

jon-xu commented 4 years ago

Hi Drew,

You might want to try the newest release v1.0.4, which has some further improvements on the result.

Cheers, Jon

trebbiano commented 4 years ago

Hi Jon, I'm experiencing the same issue with v1.0.4. The same issue comes up with files from different scSplit run outputs. scSplit run ends without error but shows a warning:

/gpfs/home/user1/.local/lib/python3.6/site-packages/scSplit/scSplit:280: FutureWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  alt_or_ref = alt_or_ref.ix[[x for x in alt_or_ref.index if x[0] not in ['X','Y','MT']]]

scSplit genotype error message:

File "/gpfs/home/user1/.local/lib/python3.6/site-packages/pandas/core/ops/__init__.py", line 1419, in to_series
    msg.format(req_len=len(left.columns), given_len=len(right))
ValueError: Unable to coerce to Series, length must be 4: given 1

Dependencies, running on CentOS 6.7:

Python 3.6.3
pandas==0.25.3
pandas-summary==0.0.41
pysam==0.15.3
scipy==1.4.1
scikit-image==0.15.0
scikit-learn==0.21.3
scikits.bootstrap==1.0.0
sklearn==0.0
sklearn-pandas==1.8.0
PyVCF==0.6.8
statistics==1.0.3.5

Suggestions would be appreciated!

jon-xu commented 4 years ago

@trebbiano thanks for your interest in our tool. I have double-checked release v1.0.4, and the deprecated .ix has already been replaced there.
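For reference, the line flagged by the FutureWarning can be expressed without .ix by using a boolean mask with .loc. This is a minimal sketch on toy data. It assumes the index entries are (chromosome, position) tuples, which may not match scSplit's real index format, and it is not necessarily the replacement used in the release.

```python
import pandas as pd

# Hypothetical toy matrix: rows are (chromosome, position), columns are cells
alt_or_ref = pd.DataFrame(
    {"cell1": [1, 0, 2, 1], "cell2": [0, 3, 1, 0]},
    index=[("1", 1000), ("X", 2000), ("MT", 300), ("2", 4000)],
)

excluded = ["X", "Y", "MT"]

# Build a boolean mask instead of passing labels to the deprecated .ix
keep = [x[0] not in excluded for x in alt_or_ref.index]
filtered = alt_or_ref.loc[keep]
print(filtered)
```

A boolean mask avoids any ambiguity between label-based and positional lookup, which is the reason .ix was deprecated.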

To rule out a version issue, could you please re-download the newest release and try again?

Thanks! Jon

trebbiano commented 4 years ago

Thanks for the quick reply! I first tried to update using pip but the version is already reporting as latest:

$ python3.6 -m pip install --upgrade scSplit
Defaulting to user installation because normal site-packages is not writeable
Requirement already up-to-date: scSplit in /gpfs/home/jaroslav/.local/lib/python3.6/site-packages (1.0.4)

so I cloned the GitHub repo and ran scSplit genotype using the same parameters as before but with the cloned version of scSplit. This time the program completed successfully. So perhaps the pip-distributed version is not the same as the GitHub version despite having the same version number?
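One way to check whether the pip-installed script and the cloned script really differ is to compare their file hashes. The sketch below is only illustrative: the helper names are made up, and the paths in the usage comment are placeholders to be replaced with the actual locations on your system.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_contents(path_a, path_b):
    """Return True if both files have byte-identical contents."""
    return file_sha256(path_a) == file_sha256(path_b)


# Example usage (placeholder paths, adjust to your installation):
# same_contents("/path/to/site-packages/scSplit/scSplit",
#               "/path/to/cloned/scSplit/scSplit")
```

If the function returns False, the two copies differ even though both report the same version number.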

Thanks, Jerry

jon-xu commented 4 years ago

Hi Jerry,

Thanks for the information!

I’ll check PyPI!

Cheers,

Jon