nimwegenLab / cellstates

Finding gene expression states in scRNA-seq data
MIT License

Error when running on test data #1

Closed csoneson closed 3 years ago

csoneson commented 3 years ago

Hi - I'm trying to run cellstates on the provided test data, and I'm running into some issues that I was hoping you could help me solve.

The version of the software I used is the one from Feb 20. Installation went without problems.

Next, I was trying to run the software on the provided test data:

python scripts/run_cellstates.py -o test_output/ test_data/simulated_data.tsv.zip

Here, I got some errors related to reading the file, which I think are due to the presence of spaces in the header (e.g., cell_ 1 rather than cell_1):

Traceback (most recent call last):
  File "scripts/run_cellstates.py", line 195, in <module>
    main()
  File "scripts/run_cellstates.py", line 54, in main
    df = df.astype(np.int, copy=False)
  File "cellstates-env/lib/python3.9/site-packages/pandas/core/generic.py", line 5874, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "cellstates-env/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 631, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "cellstates-env/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 427, in apply
    applied = getattr(b, f)(**kwargs)
  File "cellstates-env/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 673, in astype
    values = astype_nansafe(vals1d, dtype, copy=True)
  File "cellstates-env/lib/python3.9/site-packages/pandas/core/dtypes/cast.py", line 1068, in astype_nansafe
    raise ValueError("Cannot convert non-finite values (NA or inf) to integer")
ValueError: Cannot convert non-finite values (NA or inf) to integer
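
If the spaces in the cell names are indeed the cause, one possible workaround is to strip them from the header line before running the script. The following is only a minimal sketch: it assumes an unzipped, tab-delimited file with the cell names on the first line, and the helper names `clean_header` and `clean_file` are hypothetical, not part of cellstates.

```python
def clean_header(header_line, sep="\t"):
    """Remove all whitespace inside each column name of a delimited header line."""
    names = header_line.rstrip("\n").split(sep)
    # "cell_ 1" -> "cell_1": splitting on whitespace and re-joining drops every gap
    return sep.join("".join(name.split()) for name in names)


def clean_file(src_path, dst_path, sep="\t"):
    """Copy src_path to dst_path, rewriting only the first (header) line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        header = src.readline()
        dst.write(clean_header(header, sep) + "\n")
        # Data rows are copied unchanged
        for line in src:
            dst.write(line)
```

For example, `clean_header("gene\tcell_ 1\tcell_ 2\n")` gives `"gene\tcell_1\tcell_2"`. This keeps all cells in the data set instead of dropping the ones with malformed names.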

I removed the first 1,000 cells (the ones containing white spaces) and then subset further to the first 200 remaining cells to speed things up. I don't think the subsetting is the reason for the problem: I got the same error when running on the full set of cells minus the first 1,000. Now, calling run_cellstates.py fails with the following error message:

$ python scripts/run_cellstates.py -o test_output_sub/ test_data/simulated_data_fixed_sub200.tsv
2021-02-20 19:52:11,876 - WARNING:Only 0 moves found within loop limit.Consider raising tries_per_step
Traceback (most recent call last):
  File "scripts/run_cellstates.py", line 195, in <module>
    main()
  File "scripts/run_cellstates.py", line 105, in main
    run_mcmc(clst, N_steps=N, log_level=LOG_LEVEL, tries_per_step=TPS)
  File "cellstates-env/lib/python3.9/site-packages/cellstates-0.1-py3.9-linux-x86_64.egg/cellstates/run.py", line 80, in run_mcmc
    clst.set_clusters(best_clusters)
UnboundLocalError: local variable 'best_clusters' referenced before assignment
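
An UnboundLocalError of this kind typically appears when a variable is only assigned inside a conditional branch that never runs, which would fit the "Only 0 moves found" warning. The sketch below illustrates that general pattern and one way to guard against it. It is not the actual cellstates code: the function names and the `propose_move` callback are hypothetical.

```python
def run_steps_unguarded(n_steps, propose_move):
    """`best` is bound only when a move is found; with zero moves the return fails."""
    for _ in range(n_steps):
        move = propose_move()
        if move is not None:
            best = move
    return best  # UnboundLocalError if no move was ever found


def run_steps_guarded(n_steps, propose_move, initial):
    """Same loop, but `best` starts from the initial state, so it is always bound."""
    best = initial
    for _ in range(n_steps):
        move = propose_move()
        if move is not None:
            best = move
    return best
```

With a callback that never finds a move, such as `lambda: None`, the unguarded version raises UnboundLocalError, while the guarded version simply returns the initial state.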

I could not find a way to change the tries_per_step variable. Could you provide some insights into what's going on here, and what would be the expected output of running run_cellstates.py on the provided test data?

Thanks!

pgrobecker commented 3 years ago

Hi, thanks a lot for trying out cellstates - your feedback is much appreciated!

I fixed the problem with cell names in the simulated data - they are now cell_0001, cell_0002, etc.

tries_per_step is a variable that is set internally, so the warning is just part of the normal running of the program. I understand that it's a very confusing message, though, so it will no longer be shown.

I also fixed the bug that caused your run to fail. However, I should note that the problem probably arose because all simulated cells you selected for the test run come from the same cluster. You can find the true cluster labels in test_data/simulated_clusters.txt (row 1 is cell_0001, row 2 is cell_0002, etc.).
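
To check whether a subset of cells spans more than one cluster, the labels for the selected rows can be counted. This is a rough sketch that assumes the labels file holds one cluster label per row, in the same order as the cells. The helper names `read_labels` and `label_counts` are made up for illustration.

```python
from collections import Counter


def read_labels(path):
    """Read one cluster label per non-empty line (assumed layout of the labels file)."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def label_counts(labels, start, stop):
    """Count cluster labels for the cells in rows start..stop-1 (0-based)."""
    return Counter(labels[start:stop])
```

For instance, `label_counts(read_labels("test_data/simulated_clusters.txt"), 1000, 1200)` would cover the 200 cells after the first 1,000. If the resulting counter has a single key, all selected cells come from the same cluster.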

Let me know if you still have any issues and thanks again for the feedback.

csoneson commented 3 years ago

Thank you! I tried the new version on the example data and I can confirm that it now runs without errors. I will try it on my own data and get back to you if I run into any other issues.