ratan-lab / sumo

Subtyping tool for multi-omic data
https://pypi.org/project/python-sumo
MIT License

sumo prepare: "File contains some non-numerical values other than 'NA'" #12

Open: dfermin opened this issue 4 years ago

dfermin commented 4 years ago

Hello

I've successfully installed sumo and I'm trying to run it on a small data set of 37 samples. Each sample was measured with four omics methods: gene expression, metabolomics, and two tissue-specific expression analyses (i.e., gene expression in selected tissue types).

The command I run is:

sumo prepare -plot RES.png forSUMO/MET.tsv.gz,forSUMO/GE1.tsv.gz,forSUMO/GE2.tsv.gz,forSUMO/allGE.tsv.gz prepared.RES.npz

But I immediately get this error:

(py36) [dfermin@lnx00005 SUMO]$ sumo prepare -plot RES.png forSUMO/MET.tsv.gz,forSUMO/GE1.tsv.gz,forSUMO/GE2.tsv.gz,forSUMO/allGE.tsv.gz prepared.RES.npz
#Loading file: forSUMO/MET.tsv.gz
Traceback (most recent call last):
  File "/home/dfermin/anaconda3/envs/py36/bin/sumo", line 10, in <module>
    sys.exit(main())
  File "/home/dfermin/anaconda3/envs/py36/lib/python3.6/site-packages/sumo/run.py", line 12, in main
    mode.run()
  File "/home/dfermin/anaconda3/envs/py36/lib/python3.6/site-packages/sumo/modes/prepare/prepare.py", line 102, in run
    layers = self.load_all_data()  # list of tuples (file_name, feature_matrix)
  File "/home/dfermin/anaconda3/envs/py36/lib/python3.6/site-packages/sumo/modes/prepare/prepare.py", line 93, in load_all_data
    drop_samples=self.ds)
  File "/home/dfermin/anaconda3/envs/py36/lib/python3.6/site-packages/sumo/utils.py", line 371, in load_data_text
    raise ValueError("File contains some non-numerical values other than 'NA'")
ValueError: File contains some non-numerical values other than 'NA'

I've checked the MET.tsv.gz file in R, and it contains only numbers. There are no missing values, and the values range from -1.77 to 5.88. This is after standard normalization of the data, as recommended in the pre-processing steps.
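One way to double-check the file outside R is to scan its raw text for any token that is neither a number nor 'NA'. The sketch below is a hypothetical helper (`find_non_numeric_cells` and `check_file` are illustrative names), not sumo's own validation code, whose parsing in `sumo/utils.py` may differ. It assumes the first row holds sample names and the first column holds feature names, and skips both. Because it splits raw tab-separated text without unquoting, it also reports quoted numbers, empty cells, stray carriage returns, and `nan`/`inf` spellings that R may display as ordinary values.

```python
import gzip
import math
from typing import Iterable, List, Tuple

# 'NA' is the only missing-value token named in the error message.
MISSING_TOKENS = {"NA"}


def is_clean_number(token: str) -> bool:
    """True if the raw token is a finite number with no surrounding whitespace."""
    if token != token.strip():
        return False  # e.g. a trailing '\r' left by Windows line endings
    try:
        value = float(token)
    except ValueError:
        return False  # text, quoted numbers, empty cells, etc.
    return math.isfinite(value)  # also flag 'nan' / 'inf' spellings


def find_non_numeric_cells(lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Return (line, column, token) for suspicious cells; both indices are 1-based.

    Assumes line 1 holds sample names and column 1 holds feature names,
    so both are skipped.
    """
    bad = []
    for line_no, line in enumerate(lines, start=1):
        if line_no == 1:
            continue  # header row with sample names
        tokens = line.rstrip("\n").split("\t")
        for col_no, token in enumerate(tokens[1:], start=2):
            if token not in MISSING_TOKENS and not is_clean_number(token):
                bad.append((line_no, col_no, token))
    return bad


def check_file(path: str, limit: int = 20) -> None:
    """Print up to `limit` suspicious cells from a plain or gzipped TSV file."""
    opener = gzip.open if path.endswith(".gz") else open
    # newline="" preserves any '\r' characters so they can be reported
    with opener(path, "rt", newline="") as handle:
        bad = find_non_numeric_cells(handle)
    for line_no, col_no, token in bad[:limit]:
        print(f"line {line_no}, column {col_no}: {token!r}")
    print(f"{len(bad)} suspicious cell(s) found")
```

For example, `check_file("forSUMO/MET.tsv.gz")` prints each offending cell with `repr()`, so invisible characters show up explicitly. An empty report would suggest the problem lies elsewhere, for instance in how the header or feature-name column is laid out.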

Any suggestions? Thanks, Damian

sienkie commented 4 years ago

Hi, thank you for using sumo. This error should appear only if the data matrix contains strings, or uses unsupported symbols instead of 'NA' for missing values. I am going to look into it. It would be helpful if you could provide a sample of your MET.tsv.gz file (for example, the first 10 lines).
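To share the first 10 lines as requested without an editor silently hiding tabs, quotes, or carriage returns, a minimal sketch like the following prints each line with `repr()`. `head_lines` and `print_head` are hypothetical helper names, not part of sumo.

```python
import gzip
from itertools import islice
from typing import Iterable, List


def head_lines(lines: Iterable[str], n: int = 10) -> List[str]:
    """Return the first n lines, dropping only the trailing '\n'."""
    return [line.rstrip("\n") for line in islice(lines, n)]


def print_head(path: str, n: int = 10) -> None:
    """Print the first n lines of a gzipped text file using repr()."""
    # newline="" keeps any '\r' characters instead of translating them away
    with gzip.open(path, "rt", newline="") as handle:
        for line in head_lines(handle, n):
            print(repr(line))
```

Calling `print_head("forSUMO/MET.tsv.gz")` would then show separators as `\t` and any Windows line endings as a trailing `\r`.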