mafeiyang / ACTINN

GNU General Public License v3.0

ValueError: Shape of passed values is (15376, 28211), indices imply (15375, 28211) #1

Open mdurante1 opened 5 years ago

mdurante1 commented 5 years ago

Hello,

I have tested your tool out on the example data that you provided and it seems to work very nicely. I proceeded to run my own data set with the default training set and received good results. I then tried to test the "tcell_subtype" dataset you describe in your manuscript and received the error below. Can you please provide any insight into the source of this error?

Best, Michael

(base) mdurante@hlab4:~/software/ACTINN$ python actinn_format.py -I dataset.txt -o tcell_subset -f txt
Dimension of the matrix after removing non-zero rows: (22430, 16740)
(base) mdurante@hlab4:~/software/ACTINN$ python actinn_predict.py -trs ./test_data/tcell_subtype_ref.h5 -trl ./test_data/tcell_subtype_ref_label.txt -ts ./tcell_subset.h5 -lr 0.0001 -ne 50 -ms 128 -pc True
actinn_predict.py:286: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
  train_label = pd.read_table(args.train_label, header=None)
actinn_predict.py:37: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  total_set = np.array(pd.concat(sets, axis=1), dtype=np.float32)
Traceback (most recent call last):
  File "actinn_predict.py", line 291, in <module>
    train_set, test_set = scale_sets([train_set, test_set])
  File "actinn_predict.py", line 37, in scale_sets
    total_set = np.array(pd.concat(sets, axis=1), dtype=np.float32)
  File "/home/mdurante/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 229, in concat
    return op.get_result()
  File "/home/mdurante/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 426, in get_result
    copy=self.copy)
  File "/home/mdurante/miniconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 2065, in concatenate_block_managers
    return BlockManager(blocks, axes)
  File "/home/mdurante/miniconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 114, in __init__
    self._verify_integrity()
  File "/home/mdurante/miniconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 311, in _verify_integrity
    construction_error(tot_items, block.shape[1:], self.axes)
  File "/home/mdurante/miniconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1691, in construction_error
    passed, implied))
ValueError: Shape of passed values is (15376, 28211), indices imply (15375, 28211)

mafeiyang commented 5 years ago

Hi Michael,

Thanks for trying the tool. It looks like a pandas data frame issue, and one gene in your matrix is causing the problem. Can you remove the lowly expressed genes (say, those with an average nUMI below 0.1) and try the tool again? It would also help if you removed any "NA" values from your input matrix.

Best, Feiyang
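
A minimal sketch of the filtering suggested above, assuming the expression matrix is a pandas DataFrame with genes as rows and cells as columns, and that "NA" refers to missing values or missing gene labels. The helper `filter_genes` and its `min_mean` parameter are hypothetical names for illustration and are not part of ACTINN.

```python
import numpy as np
import pandas as pd


def filter_genes(expr: pd.DataFrame, min_mean: float = 0.1) -> pd.DataFrame:
    """Drop genes (rows) that have missing data or a mean count below min_mean.

    Hypothetical helper for illustration; not part of ACTINN.
    """
    # Remove rows whose gene label itself is missing.
    expr = expr.loc[expr.index.notna()]
    # Remove rows that contain any missing value.
    expr = expr.dropna(axis=0, how="any")
    # Keep only genes whose average count across cells reaches the threshold.
    return expr.loc[expr.mean(axis=1) >= min_mean]


# Toy matrix: GENE_B is lowly expressed, GENE_C contains a missing value.
expr = pd.DataFrame(
    {"cell1": [5.0, 0.0, np.nan, 2.0], "cell2": [3.0, 0.1, 1.0, 4.0]},
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)
filtered = filter_genes(expr)
print(filtered.index.tolist())
```

On this toy matrix only GENE_A and GENE_D survive. As the later comments show, this kind of filtering would not by itself remove a duplicated gene label.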

raph06 commented 5 years ago

Hi, I happened to stumble upon the same issue a couple of days ago. It arose from a duplicate gene in the training dataset (C2ORF15). After removing this gene from the common_gene array, everything worked smoothly.

Edit: It also happened with another dataset, and C2ORF15 was the culprit there as well. This gene doesn't seem to be duplicated in the input dataset, although it is clearly duplicated in sets[0]. This is why the scale_sets([train_set, test_set]) call fails.

Hope that helps. Best, Raphael
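
A small sketch of how such a duplicate can be located, assuming both sets are pandas DataFrames indexed by gene name. The toy frames and the helper `duplicated_genes` are hypothetical; the example only mirrors the one-row mismatch in the traceback (15376 versus 15375) and does not reproduce the exact pandas error, which depends on the pandas version.

```python
import pandas as pd


def duplicated_genes(df: pd.DataFrame) -> list:
    """Return gene labels that occur more than once in the row index.

    Hypothetical helper for illustration; not part of ACTINN.
    """
    return df.index[df.index.duplicated()].unique().tolist()


# Toy stand-ins: the training set carries C2ORF15 twice, the test set once.
train = pd.DataFrame({"c1": [1.0, 2.0, 3.0]}, index=["GENE_A", "C2ORF15", "C2ORF15"])
test = pd.DataFrame({"c2": [4.0, 5.0]}, index=["GENE_A", "C2ORF15"])

print(duplicated_genes(train))
print(duplicated_genes(test))

# Selecting the shared genes returns every matching row, so the training
# subset ends up one row longer than the test subset.
common = sorted(set(train.index) & set(test.index))
train_rows = train.loc[common].shape[0]
test_rows = test.loc[common].shape[0]
print(train_rows, test_rows)
```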

Weiwen1992 commented 5 years ago

I have the same problem. It turns out there is indeed a duplicate C2ORF15 entry in the training dataset.

mafeiyang commented 5 years ago

Hi All,

Thanks for bringing the problem up. I revised the code to remove the duplicated genes in the datasets, so the shape error from the pandas data frame should no longer occur.

Best, Feiyang
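
For readers on an older copy of the code, one possible way to drop duplicated genes before combining the sets is sketched here. This is an assumption about how such a fix could look, not the actual change made in the repository; the helper `drop_duplicate_genes` and the toy frames are hypothetical, and keeping the first occurrence of each gene is an arbitrary choice.

```python
import pandas as pd


def drop_duplicate_genes(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the first row for each gene label.

    Hypothetical helper for illustration; not the repository's actual fix.
    """
    return df.loc[~df.index.duplicated(keep="first")]


# Toy stand-ins with C2ORF15 duplicated in the training set.
train = pd.DataFrame({"c1": [1.0, 2.0, 3.0]}, index=["GENE_A", "C2ORF15", "C2ORF15"])
test = pd.DataFrame({"c2": [4.0, 5.0]}, index=["GENE_A", "C2ORF15"])

train = drop_duplicate_genes(train)
test = drop_duplicate_genes(test)

# With unique gene labels, both subsets have the same rows and align cleanly.
common = sorted(set(train.index) & set(test.index))
combined = pd.concat([train.loc[common], test.loc[common]], axis=1)
print(combined.shape)
```

Summing or averaging the duplicated rows would be an alternative to keeping the first one; which is appropriate depends on how the duplicate entered the reference data.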