ma-compbio / Higashi

single-cell Hi-C, scHi-C, Hi-C, 3D genome, nuclear organization, hypergraph
MIT License

Error in fh_model.prep_dataset() "Pack from sparse mtx to tensors" #43

Closed: XiongGZ closed this issue 11 months ago

XiongGZ commented 1 year ago

Hi @ruochiz, I ran into an error when packing data from sparse matrices into sparse tensors. How can I solve it? (I hit a different error earlier, KeyError: "batch id", so I commented out the "batch_id": "batch id" entry when constructing config_info.)

data.txt was downloaded from https://drive.google.com/drive/u/0/folders/1SuzqQ_9dliAmTb-fGprFnN3aZrfWS-Fg

Script:

workdir = "/data/home/ruanlab/xiongguangzhou/01.Clustering/02.FastHigashi/"

import pickle
import numpy as np

label_info = {'name': np.arange(1000), 'age': np.ones(1000)}
pickle.dump(label_info, open(datadir+"label_info.pickle", "wb"))

import pandas as pd
data = pd.read_csv(datadir+"data.txt", sep = "\t", nrows = 5)

config = workdir+"config_m3c_pfc_500Kb.JSON"
config_info = {
    "data_dir": datadir,
    "structured": True,
    "input_format": 'higashi_v1',
    "temp_dir": datadir+"Temp",
    "genome_reference_path": datadir+"hg19.chrom.sizes.txt",
    "cytoband_path": datadir+"cytoBand_hg19.txt",
    "chrom_list": ["chr1", "chr2", "chr3", "chr4", "chr5", 
                   "chr6", "chr7", "chr8", "chr9", "chr10", 
                   "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", 
                   "chr17", "chr18", "chr19", "chr20", "chr21", "chr22"],
    "resolution": 500000,
    "resolution_cell": 500000,
    "resolution_fh": [500000],
    "embedding_name": "test",
    "minimum_distance": 500000,
    "maximum_distance": -1,
    "local_transfer_range": 0,
    "loss_mode": "zinb",
    "dimensions": 128,
    "impute_list": ["chr1", "chr2", "chr3", "chr4", "chr5", 
                   "chr6", "chr7", "chr8", "chr9", "chr10", 
                   "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", 
                   "chr17", "chr18", "chr19", "chr20", "chr21", "chr22"],
    "neighbor_num": 5,
    "cpu_num": 10,
    "gpu_num": 8,
    #"batch_id": "batch id",
    "embedding_epoch": 60,
    "correct_be_impute": True
}

import json
with open(config, "w") as f:
    json.dump(config_info, f, indent = 6)

from higashi.Higashi_wrapper import *
from fasthigashi.FastHigashi_Wrapper import *

higashi_model = Higashi(config)
higashi_model.process_data()

fh_model = FastHigashi(config_path = config,
                       path2input_cache = datadir+"Temp",
                       path2result_dir = workdir, 
                       off_diag = 100,
                       filter = False,
                       do_conv = False,
                       do_rwr = False, 
                       do_col = False,
                       no_col = False)

fh_model.prep_dataset(batch_norm = False)

I have tried both fh_model.prep_dataset() and fh_model.prep_dataset(batch_norm = False).

Error: image

Version:

List of packages in environment: "/data/home/ruanlab/xiongguangzhou/software/micromamba/mambaforge/envs/fasthigashi"

  Name                 Version       Build                        Channel           
──────────────────────────────────────────────────────────────────────────────────────
  _libgcc_mutex        0.1           conda_forge                  conda-forge       
  _openmp_mutex        4.5           2_gnu                        conda-forge       
  abseil-cpp           20211102.0    hd4dd3e8_0                   anaconda/pkgs/main
  arrow-cpp            8.0.0         py39h60b952e_0               anaconda/pkgs/main
  asciitree            0.3.3         py_2                         anaconda/pkgs/main
  aws-c-common         0.4.57        he6710b0_1                   anaconda/pkgs/main
  aws-c-event-stream   0.1.6         h2531618_5                   anaconda/pkgs/main
  aws-checksums        0.1.9         he6710b0_0                   anaconda/pkgs/main
  aws-sdk-cpp          1.8.185       hce553d0_0                   anaconda/pkgs/main
  bedtools             2.30.0        h7d7f7ad_2                   bioconda          
  biopython            1.78          py39h7f8727e_0               anaconda/pkgs/main
  blas                 1.0           mkl                          anaconda/pkgs/main
  bokeh                3.2.1         py39h2f386ee_0               anaconda/pkgs/main
  boost-cpp            1.73.0        h7f8727e_12                  anaconda/pkgs/main
  bottleneck           1.3.7         py39h389d5f1_0               conda-forge       
  brotli               1.0.9         he6710b0_2                   anaconda/pkgs/main
  brotli-python        1.0.9         py39h5a03fae_9               conda-forge       
  bzip2                1.0.8         h7b6447c_0                   anaconda/pkgs/main
  c-ares               1.19.0        h5eee18b_0                   anaconda/pkgs/main
  ca-certificates      2023.05.30    h06a4308_0                   anaconda/pkgs/main
  certifi              2023.7.22     pyhd8ed1ab_0                 conda-forge       
  cffi                 1.15.1        py39h5eee18b_3               anaconda/pkgs/main
  charset-normalizer   3.2.0         pyhd8ed1ab_0                 conda-forge       
  click                8.0.4         py39h06a4308_0               anaconda/pkgs/main
  cloudpickle          2.2.1         py39h06a4308_0               anaconda/pkgs/main
  contourpy            1.0.5         py39hdb19cb5_0               anaconda/pkgs/main
  cooler               0.9.2         pyh7cba7a3_0                 bioconda          
  cudatoolkit          10.2.89       hfd86e86_1                   anaconda/pkgs/main
  curl                 7.88.1        h37d81fd_2                   anaconda/pkgs/main
  cycler               0.11.0        pyhd3eb1b0_0                 anaconda/pkgs/main
  cython               3.0.0         py39h5eee18b_0               anaconda/pkgs/main
  cytoolz              0.12.0        py39h5eee18b_0               anaconda/pkgs/main
  dask                 2023.6.0      py39h06a4308_0               anaconda/pkgs/main
  dask-core            2023.6.0      py39h06a4308_0               anaconda/pkgs/main
  dbus                 1.13.18       hb2f20db_0                   anaconda/pkgs/main
  dill                 0.3.6         py39h06a4308_0               anaconda/pkgs/main
  distributed          2023.6.0      py39h06a4308_0               anaconda/pkgs/main
  expat                2.4.9         h6a678d5_0                   anaconda/pkgs/main
  fasthigashi          0.1.1         py_0                         ruochiz           
  fbpca                1.0           py_0                         conda-forge       
  filelock             3.9.0         py39h06a4308_0               anaconda/pkgs/main
  fontconfig           2.14.1        hef1e5e3_0                   anaconda/pkgs/main
  fonttools            4.25.0        pyhd3eb1b0_0                 anaconda/pkgs/main
  freetype             2.12.1        h4a9f257_0                   anaconda/pkgs/main
  fsspec               2023.4.0      py39h06a4308_0               anaconda/pkgs/main
  gflags               2.2.2         he6710b0_0                   anaconda/pkgs/main
  giflib               5.2.1         h5eee18b_3                   anaconda/pkgs/main
  glib                 2.69.1        he621ea3_2                   anaconda/pkgs/main
  glog                 0.5.0         h2531618_0                   anaconda/pkgs/main
  gmp                  6.2.1         h295c915_3                   anaconda/pkgs/main
  gmpy2                2.1.2         py39heeb90bb_0               anaconda/pkgs/main
  grpc-cpp             1.46.1        h33aed49_1                   anaconda/pkgs/main
  gst-plugins-base     1.14.1        h6a678d5_1                   anaconda/pkgs/main
  gstreamer            1.14.1        h5eee18b_1                   anaconda/pkgs/main
  h5py                 3.7.0         py39h737f45e_0               anaconda/pkgs/main
  hdf5                 1.10.6        h3ffc7dd_1                   anaconda/pkgs/main
  heapdict             1.0.1         pyhd3eb1b0_0                 anaconda/pkgs/main
  higashi              0.1.1a1       py_0                         ruochiz           
  htslib               1.14          h9093b5e_0                   bioconda          
  icu                  58.2          he6710b0_3                   anaconda/pkgs/main
  idna                 3.4           pyhd8ed1ab_0                 conda-forge       
  importlib-metadata   6.0.0         py39h06a4308_0               anaconda/pkgs/main
  importlib_resources  5.2.0         pyhd3eb1b0_1                 anaconda/pkgs/main
  intel-openmp         2021.4.0      h06a4308_3561                anaconda/pkgs/main
  jinja2               3.1.2         py39h06a4308_0               anaconda/pkgs/main
  joblib               1.2.0         py39h06a4308_0               anaconda/pkgs/main
  jpeg                 9e            h5eee18b_1                   anaconda/pkgs/main
  keyutils             1.6.1         h166bdaf_0                   conda-forge       
  kiwisolver           1.4.4         py39h6a678d5_0               anaconda/pkgs/main
  krb5                 1.20.1        h568e23c_1                   anaconda/pkgs/main
  lcms2                2.12          h3be6417_0                   anaconda/pkgs/main
  ld_impl_linux-64     2.38          h1181459_1                   anaconda/pkgs/main
  libblas              3.9.0         12_linux64_mkl               conda-forge       
  libboost             1.73.0        h28710b8_12                  anaconda/pkgs/main
  libcblas             3.9.0         12_linux64_mkl               conda-forge       
  libclang             10.0.1        default_hb85057a_2           anaconda/pkgs/main
  libcurl              7.88.1        h91b91d3_2                   anaconda/pkgs/main
  libdeflate           1.7           h27cfd23_5                   anaconda/pkgs/main
  libedit              3.1.20221030  h5eee18b_0                   anaconda/pkgs/main
  libev                4.33          h7f8727e_1                   anaconda/pkgs/main
  libevent             2.1.12        h8f2d780_0                   anaconda/pkgs/main
  libffi               3.4.4         h6a678d5_0                   anaconda/pkgs/main
  libgcc-ng            12.2.0        h65d4601_19                  conda-forge       
  libgfortran-ng       13.1.0        h69a702a_0                   conda-forge       
  libgfortran5         13.1.0        h15d22d2_0                   conda-forge       
  libgomp              12.2.0        h65d4601_19                  conda-forge       
  liblapack            3.9.0         12_linux64_mkl               conda-forge       
  libllvm10            10.0.1        hbcb73fb_5                   anaconda/pkgs/main
  libnghttp2           1.52.0        ha637b67_1                   anaconda/pkgs/main
  libnsl               2.0.0         h7f98852_0                   conda-forge       
  libopenblas          0.3.21        h043d6bf_0                   anaconda/pkgs/main
  libpng               1.6.37        hbc83047_0                   anaconda/pkgs/main
  libpq                12.15         h37d81fd_1                   anaconda/pkgs/main
  libprotobuf          3.20.3        he621ea3_0                   anaconda/pkgs/main
  libsqlite            3.42.0        h2797004_0                   conda-forge       
  libssh2              1.10.0        h37d81fd_2                   anaconda/pkgs/main
  libstdcxx-ng         12.2.0        h46fd767_19                  conda-forge       
  libthrift            0.15.0        h0d84882_2                   anaconda/pkgs/main
  libtiff              4.2.0         hecacb30_2                   anaconda/pkgs/main
  libuuid              2.38.1        h0b41bf4_0                   conda-forge       
  libwebp              1.2.4         h11a3e52_1                   anaconda/pkgs/main
  libwebp-base         1.2.4         h5eee18b_1                   anaconda/pkgs/main
  libxcb               1.15          h7f8727e_0                   anaconda/pkgs/main
  libxkbcommon         1.0.1         hfa300c1_0                   anaconda/pkgs/main
  libxml2              2.9.14        h74e7548_0                   anaconda/pkgs/main
  libxslt              1.1.35        h4e12654_0                   anaconda/pkgs/main
  libzlib              1.2.13        h166bdaf_4                   conda-forge       
  llvmlite             0.36.0        py39h612dafd_4               anaconda/pkgs/main
  locket               1.0.0         py39h06a4308_0               anaconda/pkgs/main
  lz4                  4.3.2         py39h5eee18b_0               anaconda/pkgs/main
  lz4-c                1.9.4         h6a678d5_0                   anaconda/pkgs/main
  markupsafe           2.1.1         py39h7f8727e_0               anaconda/pkgs/main
  matplotlib           3.7.0         py39h06a4308_0               anaconda/pkgs/main
  matplotlib-base      3.7.0         py39h417a72b_0               anaconda/pkgs/main
  mkl                  2021.4.0      h06a4308_640                 anaconda/pkgs/main
  mkl-service          2.4.0         py39h7e14d7c_0               conda-forge       
  mkl_fft              1.3.1         py39h0c7bc48_1               conda-forge       
  mkl_random           1.2.2         py39hde0f152_0               conda-forge       
  mpc                  1.1.0         h10f8cd9_1                   anaconda/pkgs/main
  mpfr                 4.0.2         hb69a4c5_1                   anaconda/pkgs/main
  mpmath               1.3.0         py39h06a4308_0               anaconda/pkgs/main
  msgpack-python       1.0.3         py39hd09550d_0               anaconda/pkgs/main
  multiprocess         0.70.14       py39h06a4308_0               anaconda/pkgs/main
  munkres              1.1.4         py_0                         anaconda/pkgs/main
  ncurses              6.4           h6a678d5_0                   anaconda/pkgs/main
  networkx             3.1           py39h06a4308_0               anaconda/pkgs/main
  ninja                1.10.2        h06a4308_5                   anaconda/pkgs/main
  ninja-base           1.10.2        hd09550d_5                   anaconda/pkgs/main
  nspr                 4.35          h6a678d5_0                   anaconda/pkgs/main
  nss                  3.89.1        h6a678d5_0                   anaconda/pkgs/main
  numba                0.53.1        py39ha9443f7_0               anaconda/pkgs/main
  numexpr              2.8.4         py39he184ba9_0               anaconda/pkgs/main
  numpy                1.24.3        py39h14f4228_0               anaconda/pkgs/main
  numpy-base           1.24.3        py39h31eccc5_0               anaconda/pkgs/main
  openssl              1.1.1v        h7f8727e_0                   anaconda/pkgs/main
  opt_einsum           3.3.0         pyhd3eb1b0_1                 anaconda/pkgs/main
  orc                  1.7.4         hb3bc3d3_1                   anaconda/pkgs/main
  packaging            23.1          pyhd8ed1ab_0                 conda-forge       
  pairix               0.3.7         py39h3d4b85c_5               bioconda          
  pandas               2.0.3         py39h40cae4c_1               conda-forge       
  partd                1.2.0         pyhd3eb1b0_1                 anaconda/pkgs/main
  pcre                 8.45          h295c915_0                   anaconda/pkgs/main
  pillow               9.4.0         py39h6a678d5_0               anaconda/pkgs/main
  pip                  23.2.1        pyhd8ed1ab_0                 conda-forge       
  platformdirs         3.10.0        pyhd8ed1ab_0                 conda-forge       
  ply                  3.11          py39h06a4308_0               anaconda/pkgs/main
  pooch                1.7.0         pyha770c72_3                 conda-forge       
  psutil               5.9.0         py39h5eee18b_0               anaconda/pkgs/main
  pyarrow              8.0.0         py39h992f0b0_0               anaconda/pkgs/main
  pybedtools           0.9.0         py39hd65a603_2               bioconda          
  pycparser            2.21          pyhd3eb1b0_0                 anaconda/pkgs/main
  pyfaidx              0.7.2.1       pyh7cba7a3_1                 bioconda          
  pynndescent          0.5.10        py39h06a4308_0               anaconda/pkgs/main
  pyparsing            3.0.9         py39h06a4308_0               anaconda/pkgs/main
  pyqt                 5.15.7        py39h6a678d5_1               anaconda/pkgs/main
  pyqt5-sip            12.11.0       py39h6a678d5_1               anaconda/pkgs/main
  pysam                0.17.0        py39h051187c_0               bioconda          
  pysocks              1.7.1         pyha2e5f31_6                 conda-forge       
  python               3.9.16        h7a1cb2a_2                   anaconda/pkgs/main
  python-dateutil      2.8.2         pyhd3eb1b0_0                 anaconda/pkgs/main
  python-lmdb          1.4.1         py39h6a678d5_0               anaconda/pkgs/main
  python-tzdata        2023.3        pyhd8ed1ab_0                 conda-forge       
  python_abi           3.9           1_cp39                       conda-forge       
  pytorch              1.8.0         py3.9_cuda10.2_cudnn7.6.5_0  pytorch           
  pytz                 2023.3        pyhd8ed1ab_0                 conda-forge       
  pyvcf3               1.0.3         pyhdfd78af_0                 bioconda          
  pyyaml               6.0           py39h5eee18b_1               anaconda/pkgs/main
  qt-main              5.15.2        h327a75a_7                   anaconda/pkgs/main
  qt-webengine         5.15.9        hd2b0992_4                   anaconda/pkgs/main
  qtwebkit             5.212         h4eab89a_4                   anaconda/pkgs/main
  re2                  2022.04.01    h295c915_0                   anaconda/pkgs/main
  readline             8.2           h5eee18b_0                   anaconda/pkgs/main
  requests             2.31.0        pyhd8ed1ab_0                 conda-forge       
  scikit-learn         1.3.0         py39h1128e8f_0               anaconda/pkgs/main
  scipy                1.11.1        py39h6183b62_0               conda-forge       
  seaborn              0.12.2        py39h06a4308_0               anaconda/pkgs/main
  setuptools           68.0.0        pyhd8ed1ab_0                 conda-forge       
  simplejson           3.17.6        py39h7f8727e_0               anaconda/pkgs/main
  sip                  6.6.2         py39h6a678d5_0               anaconda/pkgs/main
  six                  1.16.0        pyhd3eb1b0_1                 anaconda/pkgs/main
  snappy               1.1.9         h295c915_0                   anaconda/pkgs/main
  sortedcontainers     2.4.0         pyhd3eb1b0_0                 anaconda/pkgs/main
  sqlite               3.41.2        h5eee18b_0                   anaconda/pkgs/main
  sympy                1.11.1        py39h06a4308_0               anaconda/pkgs/main
  tbb                  2020.3        hfd86e86_0                   anaconda/pkgs/main
  tblib                1.7.0         pyhd3eb1b0_0                 anaconda/pkgs/main
  threadpoolctl        2.2.0         pyh0d69192_0                 anaconda/pkgs/main
  tk                   8.6.12        h1ccaba5_0                   anaconda/pkgs/main
  toml                 0.10.2        pyhd3eb1b0_0                 anaconda/pkgs/main
  toolz                0.12.0        py39h06a4308_0               anaconda/pkgs/main
  tornado              6.3.2         py39h5eee18b_0               anaconda/pkgs/main
  tqdm                 4.65.0        py39hb070fc8_0               anaconda/pkgs/main
  typing-extensions    4.7.1         hd8ed1ab_0                   conda-forge       
  typing_extensions    4.7.1         pyha770c72_0                 conda-forge       
  tzdata               2023c         h71feb2d_0                   conda-forge       
  umap-learn           0.5.3         py39h06a4308_0               anaconda/pkgs/main
  urllib3              2.0.4         pyhd8ed1ab_0                 conda-forge       
  utf8proc             2.6.1         h27cfd23_0                   anaconda/pkgs/main
  wheel                0.41.0        pyhd8ed1ab_0                 conda-forge       
  xyzservices          2022.9.0      py39h06a4308_1               anaconda/pkgs/main
  xz                   5.2.10        h5eee18b_1                   anaconda/pkgs/main
  yaml                 0.2.5         h7b6447c_0                   anaconda/pkgs/main
  zict                 2.2.0         py39h06a4308_0               anaconda/pkgs/main
  zipp                 3.11.0        py39h06a4308_0               anaconda/pkgs/main
  zlib                 1.2.13        h166bdaf_4                   conda-forge       
  zstd                 1.5.2         hfc55251_7                   conda-forge       

ruochiz commented 1 year ago

Are the vectors in label_info.pickle shorter than the actual number of cells? I believe the Lee et al. dataset has 4238 cells.

XiongGZ commented 1 year ago

Thanks. I just followed tutorials/Lee et al (Higashi+Fast-Higashi).ipynb. Maybe I should use the files downloaded from https://drive.google.com/drive/u/0/folders/1YQP1tswzdNj1MJPg2XKh6Z7c7o9zNKpB, or adjust the parameter values in tutorials/Lee et al (Higashi+Fast-Higashi).ipynb.
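
For anyone hitting the same error, the length mismatch discussed above can be checked before calling process_data() or prep_dataset(). The following is only a minimal sketch, not part of Higashi or Fast-Higashi: check_label_lengths is a hypothetical helper, and the cell count of 4238 is the figure quoted in this thread rather than a value read from the data. It compares every vector in label_info against the expected number of cells and reports the ones that do not match.

```python
def check_label_lengths(label_info, n_cells):
    """Return {key: length} for each label vector whose length differs from n_cells.

    Hypothetical helper for illustration; it is not a Higashi API.
    """
    return {
        key: len(values)
        for key, values in label_info.items()
        if len(values) != n_cells
    }


# Same shape as the label_info built in the script above (1000 entries per key).
# Plain lists are used here; NumPy arrays work the same way with len().
label_info = {"name": list(range(1000)), "age": [1.0] * 1000}

# Cell count quoted in this thread for the Lee et al. dataset (assumption:
# in practice, derive it from the data itself, e.g. by counting the distinct
# cell identifiers in data.txt).
n_cells = 4238

mismatched = check_label_lengths(label_info, n_cells)
if mismatched:
    print("label_info vectors with the wrong length:", mismatched)
else:
    print("all label_info vectors match the number of cells")
```

If the check reports mismatches, rebuilding label_info with one entry per cell (for example, 4238 entries per key here) before writing label_info.pickle is the likely fix suggested by the reply above.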