welch-lab / liger

R package for integrating and analyzing multiple single-cell datasets
GNU General Public License v3.0

Error When Using `createLiger` with HDF5 File #317

Closed by h4rvey-g 4 months ago

h4rvey-g commented 4 months ago

Hi, I ran into an error while trying to create a Liger object from an HDF5 file with the createLiger function:

r$> createLiger(list(name = "data/201.Download_sc/GSE159115/GSM4819725_SI_18854_filtered_gene_bc_matrices_h5.h5"))

HDF5-API Errors:
    error #000: ../../../src/H5L.c in H5Lexists(): line 845: unable to get link info
        class: HDF5
        major: Links
        minor: Can't get value

    error #001: ../../../src/H5L.c in H5L__exists(): line 2932: path doesn't exist
        class: HDF5
        major: Links
        minor: Object already exists

    error #002: ../../../src/H5Gtraverse.c in H5G_traverse(): line 848: internal path traversal failed
        class: HDF5
        major: Symbol table
        minor: Object not found

    error #003: ../../../src/H5Gtraverse.c in H5G__traverse_real(): line 738: component not found
        class: HDF5
        major: Symbol table
        minor: Object not found
x$exists(name)
! Closing all linking to H5 file: /workspaces/RCC/data/201.Download_sc/GSE159115/GSM4819725_SI_18854_filtered_gene_bc_matrices_h5.h5
An object of class liger with 0 cells
datasets(0):  ( cells) 
cellMeta(1): dataset 
varFeatures(0):  
dimReds(0):  

However, when I read the same file with Read10X_h5 from Seurat, the data loads successfully:

Read10X_h5("data/201.Download_sc/GSE159115/GSM4819725_SI_18854_filtered_gene_bc_matrices_h5.h5")

Thanks for any help.

mvfki commented 4 months ago

Hi Harvey,

I looked up the data by its GEO ID, and this can be fixed easily with the following command:

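lig <- createLiger(
    # This file stores its matrix under /GRCh38 rather than the default /matrix group,
    # so each H5 path is given explicitly
    list(name = "~/work/Welch_Lab/LIGER/GSM4819725_SI_18854_filtered_gene_bc_matrices_h5.h5"), 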
    dataName = "GRCh38/data", 
    indicesName = "GRCh38/indices", 
    indptrName = "GRCh38/indptr", 
    genesName = "GRCh38/gene_names", 
    barcodesName = "GRCh38/barcodes"
)

A bit of explanation:

An HDF5 (H5) file can be roughly thought of as a file with an internal folder-like structure, in which each array or scalar is stored under its own path. In most cases, the 10X H5 format lays things out as follows:

/matrix/
    |-/matrix/data
    |-/matrix/indices
    |-/matrix/indptr
    |-/matrix/barcodes
    |-/matrix/features/
        |-/matrix/features/id
        |-/matrix/features/name
... etc.
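The first three elements (data, indices, indptr) hold the count matrix in compressed sparse column (CSC) form. Genes are the rows and cells (barcodes) are the columns. The sketch below uses made-up toy values, so nothing is read from a real file. It shows how the Matrix package can reassemble those arrays, together with the barcodes and gene names, into a gene-by-cell matrix:

```r
library(Matrix)

# Made-up toy values standing in for arrays read from the H5 file
data       <- c(3, 1, 2, 5)          # non-zero UMI counts
indices    <- c(0L, 2L, 1L, 2L)      # 0-based row (gene) index of each count
indptr     <- c(0L, 2L, 2L, 4L)      # cell j owns data entries indptr[j] + 1 .. indptr[j + 1] (none if equal)
gene_names <- c("GeneA", "GeneB", "GeneC")
barcodes   <- c("AAAC-1", "AAAG-1", "AAAT-1")

# Rebuild the sparse gene-by-cell matrix; index1 = FALSE because H5 indices are 0-based
counts <- sparseMatrix(
  i = indices, p = indptr, x = data,
  dims = c(length(gene_names), length(barcodes)),
  dimnames = list(gene_names, barcodes),
  index1 = FALSE
)
print(counts)
```

A 10X reader does roughly this after fetching each array from its H5 path. That is why createLiger needs to know exactly where each of these elements lives.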

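In your case, the /matrix group is named /GRCh38 instead. This is most likely because the file was produced by an older CellRanger version (2.x). That version names the top-level group after the reference genome and stores gene names in gene_names rather than under features/. If you run into this issue again, I suggest first opening the H5 file with the hdf5r package. Then confirm the paths of the five basic elements shown above: data, indices, indptr, barcodes and gene names.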
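To automate that check, here is a hypothetical helper, find10xH5Paths(), which is not part of rliger or hdf5r. It lists every path in the file with hdf5r and picks out the five elements. It assumes only the two layouts discussed in this thread: a /matrix group with features/name, or a single genome-named group with gene_names. Treat it as a sketch rather than a complete 10X parser:

```r
library(hdf5r)

# Hypothetical helper (not part of rliger): return the internal paths of the five
# elements createLiger() needs, for either of the two layouts discussed above
find10xH5Paths <- function(file) {
  h5 <- H5File$new(file, mode = "r")
  on.exit(h5$close_all())
  paths <- h5$ls(recursive = TRUE)$name   # e.g. "GRCh38", "GRCh38/data", ...

  if ("matrix/features/name" %in% paths) {
    # Layout with everything under /matrix and gene names in features/name
    group <- "matrix"
    genes <- "matrix/features/name"
  } else {
    # Older layout: one top-level group named after the genome (e.g. GRCh38)
    hit <- grep("^[^/]+/gene_names$", paths, value = TRUE)
    if (length(hit) != 1) stop("Could not identify a unique 10X matrix group in ", file)
    group <- sub("/gene_names$", "", hit)
    genes <- hit
  }

  out <- list(
    dataName     = paste0(group, "/data"),
    indicesName  = paste0(group, "/indices"),
    indptrName   = paste0(group, "/indptr"),
    genesName    = genes,
    barcodesName = paste0(group, "/barcodes")
  )
  # Fail early if any expected element is absent
  missing <- setdiff(unlist(out), paths)
  if (length(missing) > 0) stop("Missing in H5 file: ", paste(missing, collapse = ", "))
  out
}
```

The returned list can be spliced into the call. For example: do.call(createLiger, c(list(list(name = h5Path)), find10xH5Paths(h5Path))). Here h5Path is a placeholder for your file path. The call passes the file list positionally as the first argument, as in the command above.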

We will also try to streamline the import workflow in future updates to reduce this kind of hassle for users.

Best,
Yichen
LIGER Team

h4rvey-g commented 4 months ago

Thank you Yichen, that's very helpful!