quadbio / VoxHunt

VoxHunt: Resolving human brain organoid heterogeneity through single-cell genomic comparison to spatial brain maps
https://quadbio.github.io/VoxHunt/
MIT License

Error using load_mousebrain_data #8

Closed: finjen closed this issue 3 years ago

finjen commented 3 years ago

Hi, I'm having trouble loading the La Manno dataset using the function load_mousebrain_data. The error I am receiving is the following:

    load_mousebrain_data('~/dev_all.agg_new.loom')

    Error in H5File.open(filename, mode, file_create_pl, file_access_pl) :
      HDF5-API Errors:
          error #000: ../../../src/H5F.c in H5Fcreate(): line 444: unable to create file
              class: HDF5
              major: File accessibilty
              minor: Unable to open file

          error #001: ../../../src/H5Fint.c in H5F__create(): line 1364: unable to open file
              class: HDF5
              major: File accessibilty
              minor: Unable to open file

          error #002: ../../../src/H5Fint.c in H5F_open(): line 1557: unable to open file: time = Thu Feb 25 13:24:58 2021, name = '/home/administrator/dev_all.agg_new.loom/dev_all.loom', tent_flags = 13
              class: HDF5
              major: File accessibilty
              minor: Unable to open file

          error #003: ../../../src/H5FD.c in H5FD_open(): line 734: open failed
              class: HDF5
              major: Virtual File Layer
              minor: Unable to initialize object

          error #004: ../../../src/H5FDsec2.c in H5FD_sec2_open(): line 346: unable to open file: name = '/home/administrator/dev_all.agg_new.loom/dev_all.loom', errno = 2

However, if I connect to the file using the loomR connect function, there seems to be no problem. Can you please help me identify the issue here?

Best

joschif commented 3 years ago

I believe the problem is that you are pointing load_mousebrain_data() to the file dev_all.agg_new.loom instead of the directory it is in. The function needs both files (dev_all.agg_new.loom, dev_all.loom) to load the data, so if you specify the directory they are in, it should work. So if they are in your home directory: load_mousebrain_data('~/')
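For example, a minimal sketch, assuming both loom files were downloaded into a directory called ~/mousebrain/ (the directory name here is just an illustration):

    # ~/mousebrain/ contains both dev_all.loom and dev_all.agg_new.loom
    library(voxhunt)
    load_mousebrain_data('~/mousebrain/')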

finjen commented 3 years ago

I see. I am now pointing to the directory that contains both files, but now I am facing this error:

    Error in dyn.load(file, DLLpath = DLLpath, ...) :
      unable to load shared object '/home/administrator/R/x86_64-pc-linux-gnu-library/4.0/hdf5r/libs/hdf5r.so':
      libhdf5_hl.so.100: cannot open shared object file: No such file or directory

I have hdf5r installed and loaded. Not sure why this is happening.

joschif commented 3 years ago

There seems to be something wrong with your hdf5r installation. Can you use it manually, e.g. by loading the loom file with it? loom_file <- hdf5r::H5File$new('~/dev_all.agg_new.loom')

finjen commented 3 years ago

loom_file <- hdf5r::H5File$new("~/dev_all.agg.loom") works fine actually.

joschif commented 3 years ago

Interesting. And can you also obtain data from the file, like so: genes <- loom_file[['row_attrs/Gene']][]? The function load_mousebrain_data() actually does not do much more than these two steps, so I'm not quite sure why it would not work in this case.
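For reference, those two steps amount to roughly the following (a minimal sketch; the file path is just a placeholder):

    # open the aggregated loom file with hdf5r
    loom_file <- hdf5r::H5File$new('~/dev_all.agg_new.loom')

    # read the gene names stored in the row attributes
    genes <- loom_file[['row_attrs/Gene']][]
    head(genes)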

finjen commented 3 years ago

Ah, no. Actually, when I try to do this, it gives me this error again:

    genes <- loom_file[['row_attrs/Gene']][]

    Error in x$exists(name) :
      HDF5-API Errors:
          error #000: ../../../src/H5L.c in H5Lexists(): line 815: unable to get link info
              class: HDF5
              major: Links
              minor: Can't get value

          error #001: ../../../src/H5L.c in H5L__exists(): line 3095: path doesn't exist
              class: HDF5
              major: Links
              minor: Object already exists

          error #002: ../../../src/H5Gtraverse.c in H5G_traverse(): line 851: internal path traversal failed
              class: HDF5
              major: Symbol table
              minor: Object not found

          error #003: ../../../src/H5Gtraverse.c in H5G__traverse_real(): line 741: component not found
              class: HDF5
              major: Symbol table
              minor: Object not found
So I assume something is going wrong with my hdf5r package. That's weird; I never had any issues like that before. Do you have any idea what the underlying problem could be here? Or is there maybe another way of loading the mousebrain data (instead of load_mousebrain_data) so that the downstream functions can still detect it?

joschif commented 3 years ago

Just out of curiosity, can you list the contents of the file with loom_file$ls(recursive=T) and, if that works, post the output? Otherwise, maybe reinstalling hdf5r helps, or if you can load the file with another package like loomR, we might also be able to get this to work with VoxHunt.
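A minimal sketch of the reinstall route (assuming the missing libhdf5_hl.so.100 from earlier points to a mismatch with the system HDF5 libraries; on Debian/Ubuntu these are typically provided by libhdf5-dev):

    # rebuild hdf5r from source so it links against the HDF5 libraries
    # that are currently installed on the system
    install.packages('hdf5r', type = 'source')

    # quick sanity check after reinstalling
    loom_file <- hdf5r::H5File$new('~/dev_all.agg_new.loom')
    loom_file$ls(recursive = TRUE)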

finjen commented 3 years ago

This seems to work; this is the output:

    loom_file$ls(recursive = T)
     [1] name                link.type           obj_type            num_attrs           group.nlinks
     [6] group.mounted       dataset.rank        dataset.dims        dataset.maxdims     dataset.type_class
    [11] dataset.space_class committed_type
    <0 rows> (or 0-length row.names)

Reinstalling the hdf5r package was successful; however, load_mousebrain_data still results in the same error. I had also tried before (though maybe not correctly) to load the data via loomR, but with the following call I got the error below:

    mb_map <- mousebrain_map(mydata, group_name = 'leiden_cluster', genes_use = regional_markers)

    Error in mousebrain_map.default(object = Matrix::Matrix(expr_mat, sparse = T), :
      Data has not been loaded. Please run load_mousebrain_data() first.

joschif commented 3 years ago

Are you sure that the loom file is correct? To me, the output suggests that the file is empty. For the expected file, the output should be quite a large data frame. E.g. for me, the output of loom_file$ls(), with loom_file being dev_all.agg.loom, looks like this:

        name     link.type    obj_type num_attrs group.nlinks group.mounted dataset.rank dataset.dims
1      attrs H5L_TYPE_HARD   H5I_GROUP         0            4             0           NA         <NA>
2  col_attrs H5L_TYPE_HARD   H5I_GROUP         1          135             0           NA         <NA>
3 col_graphs H5L_TYPE_HARD   H5I_GROUP         0            0             0           NA         <NA>
4     layers H5L_TYPE_HARD   H5I_GROUP         0            2             0           NA         <NA>
5     matrix H5L_TYPE_HARD H5I_DATASET         0           NA            NA            2  942 x 31053
6  row_attrs H5L_TYPE_HARD   H5I_GROUP         1           12             0           NA         <NA>
7 row_graphs H5L_TYPE_HARD   H5I_GROUP         0            0             0           NA         <NA>
  dataset.maxdims dataset.type_class dataset.space_class committed_type
1            <NA>               <NA>                <NA>           <NA>
2            <NA>               <NA>                <NA>           <NA>
3            <NA>               <NA>                <NA>           <NA>
4            <NA>               <NA>                <NA>           <NA>
5     Inf x 31053          H5T_FLOAT          H5S_SIMPLE           <NA>
6            <NA>               <NA>                <NA>           <NA>
7            <NA>               <NA>                <NA>           <NA>

finjen commented 3 years ago

Sorry, indeed the file I had tried loading was damaged. I have now re-downloaded the loom files and tried

    loom_file <- hdf5r::H5File$new("~/Documents/Signatures/LaManno_Dev/dev_all.agg_new.loom")

which, again, worked without error, and loom_file$ls() now also results in this:

        name     link.type    obj_type num_attrs group.nlinks group.mounted dataset.rank dataset.dims
1      attrs H5L_TYPE_HARD   H5I_GROUP         0            4             0           NA         <NA>
2  col_attrs H5L_TYPE_HARD   H5I_GROUP         1          137             0           NA         <NA>
3 col_graphs H5L_TYPE_HARD   H5I_GROUP         0            0             0           NA         <NA>
4     layers H5L_TYPE_HARD   H5I_GROUP         0            2             0           NA         <NA>
5     matrix H5L_TYPE_HARD H5I_DATASET         0           NA            NA            2  798 x 31053
6  row_attrs H5L_TYPE_HARD   H5I_GROUP         1           12             0           NA         <NA>
7 row_graphs H5L_TYPE_HARD   H5I_GROUP         0            0             0           NA         <NA>
  dataset.maxdims dataset.type_class dataset.space_class committed_type
1            <NA>               <NA>                <NA>           <NA>
2            <NA>               <NA>                <NA>           <NA>
3            <NA>               <NA>                <NA>           <NA>
4            <NA>               <NA>                <NA>           <NA>
5     Inf x 31053          H5T_FLOAT          H5S_SIMPLE           <NA>
6            <NA>               <NA>                <NA>           <NA>
7            <NA>               <NA>                <NA>           <NA>

This works for both of the files. But with load_mousebrain_data, the output is still the same error.

joschif commented 3 years ago

I'm sorry you are experiencing these errors; I still haven't figured out what the problem could be. Since the function is quite short, maybe you can try to run it line by line manually and see whether that works. Here's the function definition: https://github.com/quadbiolab/VoxHunt/blob/9ddcfcf7b0b6a1a1725f4f111fe41b0b6ef5553a/R/utils.R#L65
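If it helps, one way to step through it interactively (a generic sketch; the directory path is just a placeholder):

    # print the function body, then step through it line by line in the debugger
    voxhunt::load_mousebrain_data
    debugonce(voxhunt::load_mousebrain_data)
    voxhunt::load_mousebrain_data('~/')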

finjen commented 3 years ago

No worries. I am running through the function, and everything is fine until these lines:

    MOUSEBRAIN_DATA <<- list(
        matrix = Matrix(agg_expression, sparse=T),
        meta = all_meta
    )

    Error in Matrix(agg_expression, sparse = T) : could not find function "Matrix"

I then installed and loaded the Matrix package and ran it again, and it went through without error. The function load_mousebrain_data() also works now without error. However, when I check agg_loom, the output is this:

    Class: H5File
    ID: Object invalid

Not sure whether this is ok?

Also, when I run

    mb_map <- mousebrain_map(mydata, group_name = 'leiden_cluster', genes_use = regional_markers)

I do get the error:

    Error: Must subset rows with a valid subscript vector.
    ℹ Logical subscripts must match the size of the indexed input.
    x Input has size 49073 but subscript !duplicated(x, fromLast = fromLast, ...) has size 0.

Since it is the same error as when running voxel_map, I guess loading the mousebrain data has worked now (?), but there is still something off with my data. What do you think?

joschif commented 3 years ago

What's in agg_loom isn't really important; what counts is that MOUSEBRAIN_DATA contains an expression matrix and metadata. Can you maybe run rlang::last_error() again to get the backtrace?
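Since you ran the function line by line, MOUSEBRAIN_DATA should be sitting in your global environment, so a quick sanity check could look like this (a sketch, using only the names visible in this thread):

    # the expression matrix and its dimensions
    class(MOUSEBRAIN_DATA$matrix)
    dim(MOUSEBRAIN_DATA$matrix)

    # the accompanying metadata
    str(MOUSEBRAIN_DATA$meta)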

finjen commented 3 years ago

I see. There is a sparse matrix and metadata in MOUSEBRAIN_DATA.

This is the output of rlang::last_error():

    rlang::last_error()
    <error/vctrs_error_subscript_size>
    Must subset rows with a valid subscript vector.
    ℹ Logical subscripts must match the size of the indexed input.
    x Input has size 49073 but subscript !duplicated(x, fromLast = fromLast, ...) has size 0.
    Backtrace:

  1. voxhunt::voxel_map(...)
  2. voxhunt:::voxel_map.default(...)
  3. generics:::intersect.default(inter_genes, genes_use)
  4. base::intersect(x, y, ...)
  5. base::unique.data.frame(y[match(as.vector(x), y, 0L)])
  6. tibble:::[.tbl_df(...)
  7. tibble:::tbl_subset_row(xo, i = i, i_arg)
  8. tibble:::vectbl_as_row_index(i, x, i_arg)
  9. tibble:::vectbl_as_row_location(i, nr, i_arg, assign)
  10. vctrs::vec_as_location(i, n)
  11. vctrs:::stop_indicator_size(...)

And:

    rlang::last_trace()
    <error/vctrs_error_subscript_size>
    Must subset rows with a valid subscript vector.
    ℹ Logical subscripts must match the size of the indexed input.
    x Input has size 49073 but subscript !duplicated(x, fromLast = fromLast, ...) has size 0.
    Backtrace:
         █
      1. ├─voxhunt::voxel_map(...)
      2. └─voxhunt:::voxel_map.default(...)
      3. ├─generics::intersect(inter_genes, genes_use)
      4. └─generics:::intersect.default(inter_genes, genes_use)
      5. └─base::intersect(x, y, ...)
      6. ├─base::unique(y[match(as.vector(x), y, 0L)])
      7. └─base::unique.data.frame(y[match(as.vector(x), y, 0L)])
      8. ├─x[!duplicated(x, fromLast = fromLast, ...), , drop = FALSE]
      9. └─tibble:::[.tbl_df(...)
     10. └─tibble:::tbl_subset_row(xo, i = i, i_arg)
     11. └─tibble:::vectbl_as_row_index(i, x, i_arg)
     12. └─tibble:::vectbl_as_row_location(i, nr, i_arg, assign)
     13. ├─tibble:::subclass_row_index_errors(...)
     14. │ └─base::withCallingHandlers(...)
     15. └─vctrs::vec_as_location(i, n)
     16. └─(function () ...
     17. └─vctrs:::stop_indicator_size(...)