write anndata failed, pearson_residuals_df header message is too large

brainfo commented 1 year ago

[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of scanpy.
[ ] (optional) I have confirmed this bug exists on the master branch of scanpy.

Minimal code sample (that we can copy&paste without having any data)

Write any AnnData object that has Pearson residuals stored in uns:

ad_all.write(filename='output/10x_h5/ad_all_2cello.h5ad')

The uns entry holding pearson_residuals_df looks like this; the DataFrame has 38291 rows (obs) and 5000 columns (features):

{'theta': 100,
 'clip': None,
 'computed_on': 'adata.X',
 'pearson_residuals_df': gene_name                             A2M  AADACL2-AS1      AAK1     ABCA1  \
 barcode                                                                      
 GAACGTTCACACCGAC-1-placenta_81  -1.125285    -1.159130 -3.921314 -2.533474   
 TATACCTGTTAGCTAC-1-placenta_81  -1.091364     3.267127 -1.806667 -2.109586   
 CTCAAGAGTGACTGTT-1-placenta_81  -1.074943    12.272920 -1.948798 -2.735791   
 TTCATTGTCACGAACT-1-placenta_81  -1.098699    -1.131765  3.481171  4.472371   
 TATCAGGCAGCTCATA-1-placenta_81  -1.107734    -1.141064 -0.571775 -2.813671   
 ...                                   ...          ...       ...       ...   
 CACAACATCGGCGATC-1-placenta_314 -0.115585    -0.119107 -0.434686 -0.303945   
 AGCCAGCGTGCCCAGT-1-placenta_314 -0.097424    -0.100394 -0.366482 -0.256219   
 CCGGTGAGTGTTCGAT-1-placenta_314 -0.110334    -0.113696 -0.414971 -0.290148   
 AGGTCATAGCCTGACC-1-placenta_314 -0.115585    -0.119107 -0.434686 -0.303945   
 TTTATGCCAAAGGGTC-1-placenta_314 -0.112876    -0.116316 -0.424515 -0.296827

Unable to create attribute (object header message is too large)

Above error raised while writing key 'pearson_residuals_df' of <class 'h5py._hl.group.Group'> to /
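
A possible workaround, sketched under assumptions this thread does not confirm: the failure is assumed to come from the wide DataFrame (5000 column names) being written together with per-object HDF5 metadata that has a size limit. The sketch replaces the DataFrame with plain arrays before writing. A plain dict stands in for the uns entry shown above, and `split_wide_frame` plus the `_values`, `_index` and `_columns` key suffixes are hypothetical names, not part of scanpy or anndata.

```python
import numpy as np
import pandas as pd


def split_wide_frame(uns_entry, key="pearson_residuals_df"):
    """Return a copy of `uns_entry` with the DataFrame under `key`
    replaced by three plain arrays (values, row labels, column labels).

    Hypothetical helper for illustration; not a scanpy/anndata function.
    """
    out = dict(uns_entry)          # shallow copy, the input dict is left untouched
    df = out.pop(key)              # remove the wide DataFrame
    out[key + "_values"] = df.to_numpy()
    out[key + "_index"] = np.asarray(df.index, dtype=str)
    out[key + "_columns"] = np.asarray(df.columns, dtype=str)
    return out


# Tiny stand-in for the real 38291 x 5000 frame.
demo_df = pd.DataFrame(
    [[-1.125285, -1.159130], [-1.091364, 3.267127]],
    index=["GAACGTTCACACCGAC-1-placenta_81", "TATACCTGTTAGCTAC-1-placenta_81"],
    columns=["A2M", "AADACL2-AS1"],
)
demo_uns = {"theta": 100, "computed_on": "adata.X", "pearson_residuals_df": demo_df}
demo_split = split_wide_frame(demo_uns)
```

The DataFrame can be rebuilt after reading with `pd.DataFrame(values, index=index, columns=columns)`. Whether this avoids the error on a given anndata version would need to be checked against the actual file.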

Versions

scanpy==1.9.1 anndata==0.8.0 umap==0.5.2 numpy==1.21.5 scipy==1.8.0 pandas==1.4.1 scikit-learn==1.0.2 statsmodels==0.13.2 python-igraph==0.9.9 pynndescent==0.5.6

Zethson commented 1 year ago

@jlause think this could interest you?

brainfo commented 1 year ago

Received, thank you. [automatic email reply, originally in Chinese]

Zethson commented 1 year ago

@brainfo I'm afraid that we only speak English.

evanbiederstedt commented 1 year ago

@Zethson I suspect it's an automatic response via an email service in China. "Received, thanks"

brainfo commented 1 year ago

> @brainfo I'm afraid that we only speak English.

Hi, sorry, that was indeed an automatic reply from the email service. I just changed the email address, so this should not be a problem in the future.

To add one more problem encountered when writing out an AnnData object: when the object has 'predicted_doublet' in its obs annotation from sc.external.pp.scrublet, the boolean values cannot be implicitly converted to strings, which raises this error:

TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'predicted_doublet' of <class 'h5py._hl.group.Group'> to /

Users can map the values like this, but would it be better to have the conversion happen implicitly when reading/writing AnnData objects?

anndata.obs['predicted_doublet'] = anndata.obs['predicted_doublet'].map({True: 'True', False: 'False'})

scverse / scanpy

write anndata failed, pearson_residuals_df header message is too large #2383
