shahcompbio / scgenome-deprecated

Issue with scgenome.snvphylo.snv_hierarchical_clustering_figure #35

Open nhuhoa opened 4 years ago

nhuhoa commented 4 years ago

Dear all,

I have tested the scgenome.snvphylo.snv_hierarchical_clustering_figure function.

When I load a large combined dataset, the matrix computation fails with this error:

MemoryError: Unable to allocate 953. GiB for an array with shape (127905439753,) and data type float64
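For context, the array length in this error is consistent with a condensed pairwise-distance vector, which holds n(n-1)/2 entries for n rows. That reading is an inference from the numbers in the message, not something the traceback confirms. The sketch below only does the arithmetic; the helper names are hypothetical and are not part of scgenome or seaborn.

```python
import math


def condensed_length(n_rows: int) -> int:
    """Number of pairwise distances among n_rows observations."""
    return n_rows * (n_rows - 1) // 2


def rows_from_condensed_length(length: int) -> int:
    """Invert n(n-1)/2 = length; raise if no integer n fits."""
    root = math.isqrt(8 * length + 1)
    if root * root != 8 * length + 1:
        raise ValueError("length is not of the form n(n-1)/2")
    return (1 + root) // 2


length = 127905439753            # shape reported in the MemoryError
n_rows = rows_from_condensed_length(length)
gib = length * 8 / 2**30         # float64 takes 8 bytes per entry

print(n_rows)        # 505778
print(round(gib))    # 953
```

If that reading is right, memory grows quadratically with the number of rows being clustered, so removing a small number of rows changes the requirement only slightly.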

I guess the error comes from the matrix computation inside the seaborn.clustermap call:

# KLUDGE: currently recursion in dendrograms
# breaks with large datasets
import sys
sys.setrecursionlimit(10000)

g = seaborn.clustermap(snv_presence_matrix, rasterized=True, row_cluster=True, figsize=(5, 12))

Do you have any idea how to fix this problem?

Thanks, Hoa Tran

nhuhoa commented 4 years ago

Dear all,

I think I found the solution to this issue. The problem is in snv_matrix. In my dataset, snv_matrix has shape (306334, 16), but with index = {'chrom', 'coord', 'cluster_id'} there are only 306299 unique index values, so 35 rows are duplicates.

To fix the problem, I added the script below:

# Build a string key from the index columns, then drop rows that repeat it
snv_matrix['desc'] = snv_matrix[['chrom', 'coord', 'cluster_id']].apply(lambda x: ''.join(map(str, x)), axis=1)
snv_matrix = snv_matrix.drop_duplicates(subset=['desc'])  # from 306334 to 306299 rows in my case
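A possible alternative, sketched on made-up toy data: pandas can deduplicate on several columns directly with drop_duplicates(subset=[...]), which avoids building a joined string key. Joining values without a separator can also merge distinct rows, for example chrom '1' with coord 23 and chrom '12' with coord 3 both become '123'. The column values and the alt_counts column below are invented for illustration.

```python
import pandas as pd

# Toy stand-in for snv_matrix; values are invented for illustration.
snv_matrix = pd.DataFrame({
    "chrom": ["1", "1", "12", "1"],
    "coord": [23, 23, 3, 23],
    "cluster_id": [0, 0, 0, 1],
    "alt_counts": [5, 7, 2, 4],
})

key = ["chrom", "coord", "cluster_id"]

# Count rows that repeat an earlier (chrom, coord, cluster_id) combination.
n_dup = snv_matrix.duplicated(subset=key).sum()

# Keep the first row of each combination.
deduped = snv_matrix.drop_duplicates(subset=key)

# For comparison: a separator-free joined key treats rows 0, 1 and 2 as equal.
joined = snv_matrix[key].apply(lambda x: "".join(map(str, x)), axis=1)

print(n_dup, len(deduped), joined.nunique())
```

Note that drop_duplicates keeps the first occurrence by default, so the counts in the later duplicate rows are discarded. Whether that is acceptable, or whether duplicates should be summed instead, depends on how they arose in the combined data.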

Best, Hoa Tran