Open nhuhoa opened 4 years ago
Dear all, I think I found the solution to this issue. The problem is in snv_matrix. In my dataset, snv_matrix has shape (306334, 16), but there are only 306299 unique rows when using index = {'chrom', 'coord', 'cluster_id'}, so 35 rows are duplicates.
To fix the problem, I added the snippet below:
    snv_matrix['desc'] = snv_matrix[['chrom', 'coord', 'cluster_id']].apply(lambda x: '_'.join(map(str, x)), axis=1)
    snv_matrix = snv_matrix.drop_duplicates(subset=['desc'])  # from 306334 to 306299 rows in my case
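A possibly simpler variant, shown here as a sketch on made-up toy data rather than the real snv_matrix: `drop_duplicates` can take the key columns directly, so no helper column is needed. It also avoids a pitfall of string keys, since joining values without a separator can make distinct rows collide (chrom '1' with coord 23 and chrom '12' with coord 3 both give '123'). The toy values and the variable names `key_columns`, `deduplicated`, and `joined_keys` are my own assumptions.

```python
import pandas as pd

# Toy stand-in for snv_matrix; the values are invented for illustration.
snv_matrix = pd.DataFrame({
    "chrom": ["1", "1", "12", "1"],
    "coord": [23, 23, 3, 23],
    "cluster_id": [0, 0, 0, 1],
    "is_present": [1, 1, 0, 1],
})

key_columns = ["chrom", "coord", "cluster_id"]

# Deduplicate on the key columns themselves (keeps the first occurrence).
deduplicated = snv_matrix.drop_duplicates(subset=key_columns)

# For comparison: a string key joined WITHOUT a separator merges
# ("1", 23, 0) and ("12", 3, 0) into the same key "1230".
joined_keys = snv_matrix[key_columns].apply(
    lambda row: "".join(map(str, row)), axis=1
)

# After deduplication the reshape no longer hits duplicate index entries.
snv_presence_matrix = (
    deduplicated.set_index(key_columns)["is_present"].unstack(fill_value=0)
)
```

On this toy input, the column-based deduplication keeps three distinct keys, while the separator-free string key would only distinguish two.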
Best, Hoa Tran
Dear all,
I have tested the scgenome.snvphylo.snv_hierarchical_clustering_figure function.
The function works well with data from a single library (snv_data, snv_count_data) as input.
But if I combine many libraries (here 25 libraries), the function throws an error: "Index contains duplicate entries, cannot reshape". It is raised at this line:
    snv_presence_matrix = snv_matrix.set_index(['chrom', 'coord', 'cluster_id'])['is_present'].unstack(fill_value=0)
So I guess that, because many datasets are combined, there are some duplicated values in 'chrom', 'coord', 'cluster_id'. Do you have any idea how to fix this problem?
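To check this guess, the duplicated keys can be listed before reshaping. The sketch below uses invented toy data, and the `library_id` column is hypothetical (I am assuming the combined table records which library each row came from). Collapsing duplicates with `max`, so that an SNV counts as present if any duplicate row marks it present, is just one possible choice and not necessarily what scgenome intends.

```python
import pandas as pd

# Toy stand-in for a combined snv_matrix; values are invented.
snv_matrix = pd.DataFrame({
    "chrom": ["1", "1", "2", "2"],
    "coord": [100, 100, 200, 200],
    "cluster_id": [0, 0, 0, 1],
    "library_id": ["A", "B", "A", "A"],  # hypothetical column
    "is_present": [0, 1, 1, 0],
})

key_columns = ["chrom", "coord", "cluster_id"]

# keep=False flags every member of a duplicated group, not only the repeats.
duplicate_rows = snv_matrix[snv_matrix.duplicated(subset=key_columns, keep=False)]

# Collapse duplicates: present if any of the duplicated rows says present.
collapsed = snv_matrix.groupby(key_columns, as_index=False)["is_present"].max()

# The index is now unique, so unstack no longer raises.
snv_presence_matrix = (
    collapsed.set_index(key_columns)["is_present"].unstack(fill_value=0)
)
```

Inspecting `duplicate_rows` first shows whether the duplicates come from different libraries or from repeated rows within one library, which affects how they should be merged.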
Moreover, I run into a memory problem when I load a very large combined dataset: MemoryError: Unable to allocate 953. GiB for an array with shape (127905439753,) and data type float64
I guess the error comes from the matrix computation inside the seaborn.clustermap function:
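The array length in this message is consistent with a condensed pairwise distance vector, which holds n(n-1)/2 entries for n rows. Assuming that is what is being allocated, the number of rows can be recovered from the length, as in this small sketch (the function name is my own):

```python
import math

def rows_from_condensed_length(length):
    """Solve n * (n - 1) / 2 == length for the integer n."""
    discriminant = 1 + 8 * length
    root = math.isqrt(discriminant)
    if root * root != discriminant:
        raise ValueError("length is not of the form n * (n - 1) / 2")
    return (1 + root) // 2

condensed_length = 127905439753           # shape reported in the MemoryError
n_rows = rows_from_condensed_length(condensed_length)
gib_needed = condensed_length * 8 / 2**30  # float64 uses 8 bytes per entry
```

This gives 505778 rows and about 953 GiB, matching the error message, so the row dimension of the matrix being clustered would need to shrink substantially before pairwise distances fit in memory.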
    # KLUDGE: currently recursion in dendrograms
    g = seaborn.clustermap(snv_presence_matrix, rasterized=True, row_cluster=True, figsize=(5, 12))
Do you have any idea how to fix this problem?
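One workaround I can think of, as an untested sketch only: reduce the number of rows before clustering, for example by keeping SNVs present in at least a few clusters and then drawing a random subset. The function `reduce_presence_matrix` and its parameters `min_clusters`, `max_rows`, and `seed` are hypothetical names of mine, not part of scgenome or seaborn, and the toy matrix is invented.

```python
import pandas as pd

def reduce_presence_matrix(presence, min_clusters=2, max_rows=5000, seed=0):
    """Keep rows present in >= min_clusters columns, then cap the row count."""
    clusters_with_snv = (presence > 0).sum(axis=1)
    informative = presence[clusters_with_snv >= min_clusters]
    if len(informative) > max_rows:
        # Random subset; fixed seed so the figure is reproducible.
        informative = informative.sample(n=max_rows, random_state=seed)
    return informative

# Toy presence matrix: rows are SNVs, columns are clusters.
toy_presence = pd.DataFrame(
    [[1, 1, 0], [0, 0, 0], [1, 0, 0], [1, 1, 1], [0, 1, 1], [0, 0, 1]],
    columns=[0, 1, 2],
)

filtered = reduce_presence_matrix(toy_presence)               # filter only
capped = reduce_presence_matrix(toy_presence, max_rows=2)     # filter + sample
```

The reduced matrix could then be passed to seaborn.clustermap in place of the full snv_presence_matrix. Whether subsampling is acceptable depends on what the figure is meant to show.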
Thanks, Hoa Tran