tzhu-bio / cisDynet

An integrated platform for modeling gene-regulatory dynamics and networks
MIT License

Error: "Duplicate 'row.names' are not allowed" when using getPeak2Gene function #13

Closed: Yuta2408 closed this issue 3 weeks ago

Yuta2408 commented 1 month ago

I'm encountering an error while running the getPeak2Gene function to analyze ATAC-seq and RNA-seq data. The error says "Duplicate 'row.names' are not allowed." The issue seems to arise from duplicate row names (gene symbols) in the RNA matrix after the data are read. I would appreciate any suggestions for fixing this issue or for handling duplicate row names in a better way.

Environment:

R version: 4.3.1

Code Example:

Here's the code I'm running:

    p2g_res <- getPeak2Gene(
        atac_matrix = "./ATAC_CPM_Norm_Data.tsv",
        rna_matrix = "./RNA_TPM_Norm_Data.tsv",
        peak_annotation = anno,
        max_distance = 50000,
        N_permutation = 10000,
        save_path = "./cisDynet_result"
    )

Error Message:

    2024-10-10 15:11:27 Remove the gene with all expression value is 0.
    `.rowNamesDF<-`(x, value = value) error: 'row.names' must be numeric
    Additional warning message:
    non-unique values when setting 'row.names': '0610010B08Rik', '0610010F05Rik', '0610010K14Rik', '1-Mar', '1-Sep', '10-Mar', '10-Sep', ...
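Before choosing a fix, it can help to list exactly which symbols are repeated. The following is a minimal base-R sketch on a hypothetical toy vector that stands in for the gene column of the RNA matrix; the vector contents and variable names are assumptions for illustration only. Symbols such as '1-Mar' and '1-Sep' in the warning above resemble the date-converted gene symbols that spreadsheet software is known to produce, so the sketch also flags entries with that pattern.

```r
# Hypothetical toy data standing in for the gene-symbol column of the RNA matrix
genes <- c("Actb", "1-Mar", "Gapdh", "1-Mar", "1-Sep", "1-Sep", "Actb")

# Symbols that occur more than once (each reported once)
dup_symbols <- unique(genes[duplicated(genes)])

# How many times each duplicated symbol appears
dup_counts <- table(genes)[dup_symbols]

# Entries that look like spreadsheet-converted dates (for example "1-Mar" or "10-Sep")
date_like <- unique(grep("^[0-9]{1,2}-(Mar|Sep|Dec)$", genes, value = TRUE))

print(dup_counts)
print(date_like)
```

If date-like entries show up, the original symbols cannot be recovered from the matrix itself, so going back to the file that preceded the spreadsheet step is likely the safer route.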

What I Have Tried:

I tried using unique() to remove duplicate row names, but that causes gene information to be lost.
I also considered using make.unique() to add unique identifiers to the row names, but I prefer not to alter the gene names in this way as it may impact the downstream analysis.
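One alternative to unique() and make.unique() is to collapse rows that share a symbol into a single row, so that no expression values are dropped and no symbol is renamed. The sketch below uses base R's rowsum() on a hypothetical toy table; the column names and values are made up, and whether summing TPM values is appropriate depends on why the duplicates exist (it is a reasonable choice for several transcripts of one gene, less so for distinct genes that ended up with the same label).

```r
# Hypothetical toy TPM table: first column holds gene symbols, the rest are samples
rna <- data.frame(
  gene = c("Actb", "1-Mar", "Gapdh", "1-Mar"),
  S1   = c(10, 1, 5, 2),
  S2   = c(20, 3, 6, 4),
  stringsAsFactors = FALSE
)

# Sum expression over rows that share a symbol; the result has one row per symbol,
# with the symbols as (now unique) row names
collapsed <- rowsum(as.matrix(rna[, -1]), group = rna$gene)

print(collapsed)
```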

What I Want to Solve:

How can I resolve the duplicate row names issue so that getPeak2Gene works correctly?
What is the best practice for handling duplicate gene symbols in RNA matrices?
Are there other recommended ways to ensure that the row names (gene symbols) in my matrix are unique without losing critical information?

Additional Information:

The RNA matrix is correctly read into R, but the issue arises because of duplicate row names (gene symbols).
The files I'm working with are .tsv files, and the RNA-seq data contains TPM normalized expression values.
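Since getPeak2Gene is given a file path rather than an in-memory object, one possible workflow is to clean the .tsv first and pass the cleaned file instead. The sketch below is only an illustration under stated assumptions: the toy file, its layout (gene symbols in the first column, one column per sample), and the choice to keep the most highly expressed row per symbol are all hypothetical, and the layout getPeak2Gene actually expects should be checked against the cisDynet documentation.

```r
# Write a small hypothetical TSV so the sketch is self-contained
in_path <- tempfile(fileext = ".tsv")
writeLines(c("gene\tS1\tS2", "Actb\t10\t20", "1-Mar\t1\t3", "1-Mar\t2\t4"), in_path)

# Read the gene symbols as an ordinary column (no row.names argument),
# so duplicated symbols do not raise an error at this stage
rna <- read.delim(in_path, check.names = FALSE, stringsAsFactors = FALSE)

# Sort rows by mean expression, highest first, then keep the first row per symbol
mean_expr  <- rowMeans(rna[, -1])
rna_sorted <- rna[order(mean_expr, decreasing = TRUE), ]
rna_unique <- rna_sorted[!duplicated(rna_sorted[[1]]), ]

# Write the de-duplicated table to a new file that could be passed as rna_matrix
out_path <- tempfile(fileext = ".tsv")
write.table(rna_unique, out_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

Unlike summing, this keeps a single measured row per symbol and discards the others, which is the trade-off to weigh against the information-loss concern raised above.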