xiaoyeye / CNNC

convolutional neural network based coexpression analysis
MIT License

Can’t generate NEPDF data for the example sc-RNA seq data #15

Open JFF1594032292 opened 3 years ago

JFF1594032292 commented 3 years ago

Hello! I have a new problem: get_xy_label_data_cnn_combine_from_database.py reports an error when generating NEPDF data for the example sc-RNA seq h5 file:

    python CNNC/get_xy_label_data_cnn_combine_from_database.py \
        None \
        CNNC/data/sc_gene_list.txt \
        CNNC/data/dendritic_gene_pairs_200.txt \
        CNNC/data/dendritic_gene_pairs_200_num.txt \
        None \
        CNNC/Data_sources/dendritic_cell.h5 \
        1

The dendritic_cell.h5 file was downloaded from: [image]

Then it reported errors: [image]

So I opened the h5 file with h5py and changed the key from "RPKMs" to "rpkm". However, it still reported errors: [image]

The same command works well on the bulk RNA-seq data "mouse_bulk.h5", so I wonder whether the sc-RNA seq data needs to be handled differently from the bulk data, or whether there is some other reason. Thanks a lot!

xiaoyeye commented 3 years ago

Hi, it seems that the problem lies in the store function. What happens if you change it to rpkm = store['RPKMs']? Best

JFF1594032292 commented 3 years ago

I edited "get_xy_label_data_cnn_combine_from_database.py", and it now runs well on the example sc-RNA seq data. However, when I run it on my own sc-RNA seq data, it reports the same error: [image]

My sc-RNA seq .h5 file was generated from my own expression matrix with the h5py module, and it has the same structure as the example .h5 file: [image]

xiaoyeye commented 3 years ago

Well, it is hard for me to debug based on other datasets. Generally speaking, the "store" function is used to read the expression data as a pd.DataFrame, so I believe any function that can achieve this will work. Please also pay attention to the column names, which should be gene symbols. Hope this helps.
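As a rough illustration of the point above, here is a minimal sketch of writing an expression matrix with pd.HDFStore so that store['RPKMs'] returns a pd.DataFrame whose columns are gene symbols. The helper names (make_expression_frame, write_store, read_store), the toy gene symbols, and the rows-are-cells orientation are assumptions made for illustration; they are not taken from the CNNC scripts. pd.HDFStore needs the optional PyTables package to be installed.

```python
import numpy as np
import pandas as pd


def make_expression_frame(matrix, gene_symbols, cell_ids):
    """Build a DataFrame with one row per cell and one column per gene symbol.

    The orientation (rows = cells) is an assumption for this sketch.
    """
    return pd.DataFrame(
        np.asarray(matrix, dtype=float),
        index=list(cell_ids),
        columns=list(gene_symbols),
    )


def write_store(df, path, key="RPKMs"):
    """Write the DataFrame under `key` using pandas' HDFStore (needs PyTables)."""
    with pd.HDFStore(path, mode="w") as store:
        store.put(key, df)


def read_store(path, key="RPKMs"):
    """Read the DataFrame back, mirroring `rpkm = store['RPKMs']`."""
    with pd.HDFStore(path, mode="r") as store:
        return store[key]


# Toy data: two cells, three hypothetical gene symbols.
toy = make_expression_frame(
    [[0.0, 1.5, 3.0], [2.0, 0.0, 4.5]],
    gene_symbols=["Gene_A", "Gene_B", "Gene_C"],
    cell_ids=["cell_1", "cell_2"],
)
```

A file written directly with h5py stores plain datasets and lacks the layout pandas uses for DataFrames, which may be why such a file can look structurally similar yet fail to load through the store.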

JFF1594032292 commented 3 years ago

Thanks! I generated the h5 file with pd.HDFStore and it runs well. I have another question about the get_xy_label_data_cnn_combine_from_database.py script: [image]

I noticed the comment and removed the "[0:43261]" as described, and I understand that 43261 is the number of samples in the sc data. However, I also noticed the "43261" marked with a green box (lines 130 and 136). Should I replace this number with the sample count of my own sc-RNA seq data?

xiaoyeye commented 3 years ago

Great!

Yes, you can modify this number with your own data. Best
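
One way to avoid editing a hard-coded count such as 43261 by hand is to read the number of samples from the loaded data. The sketch below assumes the expression DataFrame has one row per sample and one column per gene; the function names sample_count and gene_vector are hypothetical and do not appear in the CNNC script.

```python
import pandas as pd


def sample_count(rpkm):
    """Return the number of samples, assuming one row per sample."""
    return rpkm.shape[0]


def gene_vector(rpkm, gene, n_samples=None):
    """Return the expression values of one gene for the first n_samples rows.

    When n_samples is None, all rows are used, so no constant has to be
    changed when the dataset changes.
    """
    if n_samples is None:
        n_samples = sample_count(rpkm)
    return rpkm[gene].to_numpy()[:n_samples]


# Toy data: three samples, two hypothetical gene symbols.
toy_rpkm = pd.DataFrame(
    {"Gene_A": [0.0, 1.5, 3.0], "Gene_B": [2.0, 0.0, 4.5]},
    index=["cell_1", "cell_2", "cell_3"],
)
n = sample_count(toy_rpkm)
values = gene_vector(toy_rpkm, "Gene_A")
```

With this approach the count follows whichever dataset is loaded, rather than being fixed to the example sc data.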