yezhengSTAT / scVI-3D

GNU General Public License v3.0

KeyError: 'chr21' #11

Open Accompany0313 opened 1 month ago

Accompany0313 commented 1 month ago

Hi Ye, Nice work!

I'm sorry to bother you again, but running scVI-3D is very important to me. I am able to run the demo data with no error, but when I ran the full hg38 Lee2019 dataset, the error in the title occurred. (A screenshot of the traceback was attached here.)

I can't find the cause of the problem. Can you help me? Thank you very much!

My command is as follows: python scripts/scVI-3D.py -b 10 -c "whole" -r 1000000 -i "Lee2019/1M/data" -o "Lee2019/1M/results" -cs "Lee2019/1M/Lee2019-summary.txt" -g "supplementaryData/hg38.chrom.sizes" -br -n 100 -gpu -p 48 -pca 50 -v

yezhengSTAT commented 1 month ago

Hello,

I sincerely apologize for my late reply! I have just come out of a series of deadlines. Sorry again for the trouble you had running scVI-3D!

Glad to learn that the demo data worked just fine. It seems that "chr21" is not among the keys. Is chr21 by any chance missing from the genome size file? What happens if you run on that chromosome alone, with -c "chr21"?
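One quick way to rule out the genome size file is to load it and look for the chromosome directly. The sketch below assumes the standard UCSC chrom.sizes layout (chromosome name and length separated by whitespace, one per line); the function names are illustrative and are not part of scVI-3D.

```python
from pathlib import Path


def load_chrom_sizes(path):
    """Read a UCSC-style chrom.sizes file into a {name: length} dict."""
    sizes = {}
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        # Skip blank or malformed lines instead of failing on them.
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def missing_chromosomes(sizes, expected):
    """Return the expected chromosome names that are absent from sizes."""
    return [name for name in expected if name not in sizes]
```

For example, missing_chromosomes(load_chrom_sizes("supplementaryData/hg38.chrom.sizes"), ["chr21"]) returns an empty list when chr21 is present in the file.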

Actually, can I ask you for help? Other users have also had problems with module version compatibility while installing scVI-3D. Would you mind sharing the installation adjustments or module versions that led to your successful run of the demo data? I would very much appreciate it!

Thanks, Ye

Accompany0313 commented 1 month ago

Thank you very much for your reply. The problem was caused by several cell files in my dataset that are missing chr21. When I processed the data at 1M resolution, the chr21 data in these files did not meet the processing requirements, so chr21 was dropped from them. When I used the same method at 100K resolution, the problem did not occur. The code I used to process the data was attached as a screenshot.
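A small scan like the following can list which cell files lack a given chromosome before a run. It is only a sketch: it assumes each cell is a tab-separated text file whose first and third columns hold chromosome names, and that layout, like the function names, is an assumption to adjust to the actual input files.

```python
from pathlib import Path


def chromosomes_in_cell(path, chrom_columns=(0, 2)):
    """Collect the chromosome names seen in one per-cell contact file.

    Assumes tab-separated rows with chromosome names in chrom_columns.
    """
    chroms = set()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            for col in chrom_columns:
                if col < len(fields) and fields[col]:
                    chroms.add(fields[col])
    return chroms


def cells_missing_chromosome(data_dir, chrom):
    """Return the names of files in data_dir with no rows for chrom."""
    missing = []
    for path in sorted(Path(data_dir).iterdir()):
        if path.is_file() and chrom not in chromosomes_in_cell(path):
            missing.append(path.name)
    return missing
```

Calling cells_missing_chromosome("Lee2019/1M/data", "chr21") would then name the cells to drop or reprocess.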

I would be honored to help as far as I can. Attached is the yml file for my conda environment. A few package versions seem to matter most when creating the environment: scvi-tools==0.14.5, scikit-learn==0.24.2, torchmetrics==0.7.0. For torchmetrics in particular, I first installed a much newer version and ran into class inheritance issues. I have not encountered other problems so far. I hope this helps. environment.txt
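To compare an existing environment against these three pins, a check along these lines may help. It uses the standard library's importlib.metadata; the PINS dict simply repeats the versions quoted above, and check_pins is an illustrative helper, not something shipped with scVI-3D.

```python
from importlib import metadata

# Versions reported in this thread as working for the demo data.
PINS = {
    "scvi-tools": "0.14.5",
    "scikit-learn": "0.24.2",
    "torchmetrics": "0.7.0",
}


def check_pins(pins, get_version=metadata.version):
    """Map each package to (wanted, installed, matches).

    installed is None when the package is not installed at all.
    """
    report = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed, installed == wanted)
    return report
```

Running check_pins(PINS) inside the conda environment shows at a glance which of the three packages differ from the versions listed here.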

yezhengSTAT commented 1 month ago

I see. Indeed, those small and relatively more sparse chromosomes tend to be more problematic. Sometimes, I use fewer bands on those chromosomes for more reliable runs and results.

Thanks a lot for sharing your invaluable experience of installing the dependency packages!

Thanks, Ye

Accompany0313 commented 1 month ago

I want to test when scVI-3D works best, and which of its input parameters have a greater impact on the results.

Thanks!

yezhengSTAT commented 1 month ago

Hmm, the resolution and the number of bands used? I think it depends on the characteristics of the input data.

Accompany0313 commented 1 month ago

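Yes, this does need to be determined from the characteristics of the input data. Looking at the source code, I wonder whether parameters such as nLatent, poolStrategy, includeDiag, and verbose also have a subtle effect on the results, in case I want to compute an accurate ARI.

Please forgive me for replying in Chinese; I felt I could express this more precisely that way. Thank you!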

yezhengSTAT commented 1 month ago

verbose only controls whether progress information is printed. nLatent, poolStrategy, and includeDiag do affect the downstream results, and how much depends on the input data. The paper's supplementary figures discuss those parameters; they may help you get an idea of how large the impact can be and in which direction.
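For comparing clusterings obtained under different nLatent, poolStrategy, or includeDiag settings against known cell-type labels, the adjusted Rand index can be computed from the pair-counting formula. The sketch below is a dependency-free version of that standard formula, not code from scVI-3D; scikit-learn's adjusted_rand_score is the usual choice when that library is available.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: treat as trivially identical

    # Count pairs of cells grouped together in both, or in each, labeling.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # chance-level agreement
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)
```

The score is 1.0 for identical partitions regardless of how the clusters are named, and close to 0 for agreement at chance level.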

Accompany0313 commented 1 month ago

Ok, thank you very much. Have a nice day at work!