wangjr03 / FLAMINGO

How to prepare the lookup_table #6

Open liminghao663 opened 2 years ago

liminghao663 commented 2 years ago

I would like to pass a lookup_table to write.vtk, for example compartment scores in BED format, but I do not know how to prepare it.

I noticed that the row names of the results data frame are "frag_id", which should be calculated from chr_size/frag_res. But how do I match a frag_id to its real genome location? Is there an easy way to get the mapping?

Thanks a lot

JiaxinYangJX commented 2 years ago

Thanks for using our software!

To make the 3D genome visualization more informative, the lookup_table is used to assign a specific score to each fragment. Users can choose any kind of information to visualize based on their interests, such as compartment states. The chromatin states of the genome in different cell types can be obtained from Rao's experimental data (GSE63525). If you don't need any extra information for each fragment, the lookup_table can hold placeholder scores, for example, lookup_table=rep(1,dim(res)[1]).

For any fragment with frag_id, the genome location will be [ frag_id * frag_res, (frag_id + 1) * frag_res ].
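
As a minimal sketch of the mapping above, the following R code converts fragment ids to genomic intervals and builds a per-fragment score vector from BED-style intervals. The helper names (frag_to_coords, make_lookup_table), the column names of `bed` (chr, start, end, score), and the choice to score each fragment by the interval containing its midpoint are assumptions made for illustration. They are not part of FLAMINGO.

```r
# Map fragment ids to genomic intervals using the formula from this thread:
# start = frag_id * frag_res, end = (frag_id + 1) * frag_res.
frag_to_coords <- function(frag_id, frag_res, chr) {
  data.frame(
    chr = chr,
    start = frag_id * frag_res,
    end = (frag_id + 1) * frag_res,
    frag_id = frag_id
  )
}

# Give each fragment the score of the BED interval that contains its midpoint.
# `bed` is a hypothetical data frame with columns chr, start, end, score.
# Fragments not covered by any interval receive `default`.
make_lookup_table <- function(frag_id, frag_res, chr, bed, default = NA_real_) {
  coords <- frag_to_coords(frag_id, frag_res, chr)
  mid <- (coords$start + coords$end) / 2
  bed <- bed[bed$chr == chr, , drop = FALSE]
  bed <- bed[order(bed$start), , drop = FALSE]
  scores <- rep(default, length(frag_id))
  if (nrow(bed) == 0) return(scores)
  idx <- findInterval(mid, bed$start)        # last interval starting at or before mid
  hit <- idx > 0
  hit[hit] <- mid[hit] < bed$end[idx[hit]]   # midpoint must also fall before the interval end
  scores[hit] <- bed$score[idx[hit]]
  scores
}
```

Assuming the row names of the results data frame are numeric fragment ids, a call might look like make_lookup_table(as.numeric(rownames(res)), frag_res, "chr1", bed). Whether fragment ids start at 0 or 1 should be checked against your own output before trusting the coordinates.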

liminghao663 commented 2 years ago

Thank you very much for your response. I have another question. When I run flamingo.main_func_large in a loop over chromosomes, I frequently see an error related to the parallel threads.

The error occurs at the end of "Reconstructing intra-domain structures" and says:

Error in checkForRemoteErrors(val) : one node produce an error: creation of server socket failed : port xxxx cannot be opened

This error might be related to lines 19-27 of FLAMINGO/FLAMINGOr/R/flamingo.reconstruct_structure.R, but I do not know how to fix it.

When the error breaks the loop, I rerun the same chromosome and it usually works, but the loop breaks again with the same error after several more chromosomes.

I have tested this many times and noticed that the more threads (nthread) I use, the more frequently I hit the error. My workstation has 32 threads and 126 GB of RAM. With nthread=30, the error occurs about every two chromosomes; with nthread=20, about every seven chromosomes. In each iteration of the loop, I call gc() and empty the working directory.

liminghao663 commented 2 years ago

I tested the code on another server (96 threads, 1 TB RAM) with nthread=20. The same error interrupted about 1/3 of the tasks.

liminghao663 commented 2 years ago

After additional tests, I found that this error is also related to the frag_res parameter. If frag_res is set to a larger value, such as 100e3, the error is much less likely.

JiaxinYangJX commented 2 years ago

I think the errors are most likely related to multi-threading issues in parallel computing. More processors and more available memory may solve the problem. The demo data (4DNFI1UEG1HD.hic) can be successfully reconstructed on an AMD EPYC 7H12 with 35 processors and 100 GB of memory.

It seems you were using a for loop to reconstruct many chromosomes. Could you also try requesting enough processors and memory for a single chromosome? Let me know if you still see the errors. Thanks!

haowang0508 commented 2 years ago

Hi,

Thanks for using our tool! I think this is related to the incorrect arrangement of the threads. I recommend running different chromosomes in different jobs if you have access to a server. If not, please close all threads after each iteration; this might help.

Best
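
The suggestion to close all threads after each iteration can be sketched in R with the base parallel package. This is only an illustration of the general pattern, not FLAMINGO's internal code: run_with_cluster and with_retry are hypothetical helpers, and the sketch assumes the workers are socket clusters created with makeCluster.

```r
library(parallel)

# Run `fun` over `xs` on a fresh socket cluster and always shut the workers
# down, even if `fun` fails, so no stale workers or ports carry over.
run_with_cluster <- function(xs, fun, nthread) {
  cl <- makeCluster(nthread)
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, xs, fun)
}

# Retry a zero-argument function a few times, pausing between attempts.
# The last error is re-raised if every attempt fails.
with_retry <- function(task, max_tries = 3, wait_sec = 5) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(task(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("attempt ", attempt, " failed: ", conditionMessage(result))
    gc()  # release memory held by the failed attempt
    if (attempt < max_tries) Sys.sleep(wait_sec)
  }
  stop(result)
}
```

In a per-chromosome loop, each reconstruction could be wrapped as with_retry(function() run_one_chr(chr)), where run_one_chr is a hypothetical wrapper around the flamingo.main_func_large call for one chromosome. Retrying does not remove the underlying cause, so running each chromosome as a separate job remains the more robust option.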

JiaxinYangJX commented 1 year ago

Thanks for using our software! We are currently collecting all the feedback and reorganizing our tools. We will release a more user-friendly version by next month.

JiaxinYangJX commented 10 months ago

Hi @liminghao663, thanks for using our tools! We just released a lite version of FLAMINGO, which is faster, more memory-efficient, and more user-friendly. We fixed the majority of the bugs and hope the new version helps. Below is the link. https://github.com/JiaxinYangJX/FLAMINGOrLite