guanjue / IDEAS_2018

Jointly characterizing epigenetic dynamics across multiple cell types
MIT License

unexpected output for setting number of states #11

Open KailiBio opened 4 years ago

KailiBio commented 4 years ago

Hi Guanjue,

I was running IDEAS on 10 marks across 66 samples and wanted to set the number of output states to 18. I changed the setting to "num_state= 18" in the parafile, but the output has more than 30 states. I reran it a few times and always got more than 18 states. When I did the same thing to set the state number to 40, it worked. Do you know why? And do you have any suggestions?

KailiBio commented 4 years ago

I have checked the log file. It did run "ideas 66sample_10marks_DHS-center_bins_18.input.50 encode_v2_DHS-center_bins_bySpace.bed -impute none -norm -G 18 -C 100 -minerr 0.5 -cap 16 -sample 20 5 -thread 21 -o /data/zusers/fankaili/ideas/final/66sample_10marks_DHS-center_bins_18_result/66sample_10marks_DHS-center_bins18.tmp.50 -inv 8692139 9192139" at the beginning, but somewhere in the middle it changed to "-G 39". That doesn't make sense to me. Shouldn't it keep "-G 18" throughout?
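
For anyone who wants to check their own log the same way, here is a minimal Python sketch that collects every "-G" value in order of appearance. It assumes the log contains the "ideas ..." command lines as plain text, like the one quoted above. The function name `g_values` and the shortened example log lines are hypothetical, made up for illustration.

```python
import re

def g_values(log_text: str) -> list[int]:
    """Return every integer that follows a standalone -G flag, in order of appearance."""
    # (?<!\S) ensures -G starts a token and is not part of a longer word.
    return [int(v) for v in re.findall(r"(?<!\S)-G\s+(\d+)", log_text)]

# Hypothetical, shortened log lines; real commands carry many more arguments.
example_log = (
    "ideas in.input regions.bed -impute none -norm -G 18 -C 100\n"
    "ideas in.input regions.bed -impute none -norm -G 39 -C 100\n"
)

values = g_values(example_log)
changed = len(set(values)) > 1  # True when the pipeline switched state numbers
print(values, changed)
```

If `changed` is true, the state number passed to "ideas" was not constant across the run.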

guanjue commented 4 years ago

This happens because, under the IDEAS model, 18 states cannot fully explain the clusters in the data. One possible cause is inflated background signal, which can create states with weak signals. We have since set up a new pipeline (https://github.com/guanjue/S3V2_IDEAS_ESMP) with a much better denoising preprocessing step, which can reduce this issue.

For this pipeline, one modification that may circumvent the issue is to also set the starting number of states to 18 with "num_start= 18".
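
As a rough illustration of that change, the Python sketch below sets both "num_state" and "num_start" in the text of a parafile. It assumes the parafile stores one "key= value" setting per line, as in the "num_state= 18" line quoted in this thread. The helper `set_param` and the starting values in the example are hypothetical, not part of the pipeline.

```python
import re

def set_param(parafile_text: str, key: str, value: int) -> str:
    """Set `key= value`, replacing an existing line for that key or appending a new one."""
    pattern = re.compile(rf"^{re.escape(key)}\s*=.*$", flags=re.MULTILINE)
    new_line = f"{key}= {value}"
    if pattern.search(parafile_text):
        # A function replacement avoids any escape handling in the new text.
        return pattern.sub(lambda _m: new_line, parafile_text)
    # Key not present: append it, adding a newline separator if needed.
    sep = "" if (not parafile_text or parafile_text.endswith("\n")) else "\n"
    return parafile_text + sep + new_line + "\n"

# Made-up starting values, only to show the replacement.
example = "num_state= 40\nnum_start= 100\n"
updated = set_param(set_param(example, "num_state", 18), "num_start", 18)
print(updated)
```

To apply it to a real parafile, read the file into a string, pass it through `set_param` for each key, and write the result back. Keeping a copy of the original file first is a sensible precaution.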

KailiBio commented 4 years ago

Thank you very much for the quick response. I will try both ways.