jernst98 / ChromHMM

GNU General Public License v3.0

On this data the INFORMATION initialization strategy can only support 4 states #59

Closed: LHXqwq closed this issue 3 days ago

LHXqwq commented 1 month ago

When I run the following command:

java -mx1600M -jar ChromHMM.jar LearnModel -b 200 Input_FC2 Output_FC2_10 10 hg19

I encountered the following error. What could be the potential issue?

Exception in thread "main" java.lang.IllegalArgumentException: On this data the INFORMATION initialization strategy can only support 4 states. Check if the binarization was done correctly, and if so use the RANDOM or LOAD options for more states
    at edu.mit.compbio.ChromHMM.ChromHMM.informationInitializeNested(ChromHMM.java:2675)
    at edu.mit.compbio.ChromHMM.ChromHMM.buildModel(ChromHMM.java:1036)
    at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:15223)

jernst98 commented 1 month ago

Do you only have 2 input marks? The default initialization mode can only handle 2^(number of input marks) states, but you are asking for 10 states.
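The limit described here can be expressed as a quick pre-flight check before calling LearnModel. This is a minimal sketch based only on the rule stated above; `max_information_states` and `suggest_init` are hypothetical helpers, not part of ChromHMM. The error message's "On this data" wording suggests the actual limit can be lower when some mark combinations never occur, so the function returns only an upper bound.

```python
def max_information_states(num_marks: int) -> int:
    """Upper bound on states for the INFORMATION initialization, per the rule above."""
    if num_marks < 1:
        raise ValueError("need at least one mark")
    # A binarized bin is one of 2**num_marks present/absent combinations,
    # which is presumably why the default strategy cannot exceed this number.
    return 2 ** num_marks


def suggest_init(num_marks: int, num_states: int) -> str:
    """Return an -init value suited to the request (hypothetical helper)."""
    if num_states <= max_information_states(num_marks):
        return "information"  # the default strategy should be able to handle it
    return "random"  # RANDOM (or LOAD) is needed for more states


print(max_information_states(2))  # 4
print(suggest_init(2, 10))        # random
```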

LHXqwq commented 1 month ago

Yes, I currently only have data for H3K4me3 and H3K27ac. Can ChromHMM be used to roughly identify enhancers and promoters based solely on these two datasets?

jernst98 commented 1 month ago

Yes. You would need to either use at most 4 states or, if using more than 4 states, add the '-init random' flag.
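Following this advice, the flag only needs to be added when the requested number of states exceeds what the default initialization supports. This sketch assembles the LearnModel argument list in Python; `learnmodel_command` is a hypothetical helper, and it uses only the options that appear in this thread (`-mx1600M`, `-b 200`, `-init random`). It builds the command but does not run anything.

```python
import shlex


def learnmodel_command(input_dir, output_dir, num_states, assembly,
                       num_marks, bin_size=200, memory="1600M"):
    """Build a LearnModel argument list, adding '-init random' only when needed."""
    cmd = ["java", f"-mx{memory}", "-jar", "ChromHMM.jar", "LearnModel",
           "-b", str(bin_size)]
    # The default (INFORMATION) initialization handles at most 2**num_marks states.
    if num_states > 2 ** num_marks:
        cmd += ["-init", "random"]
    # Options go before the positional arguments, as in the commands in this thread.
    cmd += [input_dir, output_dir, str(num_states), assembly]
    return cmd


print(shlex.join(learnmodel_command("Input_FC2", "Output_FC2_10", 10, "hg19", num_marks=2)))
# java -mx1600M -jar ChromHMM.jar LearnModel -b 200 -init random Input_FC2 Output_FC2_10 10 hg19
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` from a directory containing ChromHMM.jar.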

LHXqwq commented 1 month ago

When I run:

java -mx1600M -jar ChromHMM.jar LearnModel -b 200 Input_FC2 Output_FC2_4 4 ss11

the following error occurs:

Writing to file Output_FC2_4_2/ss11_4_segments.bed
Writing to file Output_FC2_4_2/ss11_4_dense.bed
Writing to file Output_FC2_4_2/ss11_4_expanded.bed
Writing to file Output_FC2_4_2/ss11_4_overlap.txt
Writing to file Output_FC2_4_2/ss11_4_overlap.png
Writing to file Output_FC2_4_2/ss11_4_overlap.svg
Exception in thread "main" java.lang.NumberFormatException: For input string: "-"
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.base/java.lang.Integer.parseInt(Integer.java:642)
    at java.base/java.lang.Integer.parseInt(Integer.java:770)
    at edu.mit.compbio.ChromHMM.StateAnalysis.neighborhoodMax(StateAnalysis.java:2942)
    at edu.mit.compbio.ChromHMM.ChromHMM.main(ChromHMM.java:15459)

What could be the cause? I did not find the '-' character in the files under the Input_FC2 folder.

jernst98 commented 1 month ago

I think there is a '-' in a file in the ss11 folder of ANCHORFILES. Did you check there?

LHXqwq commented 1 month ago

Thank you very much. The problem was a file in the ss11 folder of ANCHORFILES that had a space in the second column. I have now run the process successfully and obtained the corresponding results.
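Since the failure came from `Integer.parseInt` inside `StateAnalysis.neighborhoodMax`, a quick scan of the anchor files for non-integer coordinates can catch this kind of problem before rerunning. The sketch below assumes a tab-delimited layout of chromosome, position, strand; that layout and the helper names `bad_anchor_lines` and `check_anchor_file` are assumptions, and ChromHMM's own parser may tokenize lines differently, so the check is deliberately strict (any space, sign, or non-digit in column 2 is flagged).

```python
import gzip
import re

_INT = re.compile(r"[0-9]+")  # a bare non-negative integer: no spaces, no signs


def bad_anchor_lines(lines):
    """Yield (line_number, line) for rows whose 2nd tab-separated field is not a bare integer."""
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 2 or not _INT.fullmatch(fields[1]):
            yield n, line


def check_anchor_file(path):
    """Open a plain or gzip-compressed anchor file and list the suspicious rows."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return list(bad_anchor_lines(fh))


# Made-up rows: row 2 has a space in column 2, row 3 is missing its coordinate.
rows = ["chr1\t11873\t+\n", "chr1\t 29570\t-\n", "chr2\t-\n"]
print([n for n, _ in bad_anchor_lines(rows)])  # [2, 3]
```

Calling `check_anchor_file` on each file in the ss11 anchor folder would list the offending rows; any specific file name you pass in is your own, not one this sketch assumes.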

The contents of emissions_4.txt are as follows:

State (Emission order)  H3K27ac                H3K4me3
1                       0.7371722491995913     0.025454951706880308
2                       0.0070734777583484985  2.808219552645712E-4
3                       0.9531553476199173     0.9489478579210275
4                       0.0526273184047077     0.8673363187489265
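As a first pass before looking at enrichments, states can be given rough labels from their emissions: H3K27ac marks active promoters and enhancers, while H3K4me3 is concentrated at promoters. The sketch below applies that heuristic to the emission values posted above; the 0.5 threshold, the label wording, and the `rough_label` name are assumptions rather than ChromHMM output, and such labels still need to be checked against TSS and other annotation enrichments.

```python
def rough_label(h3k27ac: float, h3k4me3: float, threshold: float = 0.5) -> str:
    """Heuristic label from emission probabilities (threshold is an arbitrary assumption)."""
    ac, me3 = h3k27ac >= threshold, h3k4me3 >= threshold
    if ac and me3:
        return "active promoter-like (H3K27ac + H3K4me3)"
    if ac:
        return "candidate enhancer (H3K27ac only)"
    if me3:
        return "promoter-like, no H3K27ac (H3K4me3 only)"
    return "low signal (neither mark)"


# Emission probabilities from emissions_4.txt above: state -> (H3K27ac, H3K4me3)
emissions = {
    1: (0.7371722491995913, 0.025454951706880308),
    2: (0.0070734777583484985, 2.808219552645712e-4),
    3: (0.9531553476199173, 0.9489478579210275),
    4: (0.0526273184047077, 0.8673363187489265),
}
for state, (ac, me3) in emissions.items():
    print(state, rough_label(ac, me3))
```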

In the ss11_4_expanded.bed file, I noticed the following correspondence between state and color:

State  Color (RGB)
1      0,0,255
2      0,102,0
3      51,255,153
4      255,255,0
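To confirm a state-to-color mapping like this programmatically, one option is to read the label and color columns of the BED output. This sketch assumes standard BED9 rows with the state label in column 4 (name) and the color in column 9 (itemRgb), which matches my understanding of ChromHMM's dense output; the expanded file's layout may differ, so check a few lines first. `state_colors` is a hypothetical helper. Colors are display attributes only; a state's meaning comes from its emissions and enrichments.

```python
def state_colors(bed_lines):
    """Map state label (BED name, column 4) to an RGB tuple (itemRgb, column 9)."""
    colors = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip header and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # no itemRgb column on this row
        parts = fields[8].split(",")
        if len(parts) != 3:
            continue  # itemRgb not in "r,g,b" form
        colors.setdefault(fields[3], tuple(int(p) for p in parts))
    return colors


# Made-up coordinates, using two of the colors reported above.
lines = [
    "track name=demo itemRgb=On\n",
    "chr1\t0\t200\t1\t0\t.\t0\t200\t0,0,255\n",
    "chr1\t200\t600\t2\t0\t.\t200\t600\t0,102,0\n",
]
print(state_colors(lines))  # {'1': (0, 0, 255), '2': (0, 102, 0)}
```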

Could you please tell me which states of the 15-state model these 4 states correspond to (e.g., Active Promoter, Weak Promoter, Strong Enhancer, Weak/Poised Enhancer)?

jernst98 commented 1 month ago

Have you run enrichments for annotated TSS/promoter regions?

LHXqwq commented 1 month ago

Yes, the content of ss11_4_overlap.txt is as follows:

State (Emission order)  Genome %   RefSeqExon.ss11.bed.gz  RefSeqGene.ss11.bed.gz  RefSeqTES.ss11.bed.gz  RefSeqTSS.ss11.bed.gz  RefSeqTSS2kb.ss11.bed.gz
1                       2.03495    5.15444                 2.27030                 3.58753                22.34702               15.81526
2                       90.84071   0.81031                 0.90967                 0.85598                0.37223                0.45504
3                       1.71311    3.78950                 1.71852                 3.00031                10.22387               12.69401
4                       5.41123    1.73893                 1.81115                 1.81144                0.59076                0.87486
Base                    100        0.3060868984            5.6918859806            0.0001805545           0.0001824023           0.7014005474
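To read this table, it helps to know how the numbers are defined. As I understand the ChromHMM manual, with A = bases in the state, B = bases in the annotation, C = bases in both, and D = bases in the genome, the fold enrichment is (C/A)/(B/D); the Base row appears to give each annotation's share of the genome in percent. The sketch below implements that formula; `fold_enrichment` is a hypothetical name and the base counts in the first example are made up.

```python
def fold_enrichment(state_bases, feature_bases, overlap_bases, genome_bases):
    """Fold enrichment (C/A)/(B/D) for one state and one annotation."""
    if min(state_bases, feature_bases, genome_bases) <= 0:
        raise ValueError("state, feature, and genome sizes must be positive")
    observed = overlap_bases / state_bases    # C/A: fraction of the state's bases in the feature
    expected = feature_bases / genome_bases   # B/D: fraction of all genome bases in the feature
    return observed / expected


# Illustrative (made-up) base counts: 0.25 observed vs 0.0625 expected
print(fold_enrichment(40_000, 100_000, 10_000, 1_600_000))  # 4.0

# RefSeqTSS column from ss11_4_overlap.txt above: rank states by TSS enrichment
tss = {1: 22.34702, 2: 0.37223, 3: 10.22387, 4: 0.59076}
print(sorted(tss, key=tss.get, reverse=True))  # [1, 3, 4, 2]
```

Under this definition, a value above 1 means the state overlaps the annotation more than its genome share would predict; in the table above, states 1 and 3 are enriched at RefSeqTSS, while states 2 and 4 are depleted.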

I am working on enhancer identification for the first time and am not quite sure how this information can help identify enhancers.

jernst98 commented 1 month ago

I find it odd that you are not seeing TSS enrichment in state 4, which is associated with H3K4me3. You might want to double-check that the emission parameters correspond to the final version of the model the enrichments are based on, and/or that your H3K4me3 data is OK.
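This concern can be phrased as a simple consistency check: a state that emits H3K4me3 strongly would normally be expected to show TSS enrichment. The sketch below flags states that violate this, using the emission and RefSeqTSS values posted in this thread; the 0.5 and 1.0 thresholds and the `inconsistent_states` name are assumptions. A flagged state does not pinpoint the cause; it only says the emissions, the enrichments, or the input data deserve a second look (the log above writes to Output_FC2_4_2 although the command named Output_FC2_4, so it is worth confirming which run each file came from).

```python
def inconsistent_states(h3k4me3_emission, tss_enrichment, emit_min=0.5, enrich_min=1.0):
    """States that emit H3K4me3 strongly yet are not enriched at TSSs (thresholds are assumptions)."""
    return [s for s, p in h3k4me3_emission.items()
            if p >= emit_min and tss_enrichment.get(s, 0.0) < enrich_min]


# H3K4me3 emissions and RefSeqTSS fold enrichments as posted in this thread
h3k4me3 = {1: 0.025454951706880308, 2: 2.808219552645712e-4,
           3: 0.9489478579210275, 4: 0.8673363187489265}
tss = {1: 22.34702, 2: 0.37223, 3: 10.22387, 4: 0.59076}
print(inconsistent_states(h3k4me3, tss))  # [4]
```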