Inquiry about A and B compartment identification at different resolutions

hbandukw commented 3 years ago

Hello,

I identified A and B compartments in my data at three resolutions (10Kb, 100Kb and 1Mb):

`fanc compartments -g $CHROMS_PATH -d $DOMAIN_OUTPUT $FILE -x $CHROMS2EXCLUDE`

I am a bit confused about the results.

[Figure: siControl_1_ABcompartments]

At this locus, the compartments at 10kb are identified as B, A, B, A, B, A, but as just "A" at 100kb and 1Mb. When I look at the 10kb resolution, it seems like most of the region is "B", so why is it being called "A" at the lower resolutions? This is happening at most loci, e.g. in the figure below, FAN-C identifies the region as A at 10kb but B at 100kb and 1Mb.

[Figure: siControl_1_TADs_Smarca4]

Is this normal or not?

kaukrise commented 3 years ago

Hi, thanks for the question.

There are several factors at play here when working with AB calls at different resolutions (which span three orders of magnitude in your case):

- At high resolutions noise plays a large role: compartment calls are generally much less robust at 10kb than at 1Mb. Something we have observed in a lot of datasets is that entries in the correlation matrix tend to be positive in very noisy matrices, which may lead to the different compartment calls observed in your data.
- Even if your data were of sufficiently high resolution that 10kb compartment calls would be reasonably robust, it is likely that you would observe substantial local differences in compartment calls, simply because you integrate a lot more data per region at lower resolutions than at high ones.

In your case, the plots you show give the strong impression that your data does not support 10kb resolution AB calls, and I would be careful with 100kb calls, too. Keep in mind that AB calls are calculated on the whole chromosome matrix, and not just the entries close to the diagonal where the signal looks sufficient. Off the diagonal the signal can appear almost random in high resolution matrices.

My recommendation is to focus on plots of the AB correlation matrix and its eigenvector (EV), instead of just the high-level AB calls. The EV will give you a much better idea of fluctuations in and strength of your AB calls.
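To make the relationship between the correlation matrix and the EV concrete, here is a minimal, stdlib-only sketch. It is not FAN-C's implementation: it assumes the input is already an observed/expected (O/E) matrix for one chromosome, and the helper names (`pearson`, `correlation_matrix`, `leading_eigenvector`) and the toy matrix are made up for illustration. Note that an eigenvector and its negation are equally valid, which is why the sign has to be oriented with external data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(oe):
    """Correlate every row of the O/E matrix with every other row."""
    n = len(oe)
    return [[pearson(oe[i], oe[j]) for j in range(n)] for i in range(n)]

def leading_eigenvector(sym, iterations=200):
    """Power iteration on a symmetric matrix; returns a unit-length vector."""
    n = len(sym)
    v = [1.0 + 0.01 * i for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(sym[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

# Hypothetical toy O/E matrix: bins with the same label interact more than expected.
labels = [0, 0, 0, 1, 1, 1, 0, 0]
toy_oe = [[2.0 if a == b else 0.5 for b in labels] for a in labels]
ev = leading_eigenvector(correlation_matrix(toy_oe))
```

In this toy case the sign of `ev` separates the two groups of bins cleanly. In a noisy, sparse matrix the row correlations are much weaker, so small changes in the data can move individual bins across zero.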

hbandukw commented 3 years ago

So basically, I was hoping to track compartment switches between my Control and KO samples. But when I plot the AB correlation matrices for my samples (see below), it is really hard to track any differences visually.

Ko Vs

Ctrl

Can you suggest what my options are?

kaukrise commented 3 years ago

I agree that these are difficult to quantify and also to assess visually. This, and the above noise considerations, have kept me from using them in my own research.

What I have seen people do is to calculate the difference of the eigenvectors of the two samples and plot that in addition to the data you plotted above. But honestly I am not sure whether that is a mathematically valid approach or makes sense for your samples.
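A minimal sketch of that difference calculation, with the same caveat about whether it is a valid approach. It assumes both eigenvectors were computed at the same bin size and already have a consistent sign orientation; the function name and the dictionary layout (region tuple to EV value) are hypothetical.

```python
def eigenvector_difference(control, treatment):
    """Per-bin difference (treatment - control) for bins present in both samples.

    control, treatment: dict mapping (chrom, start, end) -> eigenvector value.
    Returns a list of (region, difference, switched) tuples, where `switched`
    is True when the two non-zero values have opposite signs.
    """
    rows = []
    for region in sorted(set(control) & set(treatment)):
        c, t = control[region], treatment[region]
        switched = c != 0 and t != 0 and (c > 0) != (t > 0)
        rows.append((region, t - c, switched))
    return rows

# Made-up example values, not real data.
ctrl = {("chr1", 0, 100): 0.5, ("chr1", 100, 200): -0.25, ("chr1", 200, 300): 0.75}
ko = {("chr1", 0, 100): 0.25, ("chr1", 100, 200): 0.5, ("chr2", 0, 100): 1.0}
for region, diff, switched in eigenvector_difference(ctrl, ko):
    print(region, diff, switched)
```

Bins that are missing from either sample are dropped rather than treated as zero, so filtered regions do not show up as spurious differences.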

hbandukw commented 3 years ago

hmm I see. Thank you for the advice!

liz-is commented 3 years ago

Chiming in because I've spent a bunch of time thinking about compartments - I completely agree with Kai that compartment calls are much less robust at high resolutions and that inspecting the actual eigenvectors is important.

One approach that I've found helpful is to convert the eigenvector BED-format file from FAN-C into a bigwig file, so you can then inspect the eigenvectors in your favourite genome browser where you can zoom in/out, load multiple samples to compare, etc. You can also plot smaller regions with fancplot (in the same way as you plotted whole chromosomes above) if you want to then cross-reference with the correlation matrix.
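A rough sketch of the first half of that conversion: turning the eigenvector BED into a sorted bedGraph. It assumes the BED is tab-separated with the EV value in the fifth (score) column, which should be checked against your own file; the function name is hypothetical. The resulting bedGraph can then be passed to a converter such as UCSC's `bedGraphToBigWig` together with a chromosome sizes file.

```python
import math

def bed_to_bedgraph(bed_lines, score_column=4):
    """Convert BED lines to sorted bedGraph lines (chrom, start, end, value).

    Assumes tab-separated input with the value in `score_column` (0-based).
    Header lines and bins without a numeric value are skipped.
    """
    records = []
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        try:
            value = float(fields[score_column])
        except (IndexError, ValueError):
            continue  # missing or non-numeric score
        if math.isnan(value):
            continue  # bins with no eigenvector value
        records.append((fields[0], int(fields[1]), int(fields[2]), value))
    records.sort()  # sort by chromosome name, then start position
    return [f"{c}\t{s}\t{e}\t{v}" for c, s, e, v in records]

# Made-up example lines, not real FAN-C output.
example = [
    "chr1\t100\t200\t.\t-0.5\t.",
    "chr1\t0\t100\t.\t0.25\t.",
    "chr1\t200\t300\t.\tnan\t.",
]
print("\n".join(bed_to_bedgraph(example)))
```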

However I would definitely first check that the eigenvectors really seem to reflect compartmentalisation, and not chromosomal position / chromosomal arm. Depending on species and resolution, it may be that the second eigenvector better reflects compartmentalisation. I've also found that assigning the sign of the eigenvector according to GC content can sometimes not give consistent assignments across resolutions / samples, so it's worth checking this too. You can then re-assign the sign of the eigenvector using other data if necessary (e.g. histone modifications, gene density, etc).

kaukrise commented 3 years ago

Hi @liz-is, thanks for chiming in! 100% agree, especially with the point about GC content.

hbandukw commented 3 years ago

Hi @liz-is, thanks for all the useful info.

So just one more thing, when checking the eigenvectors, should I even bother to look at any resolutions other than 1Mb?

liz-is commented 3 years ago

Impossible to say without knowing more about your data. 10 kb resolution is unlikely to give sensible compartments unless you have extremely deep sequencing, IMO, but 100 kb could be fine. If you have already calculated the eigenvectors I'd say you might as well look at them!

hbandukw commented 3 years ago

Hi @liz-is and @kaukrise , I was looking to get some advice on whether the assigned eigenvectors (@ 100kb and 1Mb) are correctly reflecting compartmentalization.

1) Chr12 @ 1Mb 12_siControl-C2C12_1_1mb ab_and_ev

2) Chr12 @ 100kb 12_siControl-C2C12_1_100kb_ab_and_ev

Am I correct to think that the eigenvectors are corresponding to compartmentalization?

liz-is commented 3 years ago

Yeah, they look good to me!

hbandukw commented 3 years ago

Hello,

I converted the domain bed files to bigwigs as suggested and was viewing them alongside some ChIPseqs of relevant histone marks in mice. I am having some strange things happen:

1) My replicates are displaying mirror images of eigenvectors at some loci (e.g. screenshot 1) but not at others (e.g. screenshots 2 and 3)

Plot 1

Plot 2

Plot 3

Do you know why this is happening?

liz-is commented 3 years ago

As I mentioned above, assigning the sign of the eigenvector using GC content doesn't always give robust results across samples. Compartment identification is done per-chromosome, so it's possible for the assignments to be consistent on one chromosome but not on another. I suspect that's what's happening here. Also, in Plot 3, even though the replicates are consistent with each other, it seems unlikely that a KO would cause almost a complete switch in compartments, so one of the conditions there is probably also assigned incorrectly.

You can use your histone ChIP-seq data (or gene density, but histone data is probably better) to reassign the sign of the eigenvector for each chromosome and I expect you'll see much more consistent profiles.

kaukrise commented 3 years ago

In the time it took me to boot up my computer and sign in @liz-is beat me to it. :)

I'll just quote my previous post then, which still fits perfectly:

Hi @liz-is, thanks for chiming in! 100% agree, especially with the point about GC content.

hbandukw commented 3 years ago

Hi @liz-is and @kaukrise,

Ah ok! So I have access to public histone ChIP-seq data for my control samples (i.e. similar cells + condition) but not from cells that have the specific KO that I have. I am assuming that many people have this problem? What do they do? If I don't have appropriate histone data for my KO, can I do anything else to orient the eigenvector signs?

Thanks again for all your great advice and help!!

liz-is commented 3 years ago

I'm not sure what other people do to be honest, it seems no one talks much about this issue! What I've done in the past when I had a lot of conditions to compare was to take a control condition that had strong compartmentalisation and good sequencing depth, assign the sign of that (based on gene density in this case, then validated by comparison to chromatin and gene expression data), and assign all the others based on what correlated best to that reference eigenvector (code here if you are interested). This is all based on the assumption that the majority of regions won't have changes in compartmentalisation, of course. This approach usually works pretty well for me, but if you have reason to think that your KO will cause large portions of the genome to change compartment, then that gets very tricky.
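The reference-based idea can be sketched as below. This is not the linked code, just a minimal stdlib-only illustration; it assumes both eigenvectors use the same bins per chromosome, and the function names and dictionary layout (chromosome to list of per-bin values) are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def orient_to_reference(reference, sample):
    """Flip each chromosome of `sample` that anti-correlates with `reference`.

    reference, sample: dict mapping chromosome -> list of per-bin EV values.
    Chromosomes missing from the reference, or with a different number of
    bins, are returned unchanged.
    """
    oriented = {}
    for chrom, values in sample.items():
        ref = reference.get(chrom)
        if ref is None or len(ref) != len(values):
            oriented[chrom] = list(values)
            continue
        # Only bins with a value in both vectors contribute to the correlation.
        pairs = [(r, v) for r, v in zip(ref, values)
                 if not (math.isnan(r) or math.isnan(v))]
        flip = len(pairs) > 1 and pearson([p[0] for p in pairs],
                                          [p[1] for p in pairs]) < 0
        oriented[chrom] = [-v if flip else v for v in values]
    return oriented

# Made-up example values, not real data.
reference = {"chr1": [0.5, 0.25, -0.5, -0.25], "chr2": [-0.5, 0.5, 0.25, -0.25]}
sample = {"chr1": [-0.4, -0.3, 0.6, 0.2], "chr2": [-0.6, 0.4, 0.3, -0.1]}
oriented = orient_to_reference(reference, sample)
```

Here `chr1` is flipped and `chr2` is left alone, because the decision is made independently for each chromosome.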

hbandukw commented 3 years ago

Hi @liz-is, sorry about the late reply. I will give what you said a shot. Thanks again for your help.

hbandukw commented 3 years ago

Hi @liz-is, I am attempting to assign the eigenvector sign based on some histone data. I was wondering if I need to compare the eigenvectors per chromosome or not?

liz-is commented 3 years ago

I would definitely recommend doing the assignment per chromosome, as the eigenvectors are calculated per-chromosome (FAN-C also assigns the eigenvector sign per-chromosome internally). Whether the assigned sign matches the histone data etc can therefore vary across different chromosomes, so you need to assess each one individually to see if it needs to be flipped.

hbandukw commented 3 years ago

Sorry I need a little clarification: When you say "flip", do you mean that I would be flipping the sign of the eigenvector?

So basically let's say I am going through Chromosome 1 of my control file and comparing it to permissive and repressive histone marks. I see that the negative eigenvector values are correlating with the permissive mark, so then I say that neg-eigenvector == A/active compartment and pos-eigenvector == B compartment. Then I move on to Chromosome 2 and do the same check, but this time I see that pos-eigenvector == A compartment, so for Chromosome 2 I choose this assignment, and so on.

Is that how it would work? Thanks again for all your help and feedback!

liz-is commented 3 years ago

Yes, exactly! If you see that regions with a negative eigenvector value have high levels of permissive histone marks and low levels of repressive histone marks, that would indicate that the sign of the eigenvector has been assigned incorrectly. I would then multiply the eigenvector values for that chromosome by -1 to "flip" the eigenvector and use these "corrected" values for downstream analysis such as defining A and B compartment regions.
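As a rough illustration of that per-chromosome check, the sketch below compares the mean signal of a permissive mark in positive and negative bins and multiplies the chromosome by -1 when the negative bins carry more of it. It assumes the mark has already been summarised per bin (e.g. mean coverage) on the same bins as the eigenvector; the function names and data layout are hypothetical.

```python
def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def orient_by_active_mark(eigenvector, active_signal):
    """Orient each chromosome so positive EV bins carry more permissive mark.

    eigenvector, active_signal: dict mapping chromosome -> list of per-bin
    values on identical bins.
    """
    oriented = {}
    for chrom, values in eigenvector.items():
        signal = active_signal[chrom]
        positive = [s for v, s in zip(values, signal) if v > 0]
        negative = [s for v, s in zip(values, signal) if v < 0]
        # More permissive mark in negative bins suggests the sign is inverted.
        flip = mean(negative) > mean(positive)
        oriented[chrom] = [-v if flip else v for v in values]
    return oriented

# Made-up example values, not real data.
ev = {"chr1": [0.5, 0.25, -0.5], "chr2": [-0.5, -0.25, 0.75]}
h3k4me3 = {"chr1": [10, 8, 1], "chr2": [9, 7, 2]}
corrected = orient_by_active_mark(ev, h3k4me3)
```

A repressive mark can be checked the same way with the comparison reversed; if the two disagree, that chromosome needs a closer look.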

Does that make sense?

hbandukw commented 3 years ago

Ok, so basically we are defining this before analysis: "Active (A) compartments = positive eigenvector values" and "Inactive (B) compartments = negative eigenvector values". Then, as I assess regions per chromosome, if I see that negative eigenvector values have high levels of permissive histone marks and low levels of repressive histone marks, I will flip the sign of the eigenvector for that chromosome so it corresponds to our definition, correct?

liz-is commented 3 years ago

Yes, the convention established in the first Hi-C papers is that "positive eigenvector values = A compartment" and "negative eigenvector values = B compartment". I find it convenient to flip the sign of the eigenvector for each chromosome to align with this convention, as it makes it easier to then use them downstream for plotting as genomic tracks, making saddle plots, etc. Of course if you're not using the eigenvector values themselves downstream, only the A and B compartment regions, then you can assign the A / B regions directly and not flip the eigenvector values. What you describe above is how I would do it, though.
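Once the sign follows that convention, defining A and B regions amounts to merging adjacent bins of the same sign. A minimal sketch, assuming the bins are sorted by position and already oriented; the function name and tuple layout are hypothetical.

```python
def call_compartments(bins):
    """Merge adjacent bins with the same EV sign into A/B regions.

    bins: iterable of (chrom, start, end, value), sorted by chrom and start.
    Positive values are labelled "A", negative values "B"; bins with a zero
    or NaN value are skipped and break up regions.
    """
    regions = []
    for chrom, start, end, value in bins:
        if value == 0 or value != value:  # value != value is True only for NaN
            continue
        label = "A" if value > 0 else "B"
        if regions:
            last_chrom, last_start, last_end, last_label = regions[-1]
            if last_chrom == chrom and last_label == label and last_end == start:
                # Extend the previous region instead of starting a new one.
                regions[-1] = (chrom, last_start, end, label)
                continue
        regions.append((chrom, start, end, label))
    return regions

# Made-up example values, not real data.
example_bins = [
    ("chr1", 0, 100, 0.5),
    ("chr1", 100, 200, 0.25),
    ("chr1", 200, 300, -0.5),
    ("chr1", 300, 400, 0.0),
    ("chr1", 400, 500, -0.25),
    ("chr2", 0, 100, -0.75),
]
print(call_compartments(example_bins))
```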

hbandukw commented 3 years ago

Ah ok! I get it now! Thanks!!!

hbandukw commented 3 years ago

Hello @liz-is, in your code, when you compare the fraction of A/B compartments that overlap with "active chromatin" and "inactive chromatin" (overlap / total_size) in your function "check_chromatin_colours" (https://github.com/vaquerizaslab/IngSimmons_et_al_dorsoventral_3D_genome/blob/main/scripts/compartment_analysis.Rmd), what do you do if these proportions are very similar?

E.g. for Chr2, when I compare the fraction of A/B compartments overlapping with H3K4me3 (active) and H3K27me3 (repressive) peaks, the proportions come out as:

1) A_H3K4me3 vs A_H3K27me3: 3822735 / 76113205 vs 4935985 / 76113205, i.e. 0.05022433360939143 vs 0.06485057356341781

2) B_H3K4me3 vs B_H3K27me3: 1448829 / 105999982 vs 2717943 / 105999982, i.e. 0.013668200434222715 vs 0.025640976052241218

What do you do when the proportions are this similar?

liz-is commented 3 years ago

I was on holiday so maybe you have already decided what to do, but I would say that you really have to interpret these based on the context and other data that you have. For example, the chromatin colours data for Drosophila contains information on >2 chromatin states, so a chromosome may not have much H3K27me3-type heterochromatin but the B compartment may be enriched for H3K9me3-type heterochromatin. If looking at one data type doesn't resolve which compartment is A and which is B, I would try to make a consensus across multiple data types (gene density, GC content, H3K27me3, H3K4me3, H3K9me3, etc).
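One simple way to sketch such a consensus is a vote per data type. This is only an illustration, not the `check_chromatin_colours` function from the linked code. It assumes each data type is compared between the A and B compartments (rather than two marks within one compartment), and the function names and the "active"/"repressive" labels are hypothetical. The numbers are the Chr2 fractions quoted above.

```python
def orientation_votes(overlap_fractions, mark_types):
    """One vote per data type on whether the current A/B assignment looks right.

    overlap_fractions: dict mark -> {"A": fraction, "B": fraction}
    mark_types: dict mark -> "active" or "repressive"
    Returns dict mark -> +1 (supports assignment), -1 (contradicts), 0 (tie).
    """
    votes = {}
    for mark, fractions in overlap_fractions.items():
        a, b = fractions["A"], fractions["B"]
        if a == b:
            votes[mark] = 0
            continue
        denser_in_a = a > b
        expected_in_a = mark_types[mark] == "active"
        votes[mark] = 1 if denser_in_a == expected_in_a else -1
    return votes

def consensus(votes):
    """Summarise the votes as 'keep', 'flip' or 'undecided'."""
    total = sum(votes.values())
    if total > 0:
        return "keep"
    if total < 0:
        return "flip"
    return "undecided"

# Chr2 overlap fractions from the post above.
chr2_fractions = {
    "H3K4me3": {"A": 3822735 / 76113205, "B": 1448829 / 105999982},
    "H3K27me3": {"A": 4935985 / 76113205, "B": 2717943 / 105999982},
}
mark_types = {"H3K4me3": "active", "H3K27me3": "repressive"}
votes = orientation_votes(chr2_fractions, mark_types)
print(votes, consensus(votes))
```

With only these two marks the votes cancel out, since both are denser in the regions currently called A. That is the kind of case where adding further data types (gene density, GC content, H3K9me3) would be needed to break the tie.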

DittmanC commented 2 years ago

Similar situation here. Does the fanc tool, or any other tool, let me flip the compartments so that I can visualise them in fanc compartments? I saw that the command only allows me to use --genome, which is based on average GC content, and does not allow me to use my histone marks as the reference.

liz-is commented 2 years ago

In my code that's linked above there's code to flip based on correlation with any data of interest. I asked once about incorporating something similar in FAN-C directly (#52), but it sounded like it would be tricky to make this computationally efficient.

vaquerizaslab / fanc

Inquiry about A and B compartment identification at different resolutions #67