vaquerizaslab / fanc

FAN-C: Framework for the ANalysis of C-like data
GNU General Public License v3.0

Statistical methods employed comparing insulation scores #190

Open luzporras opened 2 months ago

luzporras commented 2 months ago

I'm comparing the insulation scores of two samples using FAN-C. I've created two .bed files, one with fanc compare and the other with fanc.DifferenceRegions.from_regions, both using a 50 kb bin size and a 150 kb window. However, the two outputs differ, and I'm unsure which one to use.

My main question is about the statistical methods behind these outputs: which statistical analyses underlie the calculations, and what criteria determine whether the difference between the insulation scores of the two samples is significant?

Thanks, Luz P

kaukrise commented 1 month ago

Hey, apologies for the late response.

Here is some pseudocode for fanc compare:

if inputs are matrices:
  if comparison == 'fold-change':
    use FoldChangeMatrix
  elif comparison == 'difference':
    use DifferenceMatrix
elif inputs are scores:
  if comparison == 'fold-change':
    use FoldChangeScores
  elif comparison == 'difference':
    use DifferenceScores
elif inputs are regions:
  if comparison == 'fold-change':
    use FoldChangeRegions
  elif comparison == 'difference':
    use DifferenceRegions

As you can see, fanc compare uses DifferenceRegions under the hood if you provide BED files and the --comparison difference argument. The default, however, is fold-change, so maybe that is where the discrepancy comes from?

You can see the actual code here: https://github.com/vaquerizaslab/fanc/blob/d5d86085c920a4dca6e5f6be4857129d718243cc/fanc/commands/fanc_commands.py#L3533-L3548
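The selection logic in the pseudocode can also be written as a plain lookup table. This is only a minimal sketch: the class names come from the pseudocode above, but `COMPARISON_CLASSES` and `pick_comparison_class` are hypothetical names made up for illustration, and the classes are represented by their names as strings so the snippet runs without FAN-C installed.

```python
# Hypothetical lookup table mirroring the pseudocode; not FAN-C's actual code.
# Keys are (kind of input, value of --comparison); values are class names.
COMPARISON_CLASSES = {
    ("matrices", "fold-change"): "FoldChangeMatrix",
    ("matrices", "difference"): "DifferenceMatrix",
    ("scores", "fold-change"): "FoldChangeScores",
    ("scores", "difference"): "DifferenceScores",
    ("regions", "fold-change"): "FoldChangeRegions",
    ("regions", "difference"): "DifferenceRegions",
}


def pick_comparison_class(input_kind, comparison="fold-change"):
    """Return the name of the comparison class for the given input kind.

    The default of 'fold-change' follows the behaviour described above.
    """
    try:
        return COMPARISON_CLASSES[(input_kind, comparison)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {input_kind!r}, {comparison!r}"
        ) from None


# BED input without an explicit comparison falls back to fold-change.
print(pick_comparison_class("regions"))                # FoldChangeRegions
print(pick_comparison_class("regions", "difference"))  # DifferenceRegions
```

With BED input, the two results only refer to the same class when the difference comparison is requested explicitly.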

So, the call would be:

DifferenceRegions.from_regions(
  matrix1, matrix2,
  file_name=comparison_output,
  tmpdir=tmp,
  mode='w',
  log=log
)

All fanc compare does is calculate either the difference or the fold-change of the values in the BED files for each region. There are no statistics involved, so there is no significance test or threshold behind the output.

https://github.com/vaquerizaslab/fanc/blob/d5d86085c920a4dca6e5f6be4857129d718243cc/fanc/architecture/comparisons.py#L532-L540
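To make the "no statistics" point concrete, here is a rough stand-alone sketch of what a per-region difference or fold-change amounts to. It does not use FAN-C: `compare_scores` is a hypothetical helper, the score values are made up, and both the direction of the comparison (sample 1 minus, or divided by, sample 2) and the optional log2 step are assumptions, so please check the linked source for the actual behaviour.

```python
import math


def compare_scores(scores1, scores2, comparison="fold-change", log=False):
    """Compare two lists of per-region scores element by element.

    Assumption: sample 1 is compared against sample 2 (a - b or a / b).
    FAN-C may use the opposite direction.
    """
    if len(scores1) != len(scores2):
        raise ValueError("both samples need one score per region")

    result = []
    for a, b in zip(scores1, scores2):
        if comparison == "difference":
            value = a - b
        elif comparison == "fold-change":
            # Division by zero has no defined fold-change; use NaN instead.
            value = a / b if b != 0 else float("nan")
        else:
            raise ValueError(f"unknown comparison: {comparison!r}")

        if log:
            # log2 is only defined for positive values.
            value = math.log2(value) if value > 0 else float("nan")
        result.append(value)
    return result


# Made-up insulation scores for three regions in two samples.
sample1 = [0.50, -0.25, 1.00]
sample2 = [0.25, -0.50, 0.50]

print(compare_scores(sample1, sample2, "difference"))   # [0.25, 0.25, 0.5]
print(compare_scores(sample1, sample2, "fold-change"))  # [2.0, 0.5, 2.0]
```

The same input scores give different numbers depending on the comparison mode, so two BED files produced with different modes are expected to disagree. Neither output says anything about significance.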