karbalayghareh / GraphReg

Chromatin interaction aware gene regulatory modeling with graph attention networks
https://genome.cshlp.org/content/32/5/930.short

Questions on feature attribution data of the GraphReg paper #1

Closed ChenLinhui9 closed 2 years ago

ChenLinhui9 commented 2 years ago

Dear Dr. Karbalayghareh,

Hope all is well!

I'm interested in your paper "Chromatin interaction–aware gene regulatory modeling with graph attention networks", especially Figure 3. To learn more about the enhancer-gene interactions shown in that figure, I'm currently trying to reproduce the DeepSHAP and saliency feature attributions applied to GraphReg. I have some questions about the code uploaded on GitHub (https://github.com/karbalayghareh/GraphReg), which I elaborate on below. Thank you in advance for your time and patience.

To obtain the .tfr files used in Epi_models_fa_ensemble.py (https://github.com/karbalayghareh/GraphReg/blob/master/feature_attribution/Epi_models_fa_ensemble.py), I traced back through the preceding steps and found that I need your help with the following input files:

Data_read.py:

genome_cov_file = data_path+'/data/'+cell_line+'/bam/GM12878_CAGE_binsize_5000bp.bigWig'

seqs_bed_file = data_path+'/data/csv/seqsbed/'+organism+'/'+genome+'/'+res+'/sequences'+chr_temp+'.bed'

hic_to_graph.py:

filename_hic = data_path+'/data/'+cell_line+'/hic/'+assay_type+'/'+cellline+''+assay_type+'FDR'+fdr+'_'+chr

filename_seqs = data_path+'/data/csv/seqsbed/'+organism+'/'+genome+'/'+res+'/sequences'+chr+'.bed'

Find_tss.py:

filename_tss = data_path+'/data/tss/'+organism+'/'+genome+'/gencode.v38.annotation.gtf.tss.bed'

filename_seqs=data_path+'/data/csv/seqsbed/'+organism+'/'+genome+'/'+resolution+'/sequences'+chr+'.bed'

data_write.py:

fasta_file = data_path+'/data/genome/GRCh38.primary_assembly.genome.fa'

tss_bin_file = data_path+'/data/tss/'+organism+'/'+genome+'/tssbins'+chr_temp+'.npy'

bin_start_file = data_path+'/data/tss/'+organism+'/'+genome+'/binstart'+chr_temp+'.npy'

Could you please share these files with me? If possible, could you also share the scores (the .npy files generated by the calculate_loss() function in Epi_models_fa_ensemble.py)?
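For anyone checking their own setup against this list, here is a minimal sketch that assembles several of the paths above and reports which ones are missing on disk. The function names and the example values (`human`, `hg38`, `5kb`, `chr1`) are hypothetical, the file names are copied as they appear in this issue, and the Hi-C path is left out because its exact form is unclear here.

```python
from pathlib import Path


def expected_inputs(data_path, cell_line, organism, genome, res, chrom):
    """Build the input paths listed above (names as written in this issue)."""
    base = Path(data_path) / "data"
    tss_dir = base / "tss" / organism / genome
    return {
        "genome_cov_file": base / cell_line / "bam" / "GM12878_CAGE_binsize_5000bp.bigWig",
        "seqs_bed_file": base / "csv" / "seqsbed" / organism / genome / res / f"sequences{chrom}.bed",
        "filename_tss": tss_dir / "gencode.v38.annotation.gtf.tss.bed",
        "fasta_file": base / "genome" / "GRCh38.primary_assembly.genome.fa",
        "tss_bin_file": tss_dir / f"tssbins{chrom}.npy",
        "bin_start_file": tss_dir / f"binstart{chrom}.npy",
    }


def missing_inputs(paths):
    """Return the names of all expected inputs that do not exist on disk."""
    return [name for name, path in paths.items() if not path.exists()]


if __name__ == "__main__":
    # Hypothetical example values; replace with your own layout.
    paths = expected_inputs(".", "GM12878", "human", "hg38", "5kb", "chr1")
    for name in missing_inputs(paths):
        print(f"missing: {name} -> {paths[name]}")
```

Running it before and after the data preparation steps shows which inputs still need to be generated.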

I really appreciate your time and consideration, and I look forward to hearing from you.

Best Regards,

Linhui

karbalayghareh commented 2 years ago

Hi Linhui,

I have provided a detailed data preparation tutorial here. Please follow the instructions, and all the necessary files will be generated along the way. To start, you need BAM files for the epigenomic data (such as DNase, H3K4me3, H3K27ac, and CAGE) and Hi-C files. These files can be downloaded from ENCODE, and their accession codes are provided in the GraphReg paper. For example, the BAM file for K562 CAGE can be found with accession code ENCFF623BZZ.
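As a rough illustration of turning an accession code into a download link, the sketch below assumes the ENCODE portal's usual `@@download` URL layout. The helper name `encode_download_url` is hypothetical and is not part of GraphReg; check the link on the ENCODE file page before relying on it.

```python
def encode_download_url(accession, file_format="bam"):
    """Build a download URL for an ENCODE file accession.

    Assumes the portal's common layout:
    https://www.encodeproject.org/files/<ACC>/@@download/<ACC>.<ext>
    """
    return (
        "https://www.encodeproject.org/files/"
        f"{accession}/@@download/{accession}.{file_format}"
    )


if __name__ == "__main__":
    # K562 CAGE BAM accession mentioned above.
    print(encode_download_url("ENCFF623BZZ"))
```

The printed URL can then be passed to a downloader such as `wget` or `curl`.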

Best, Alireza

ChenLinhui9 commented 2 years ago

Thank you very much Alireza!

All The Best,

Linhui