loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

Empty output from CreateNetwork #260

Open swethas112 opened 3 months ago

swethas112 commented 3 months ago

Hello,

Thanks for the great tool. I am trying to identify networks with the CreateNetwork module using the command `TOBIAS CreateNetwork --TFBS annotated/*/beds/*Knockout_bound.bed --origin motif2gene_mapping.txt`, but the output files do not contain any networks. Could you help me figure out whether I have missed something? I am attaching the origin file and an excerpt from one of the bed files for reference.

Thanks, Swetha

motif2gene_mapping.txt

TFBS:

```
chr1    11274014    11274021    GATA1   8.20623 +   chr1    11272967    11274172    Knockout,Wildtype   .   .   start_codon 11273531    11273534    +   end 35  FeatureInsidePeak   0.002   1.0 NA  CCDS81260.1 ENSE00001471726.1   1   ENSG00000120942.14  UBIAD1  protein_coding  OTTHUMG00000002075.2    OTTHUMT00000005775.2    HGNC:30791  2   NA  ENSP00000366000.1   basic,CCDS  ENST00000376804.2   UBIAD1-201  2   protein_coding  query_1 1.30158
chr1    11671620    11671627    GATA1   8.20623 -   chr1    11671222    11672148    Knockout,Wildtype   .   .   CDS 11671927    11672023    +   start   242 FeatureInsidePeak   0.104   1.0 NA  CCDS133.1   ENSE00000818922.1   4   ENSG00000116663.11  FBXO6   protein_coding  OTTHUMG00000002229.2    OTTHUMT00000006332.2    HGNC:13585  2   NA  ENSP00000365944.4   basic,Ensembl_canonical,MANE_Select,appris_principal_1,CCDS ENST00000376753.9   FBXO6-201   1   protein_coding  query_1 2.79763
chr1    11842441    11842448    GATA1   8.20623 -   chr1    11842018    11843006    Knockout,Wildtype   .   .   transcript  11806190    11843130    +   end 618 PeakInsideFeature   1.0 0.027   NA  CCDS138.1   NA  NA  ENSG00000011021.23  CLCN6   protein_coding  OTTHUMG00000002299.8    OTTHUMT00000006639.3    HGNC:2024   2   NA  ENSP00000234488.9   basic,Ensembl_canonical,MANE_Select,appris_principal_1,CCDS ENST00000346436.11  CLCN6-202   1   protein_coding  query_1 6.98205
chr1    16206462    16206469    GATA1   9.08368 +   chr1    16206170    16207303    Knockout,Wildtype   .   .   CDS 16206947    16207210    -   end 211 FeatureInsidePeak   0.232   1.0 NA  CCDS170.1   ENSE00000955436.1   6   ENSG00000142632.17  ARHGEF19    protein_coding  OTTHUMG00000002219.5    OTTHUMT00000006289.2    HGNC:26604  2   NA  ENSP00000270747.3   basic,Ensembl_canonical,MANE_Select,appris_principal_1,CCDS ENST00000270747.8   ARHGEF19-201    1   protein_coding  query_1 1.53035
chr1    16644124    16644131    GATA1   8.20623 +   chr1    16643710    16645115    Knockout,Wildtype   .   .   exon    16644645    16644683    -   end 233 FeatureInsidePeak   0.027   1.0 NA  NA  ENSE00001411231.2   ENSG00000291072.1   ENSG00000291072 lncRNA  NA  OTTHUMT00000092783.2    NA  2   NA  NA  basic   ENST00000362058.2   ENST00000362058 2   lncRNA  query_1 13.18856
```

mohobein commented 2 months ago

Hey Swetha,

For CreateNetwork to work, the bed file must contain the target gene for each TFBS, and the ID of that target gene must also be present in your --origin file so that the two can be linked. In your bed file, all Ensembl IDs carry a version number (ENSG00000116663.11 -> ENSG00000116663, version 11), whereas the genes in your motif2gene_mapping.txt file do not. I suspect this is the reason you do not get any networks.
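One way to check this suspicion before changing any files is to compare the gene IDs from both files with and without their version suffixes. The following is only a minimal Python sketch: `strip_version` and `compare_ids` are hypothetical helper names (not part of TOBIAS), and it assumes the gene IDs have already been extracted from the bed file and the --origin file into two lists.

```python
import re

# Trailing Ensembl-style version suffix such as ".11" (assumed ID format)
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_version(gene_id):
    """Return the gene ID without a trailing '.<number>' version suffix."""
    return _VERSION_SUFFIX.sub("", gene_id)


def compare_ids(bed_ids, origin_ids):
    """Count IDs shared by both files, exactly and after stripping versions."""
    bed_ids, origin_ids = set(bed_ids), set(origin_ids)
    exact = bed_ids & origin_ids
    unversioned = ({strip_version(g) for g in bed_ids}
                   & {strip_version(g) for g in origin_ids})
    return len(exact), len(unversioned)


if __name__ == "__main__":
    # Illustrative IDs only: versioned as in the bed excerpt, unversioned
    # as described for the mapping file.
    bed = ["ENSG00000120942.14", "ENSG00000116663.11"]
    origin = ["ENSG00000116663", "ENSG00000011021"]
    print(compare_ids(bed, origin))  # prints (0, 1)
```

If the first count is zero while the second is not, the version suffix is the likely reason the IDs cannot be linked.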

Perhaps your problem can be solved by removing the version numbers from all gene IDs in your bed file, or by using a mapping file that also includes version numbers. The IDs have to match exactly for networks to be identified.
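Removing the version numbers could be scripted roughly as follows. This is a sketch under stated assumptions, not a TOBIAS feature: the function names are hypothetical, the bed file is assumed to be tab-separated, and gene IDs are assumed to follow the Ensembl pattern (`ENS`, optional species letters, `G`, digits, `.version`). Transcript, exon and protein IDs are deliberately left untouched.

```python
import re

# Whole field must be a versioned Ensembl gene ID, e.g. ENSG00000116663.11
# or ENSMUSG00000000001.4 (assumed naming pattern).
GENE_ID_WITH_VERSION = re.compile(r"^(ENS[A-Z]*G\d+)\.\d+$")


def strip_gene_versions(line):
    """Drop the version suffix from every gene-ID field of one bed line."""
    fields = line.rstrip("\n").split("\t")
    cleaned = [GENE_ID_WITH_VERSION.sub(r"\1", field) for field in fields]
    return "\t".join(cleaned)


def rewrite_bed(in_path, out_path):
    """Write a copy of the bed file with unversioned gene IDs."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(strip_gene_versions(line) + "\n")
```

Calling `rewrite_bed` with an input and an output path (both chosen by you) produces a new file, so the original annotation stays intact in case the versioned IDs are needed later.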

Also make sure that both files correspond to the same organism. Both your bed file and your mapping file contain human gene IDs, so that part should be fine.

If this does not work, could you please run the tool using the argument --verbosity 4 to enable debug printouts? These would be helpful to me for identifying the problem.

I hope this solves your issue.

Best regards, Moritz

hyBio commented 2 months ago

Hi, I have the following three questions for CreateNetwork:

  1. Are the two columns in the motif2gene_mapping.txt file supposed to be motif name \t gene name, or gene name \t gene product name as shown below? This is very confusing to me. [image]
  2. Does the first column of motif2gene_mapping.txt need to match the fourth column of the TFBS file, and do I need to adjust it accordingly if I customize the motif names?
  3. For a non-model species, how should I obtain the motifs and their regulated gene sets? Can I just use the motif2gene_mapping.txt from the test data?

Looking forward to your reply, thanks a lot.