nf-core / funcscan

(Meta-)genome screening for functional and natural product gene sequences
https://nf-co.re/funcscan
MIT License

BGC Summary Table #64

Closed: jfy133 closed this issue 1 year ago

jfy133 commented 2 years ago

Description of feature

Should produce two files:

summary (Sample_Name, Tool, No_Hits)
aggregated (Sample_Name, Tool, Contig, Hit_Name, Probability, ...)
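One way the two files could relate, as a minimal sketch: the summary can be derived from the aggregated rows by counting hits per sample and tool. The rows below (including the probability values) are made-up placeholders, and `summarise` is a hypothetical helper, not part of funcscan.

```python
from collections import Counter

# Made-up aggregated rows for illustration only:
# (Sample_Name, Tool, Contig, Hit_Name, Probability)
aggregated = [
    ("Sample_1", "antiSMASH", "c_001", "Arylpolyene", 0.91),
    ("Sample_1", "GECCO", "c_002", "RiPP", 0.80),
    ("Sample_2", "antiSMASH", "c_001", "NRPS", 0.75),
    ("Sample_2", "antiSMASH", "c_003", "Arylpolyene", 0.66),
]


def summarise(rows):
    """Count hits per (Sample_Name, Tool) pair (hypothetical helper)."""
    counts = Counter((sample, tool) for sample, tool, *_ in rows)
    # One summary row per pair: (Sample_Name, Tool, No_Hits), sorted for stable output.
    return [(sample, tool, n) for (sample, tool), n in sorted(counts.items())]


summary = summarise(aggregated)
```

Deriving the summary from the aggregated table, rather than building it separately, would keep the two files consistent by construction.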

jasmezz commented 2 years ago

My idea of the aggregation table:

| BGC_ID | Sample_ID | Prediction_tool | Contig_ID | Product_class | Contig_edge | BGC_start | BGC_end | BGC_length | Protein_count | Protein_ID | PFAM_ID | MIBiG_ID | InterPro_ID |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Sample_1 | antiSMASH | c_001 | Arylpolyene | no | 123 | 456 | 334 | 2 | OGCKDNOF_00056;OGCKDNOF_00057 | PF00668 | BGC0001894 | |
| 2 | Sample_1 | GECCO | c_002 | RiPP | one-side-truncated | 123 | 456 | 334 | 1 | OGCKDNOF_00056 | PF00668;PF08242 | | IPR001031 |
| 3 | Sample_2 | antiSMASH | c_001 | NRPS | two-side-truncated | 123 | 456 | 334 | 1 | OGCKDNOF_00056 | PF08242 | BGC0001894 | |
| 4 | Sample_2 | DeepBGC | c_002 | Arylpolyene | no | 123 | 456 | 334 | 3 | OGCKDNOF_00056;OGCKDNOF_00058;OGCKDNOF_00059 | PF00668;PF08242;PF08243 | BGC0001894 | |

Protein_count and Protein_ID refer to the annotations from Prodigal/Prokka.
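To make the proposed layout concrete, here is a rough sketch of how such a table could be read back, assuming it is written as tab-separated text with `;` separating multiple values in one cell. The helpers `parse_rows` and `is_consistent` are hypothetical, and the check `BGC_length == BGC_end - BGC_start + 1` is only inferred from the example values (123, 456, 334), not confirmed anywhere in this thread.

```python
import csv
import io

HEADER = ["BGC_ID", "Sample_ID", "Prediction_tool", "Contig_ID", "Product_class",
          "Contig_edge", "BGC_start", "BGC_end", "BGC_length", "Protein_count",
          "Protein_ID", "PFAM_ID", "MIBiG_ID", "InterPro_ID"]
# First two example rows from the table above; empty strings are empty cells.
ROWS = [
    ["1", "Sample_1", "antiSMASH", "c_001", "Arylpolyene", "no", "123", "456",
     "334", "2", "OGCKDNOF_00056;OGCKDNOF_00057", "PF00668", "BGC0001894", ""],
    ["2", "Sample_1", "GECCO", "c_002", "RiPP", "one-side-truncated", "123", "456",
     "334", "1", "OGCKDNOF_00056", "PF00668;PF08242", "", "IPR001031"],
]
TSV = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS)

MULTI_VALUE = ("Protein_ID", "PFAM_ID", "MIBiG_ID", "InterPro_ID")
INTEGER = ("BGC_start", "BGC_end", "BGC_length", "Protein_count")


def parse_rows(text):
    """Parse the TSV into dicts, splitting ';'-separated cells into lists."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        for col in MULTI_VALUE:
            row[col] = [value for value in row[col].split(";") if value]
        for col in INTEGER:
            row[col] = int(row[col])
        rows.append(row)
    return rows


def is_consistent(row):
    """Check the count and length columns against the values they summarise."""
    return (row["Protein_count"] == len(row["Protein_ID"])
            # Assumption: start and end coordinates are both inclusive.
            and row["BGC_length"] == row["BGC_end"] - row["BGC_start"] + 1)


rows = parse_rows(TSV)
```

A check like `is_consistent` could catch rows where Protein_count drifts out of sync with the Protein_ID list.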

Feedback welcome, @nf-core/funcscan, so that I can implement the comBGC tool without having to change too much later on.

Current considerations:

| ... | Database_annotations |
| --- | --- |
| ... | MIBiG-BGC0001894;BGC0001895 |
| ... | InterPro-IPR001031;IPR001032;IPR001033 |
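As a sketch of what the Database_annotations format above might look like in code: each database name is prefixed to its `;`-joined IDs. The function name `database_annotations` is hypothetical, and how entries from several databases would be combined within a single cell is not specified here, so the sketch returns one string per database and leaves that open.

```python
def database_annotations(ids_by_database):
    """Format IDs as '<database>-<id>;<id>;...' strings, one per database.

    Hypothetical helper: databases without any IDs are skipped.
    """
    return [
        f"{database}-{';'.join(ids)}"
        for database, ids in ids_by_database.items()
        if ids
    ]


annotations = database_annotations({
    "MIBiG": ["BGC0001894", "BGC0001895"],
    "InterPro": ["IPR001031", "IPR001032", "IPR001033"],
    "PFAM": [],  # no hits, so no entry is produced
})
```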