medema-group / BiG-MAP

Interpreting big-map outputs #18

Open flashton2003 opened 1 year ago

flashton2003 commented 1 year ago

Hello,

I've run BiG-MAP to identify the gene clusters in some microbiome samples comparing health and disease.

There seem to be interesting patterns in my dataset of particular gene cluster types being associated with health or disease:

[Image: Screenshot 2023-05-04 at 15 33 11]

Do you have any suggestions on approaches to test whether these patterns are statistically significant?

Each example gene cluster is statistically significantly associated with, e.g., health. For example, gb.KB291615.1.region001.GC_DNA..Entryname.acetate2butyrate..OS.Clostridium_celatum_DSM_1785_genomic_scaffold..SMASHregion.region001..NR.1 was associated with health.

But then, what about at the gene cluster type level? I could just look at whether the counts in the screenshot above are significant, but that seems to discard a lot of information.
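For illustration, the count-based check could be sketched as an exact two-sided binomial test. This is only a sketch: the counts are hypothetical (not taken from the screenshot), the helper name `binomial_two_sided_p` is made up for this example, and the null hypothesis that a cluster of a given type is equally likely to be health- or disease-associated (p = 0.5) is an assumption. It also treats clusters as independent, which may not hold.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under the null success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so ties with pmf[k] are counted despite rounding.
    threshold = pmf[k] * (1 + 1e-9)
    return min(1.0, sum(x for x in pmf if x <= threshold))

# Hypothetical counts for one gene cluster type: number of individually
# significant clusters associated with health vs. with disease.
n_health, n_disease = 9, 1

p_value = binomial_two_sided_p(n_health, n_health + n_disease)
print(f"two-sided binomial p-value: {p_value:.4f}")
```

As noted, this only uses the direction of each association and ignores the abundance values themselves.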

I was also thinking about whether the RPKMs could be "summed" at the "gene cluster type" level (e.g. acetate2butyrate), and compared between health and disease.

Any thoughts welcome!

Thanks,

Phil

I have two countries in my study, so I've narrowed down the list of hits by filtering for only pathways that are consistently associated with health/disease in both coun

HAugustijn commented 1 year ago

> Do you have any suggestions on approaches to test whether these patterns are statistically significant?

Did you happen to look at the statistical methods offered in the analysis module (BiG-MAP.analyse)?

> I was also thinking about whether the RPKMs could be "summed" at the "gene cluster type" level (e.g. acetate2butyrate), and compared between health and disease.

Yes, the reads for similar cluster types with the same end products can be summed. Here is an example of how we applied this to create Fig. 3. An alternative is to modify the settings of the family module to create larger gene cluster families.
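As a rough, dependency-free sketch of the summing idea (not the code used for Fig. 3, and not part of BiG-MAP): the snippet below assumes cluster names carry the type in an `Entryname.<type>..` field, as in the name quoted above, sums RPKMs per type and sample, and compares the groups with a permutation test on the difference in means. The toy RPKM values, sample IDs, and the helper names `cluster_type`, `sum_by_type` and `permutation_p` are all hypothetical. The permutation test is swapped in here to avoid extra dependencies; a rank-based test such as `scipy.stats.mannwhitneyu` or `scipy.stats.kruskal` would be a common alternative.

```python
import random
import re
from collections import defaultdict

def cluster_type(name):
    """Extract the 'Entryname' field from a cluster name (assumed naming pattern)."""
    match = re.search(r"Entryname\.(.+?)\.\.", name)
    return match.group(1) if match else "unknown"

def sum_by_type(rpkm):
    """{cluster_name: {sample: rpkm}} -> {cluster_type: {sample: summed rpkm}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for name, per_sample in rpkm.items():
        for sample, value in per_sample.items():
            totals[cluster_type(name)][sample] += value
    return {ctype: dict(samples) for ctype, samples in totals.items()}

def permutation_p(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[: len(a)], pooled[len(a):]
        # Tolerance guards against floating-point rounding in the comparison.
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: two clusters of the same type, eight samples.
rpkm = {
    "gb.A.region001.GC_DNA..Entryname.acetate2butyrate..OS.org_one..SMASHregion.region001..NR.1":
        {"H1": 5.0, "H2": 6.0, "H3": 7.0, "H4": 8.0, "D1": 1.0, "D2": 0.0, "D3": 2.0, "D4": 1.0},
    "gb.B.region002.GC_DNA..Entryname.acetate2butyrate..OS.org_two..SMASHregion.region002..NR.1":
        {"H1": 3.0, "H2": 3.0, "H3": 3.0, "H4": 3.0, "D1": 1.0, "D2": 1.0, "D3": 1.0, "D4": 1.0},
}
healthy_ids = ["H1", "H2", "H3", "H4"]
disease_ids = ["D1", "D2", "D3", "D4"]

totals = sum_by_type(rpkm)
healthy = [totals["acetate2butyrate"][s] for s in healthy_ids]
disease = [totals["acetate2butyrate"][s] for s in disease_ids]
p_value = permutation_p(healthy, disease)
print(f"acetate2butyrate: permutation p-value = {p_value:.4f}")
```

If several cluster types are tested this way, the resulting p-values would need a multiple-testing correction.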

flashton2003 commented 1 year ago

Hi Hannah,

Yes, I ran BiG-MAP.analyse but the output didn't really make sense to me. This is the Kruskal-Wallis CSV I got, but it doesn't seem to be grouped by condition (I've sub-sampled it, but the parts I deleted were the same kind of thing: MGCs vs samples). Perhaps I mis-specified something?

Acute_TyphivsControl_HealthySerosurvey_GC_kw.subsample.csv

This was the output run with:

python3 ~/programs/BiG-MAP/src/BiG-MAP.analyse.py --explore --compare -B clean/biom-results/BiG-MAP.mapcore.metacore.dec.biom -T metagenomic -M DiseaseStatus -g Acute_Typhi Control_HealthySerosurvey -O clean/analyse_output/

Thanks for the example!