This PR fixes some annotations for the chemical products synthesized by several BGCs. I added cross-references to PubChem and extracted additional metadata from there where possible.
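For anyone who wants to reproduce the PubChem cross-referencing, here is a minimal sketch of the lookups involved. It assumes PubChem's PUG REST service and only does two things: it builds the request URLs, and it parses the CID list out of a response. The helper names and the chosen properties (MolecularFormula, MolecularWeight, InChIKey) are illustrative assumptions. They are not necessarily what was used for this PR.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name: str) -> str:
    # Hypothetical helper: compound names can contain spaces or Greek letters
    # (e.g. "α-naphthocyclinone"), so percent-encode the whole name, including "/".
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def properties_url(cid: int,
                   props: tuple = ("MolecularFormula", "MolecularWeight", "InChIKey")) -> str:
    # Hypothetical helper: request a fixed set of properties for one CID.
    return f"{PUG_REST}/compound/cid/{cid}/property/{','.join(props)}/JSON"


def parse_cids(response: dict) -> list[int]:
    # A /cids/JSON response looks like {"IdentifierList": {"CID": [...]}};
    # return an empty list if that structure is missing.
    return response.get("IdentifierList", {}).get("CID", [])
```

A name lookup can return several CIDs. Each hit therefore still has to be checked against the structure in the reference paper before it is added as a cross-reference.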
BGC0000231
BGC0000231 does not produce a single molecule named griseusin, but two related compounds, griseusin A and griseusin B, as described in PMID:8169211.
BGC0000243 and BGC0000244
These BGCs were reported to produce several molecules of the macrotetrolide family, but the individual compounds were not listed. I added the metadata for the 5 naturally produced macrotetrolides described in the reference paper (PMID:10858335).
BGC0000248
BGC0000248 was reported to produce naphtocyclinone, but the authors call it α-naphthocyclinone in the manuscript, and additional naphthocyclinones (δ-naphtocyclinone, etc.) have since been isolated.
BGC0000402
This BGC listed its product as paenilarvins, but the authors of the reference manuscript characterized three different molecules: paenilarvin A, paenilarvin B and paenilarvin C.
BGC0000662
This BGC listed its product as grixazone, but the corresponding PubChem entry is grixazone A. I also added a new reference for this cluster (PMID:17617696), which describes the biosynthetic pathway of grixazone A based on this cluster.
BGC0001167
This BGC listed piricyclamide as its product, but according to PMID:22952627 it actually produces 4 different compounds.
BGC0001268
According to the reference paper (PMID:23932525), the end product of the biosynthetic pathway encoded in this BGC is fusarin C.
BGC0001413
According to the reference paper (PMID:25510965), this BGC produces 3 cystobactamid products.
BGC0001465
The BGC product was listed as generic bromopyrroles/bromophenols, but the reference paper (PMID:24974229) gives structures for the three naturally occurring compounds without naming them explicitly.
I manually searched PubChem for these structures to identify the corresponding compounds: bromophene, pentabromopseudilin and bistribromopyrrole.
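The structure search above was done by hand. The following is a hedged sketch of what the equivalent automated lookup could look like. It assumes the POST form of PUG REST's SMILES-to-CID lookup, which avoids putting characters such as "/", "#" or "\" into the URL path. The helper name is hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def smiles_to_cids_request(smiles: str) -> Request:
    # Hypothetical helper: SMILES strings often contain "/", "#" or "\",
    # which break path-based GET URLs, so send the structure as a
    # form-encoded POST body instead.
    body = urlencode({"smiles": smiles}).encode("ascii")
    return Request(f"{PUG_REST}/compound/smiles/cids/JSON", data=body, method="POST")
```

Passing the request to `urllib.request.urlopen` would return the same `{"IdentifierList": {"CID": [...]}}` JSON shape as a name lookup. Any hits would still need to be compared with the structures drawn in the paper.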
BGC0001526
I fixed the name of the compounds (bartolosides A -> bartoloside A, etc.) and added cross-references to PubChem.
BGC0001620
According to the reference paper (PMID:28855504), this BGC leads to the production of 6 naturally occurring compounds (ilamycin E2 and ilamycin F were obtained by cluster engineering).
BGC0001644
According to the reference paper (PMID:30025185), this BGC produces two related compounds, lacunalide A and its desmethyl derivative lacunalide B.
BGC0001716
I added the compounds from PMID:29625040. It's a bit unclear how they should be named: the article inconsistently calls them either odilorhabdin NOSO-95A or NOSO-95A. In the end, I used the longer name from PubChem.
BGC0001983
The paper reports 4 different triacsin compounds, but only triacsin C (referred to as compound 3) was found:
UV traces (300 nm) confirming the production of 3 in S. tsukubaensis. In both strains, the congener 3 was produced as the major product and other congeners were not identifiable by UV in these traces.
BGC0002019
The reference paper (PMID:29806086) describes 11 tiancilactone molecules.
However, based on the metabolite profile of the strain's fermentation extract, only 8 of them are produced by Streptomyces sp. CB03234, so I only added these 8 compounds to the BGC products.
BGC0002021
I renamed fogacin A to fogacin, the name used both in PubChem and in the reference paper (PMID:30556239), even though the derivatives are called fogacin B and fogacin C.
BGC0001216
The reference paper (PMID:25763681) describes 4 different splenocin molecules, but in the manuscript text the authors state that they only observe production of splenocin C:
In our hands, we only observe SPN-C in the fermentation of CNQ431 [...]