mibig-secmet / mibig-json

Repository to track changes in MIBiG curation data stored in JSON format

BGC0001678 and BGC0001883 share a sequence, but not genes #199

Closed SJShaw closed 2 years ago

SJShaw commented 2 years ago

Affected BGC: BGC0001678 and BGC0001883

Describe the error: Both entries cite the same reference paper and use the same reference sequence, with minor differences in start and end position. The reference sequence itself contains no gene annotations. BGC0001883 annotates around 20 genes, while BGC0001678 annotates only one, and the location of that one gene doesn't match any of the genes in BGC0001883.

One of the entries needs to be retired, and documentation should be added about where the gene annotations come from.
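For reference, the kind of location comparison described above can be sketched as follows. This is only an illustration: the `(start, end)` tuple layout, the 1-based inclusive convention, the helper names, and all coordinates are assumptions made for the example, not values taken from either entry or from the MIBiG schema.

```python
from typing import Iterable, List, Tuple

# (start, end) on the shared contig, 1-based inclusive. This layout is an
# assumption for the sketch, not the MIBiG JSON schema.
Interval = Tuple[int, int]


def to_contig(gene: Interval, cluster_start: int) -> Interval:
    """Shift cluster-relative coordinates onto the contig.

    Needed when two entries use slightly different start positions on the
    same sequence. Assumes 1-based inclusive coordinates.
    """
    offset = cluster_start - 1
    return (gene[0] + offset, gene[1] + offset)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def matching_genes(query: Interval, others: Iterable[Interval]) -> List[Interval]:
    """Return every gene in `others` that overlaps `query`."""
    return [gene for gene in others if overlaps(query, gene)]


# Placeholder coordinates only, not the real annotations.
single_gene = to_contig((4001, 5200), cluster_start=1000)  # -> (5000, 6199)
other_genes = [(100, 900), (1000, 2500), (7000, 8800)]

print(matching_genes(single_gene, other_genes))
```

An empty result corresponds to the situation reported here, where the single gene lines up with none of the others.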

SJShaw commented 2 years ago

The paper referenced by both entries points to marinegenomics.oist.jp for gene annotations of Symbiodinium sp. clade A Y106, with GFF and FASTA files available. It also gives an NIES accession (NIES-4076) that links to the BGNK01000000.1 assembly. The BGNK01000303.1 contig used for both entries is annotated as scaffold314.1, which matches the paper.

Replacing BGC0001678's single existing gene with s314_g33 from the GFF, the gene mentioned in the paper, gives more hits for the domains the paper describes. The ATPgrasp domain is still missing, but that profile wasn't found in the surrounding genes either. BGC0001883's compound isn't mentioned in the paper and its genes don't match up, so BGC0001883 should be retired: it seems to be an unrecoverable input error, and the two sets of gene annotations aren't compatible.
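A rough sketch of how a gene such as s314_g33 could be looked up in a GFF file is below. It assumes the file follows the standard nine-column, tab-separated GFF3 layout and stores the gene identifier in an `ID=` attribute; the actual file from marinegenomics.oist.jp may differ. The example lines, their coordinates, and the second gene ID are placeholders, not real annotation data.

```python
from typing import Any, Dict, Iterable, Optional


def parse_gff_line(line: str) -> Optional[Dict[str, Any]]:
    """Parse one GFF3 feature line; return None for comments or malformed lines."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return None
    # Column 9 holds semicolon-separated key=value attributes.
    attributes = {}
    for item in cols[8].split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip()
    return {
        "seqid": cols[0],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "strand": cols[6],
        "attributes": attributes,
    }


def find_gene(lines: Iterable[str], gene_id: str) -> Optional[Dict[str, Any]]:
    """Return the first 'gene' feature whose ID attribute equals gene_id."""
    for line in lines:
        record = parse_gff_line(line)
        if (
            record is not None
            and record["type"] == "gene"
            and record["attributes"].get("ID") == gene_id
        ):
            return record
    return None


# Placeholder lines: coordinates and the second ID are made up for the sketch.
example_lines = [
    "##gff-version 3",
    "scaffold314.1\texample\tgene\t1000\t2500\t.\t+\t.\tID=s314_g33",
    "scaffold314.1\texample\tgene\t3000\t4000\t.\t-\t.\tID=placeholder_gene",
]

gene = find_gene(example_lines, "s314_g33")
if gene is not None:
    print(gene["seqid"], gene["start"], gene["end"], gene["strand"])
```

The coordinates returned this way are relative to the scaffold, so they would still need to be shifted to match the start position each entry uses on the contig.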